Cloud Inference Engineer
Luminal | San Francisco, California, United States | 2mo ago
This role has closed. Here are similar open builder roles:
| 1. | AI Builder Intern - Agentic AI (Mondee) Austin, Texas, United States | on-site | internship | internship | ai, agentic ai, llms | 3w ago |
| 2. | AI Systems Weirdo (UniteGPS) South Portland, ME, United States | on-site | full-time | mid | ai systems design, route optimization, gps tracking | 3w ago |
| 3. | Software Engineering Intern (Maritime Technology Startup (Stealth)) El Segundo, California, United States | $40 – $48/hr | on-site | internship | internship | python, go, javascript | 3w ago |
| 4. | AI-Native Founding Engineer (Jobright.ai) San Francisco, California, United States | $130,000 – $170,000/yr | on-site | full-time | lead | typescript, react, sql | 3w ago |
| 5. | MLE @ Krnel (NYC, Full-Time) (krnel.ai) New York, New York, United States | on-site | full-time | mid | machine learning, devops, ci/cd | 3w ago |
| 6. | AI Builder Intern - Agentic AI (Tabhi) Austin, Texas, United States | on-site | internship | internship | agentic ai, llms, agent frameworks | 3w ago |
| 7. | Forward Deployed Engineer (Legion Intelligence) Washington DC, United States | $185,000 – $260,000/yr | on-site | full-time | mid | python, javascript, typescript | 3w ago |
| 8. | Senior Founding Engineer (Ambral) New York City, New York, United States | $185,000 – $245,000/yr | on-site | full-time | senior | typescript, nuxt, postgres | 3w ago |
| 9. | Full-Stack Engineer- Series B Ai · $200-300K + equity (Benchstack Ai) San Francisco, California, United States+1 | $200,000 – $300,000/yr | on-site | full-time | senior | react, typescript, python | 3w ago |
| 10. | Marketing Productivity Engineer (Sigma Computing) San Francisco, United States+2 | $130,000 – $165,000/yr | on-site | full-time | senior | performance marketing, growth engineering, marketing operations | 3w ago |
Original posting (closed) below
$150,000 – $350,000/yr | full-time | on-site | lead | 0+ years | visa sponsorship
skills: cuda, gpu inference optimization, vllm, sglang, tensorrt-llm, kv caching, paged attention, batching, token streaming, distributed compute, gpu
Qualifications
- CUDA + GPU inference optimization
- vLLM, SGLang, or TensorRT-LLM experience
- KV caching, paged attention, batching, token streaming, etc.
- Distributed compute (experience with GPUs is a super plus)
- No degree required
Company
Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line.
Role
Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.
Day-to-day responsibilities:
- Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
- Conduct model performance reviews
- Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
- Sometimes write kernels and, yes, do the occasional bit of tasteful shitposting
Benefits
Health insurance