
Cloud Inference Engineer

Luminal | San Francisco, California, United States | 3w ago
$150,000 – $350,000/yr | full-time | on-site | lead | 0+ years | visa sponsorship
skills: cuda, gpu inference optimization, vllm, sglang, tensorrt-llm, kv caching, paged attention, batching, token streaming, distributed compute

Qualifications

  • CUDA + GPU inference optimization
  • vLLM, SGLang, or TensorRT-LLM experience
  • KV caching, paged attention, batching, token streaming, etc.
  • Distributed compute (experience with GPUs is a big plus)
  • No degree required
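
The KV caching listed above can be sketched in a few lines. This is a hypothetical toy illustration of the idea (not Luminal's or any library's implementation): during autoregressive decoding, the key/value projections of past tokens are cached so each new decode step only computes projections for the newest token instead of re-running them over the whole sequence.

```python
# Toy KV cache sketch (illustrative names, not a real serving stack).

def project(token):
    # Stand-in for the model's real K/V projections: here just two numbers.
    return (token * 2, token * 3)  # (key, value)

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, token):
        # Compute K/V for the new token once and store it; later decode
        # steps attend over self.keys / self.values without recomputing.
        k, v = project(token)
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = KVCache()
for tok in [1, 2, 3]:   # prefill: cache the prompt tokens
    cache.append(tok)
cache.append(4)         # one decode step: only the new token is projected
print(len(cache))       # cache now holds 4 positions
```

Paged attention extends this same idea by storing the cache in fixed-size blocks so memory can be allocated and shared non-contiguously across requests.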

Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line of code.

Role

Founding engineer role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day-to-day responsibilities:

  • Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
  • Conduct model performance reviews
  • Improve scheduler, batcher, autoscaling; profile latency, cost, utilization
  • Sometimes write kernels and, yes, do occasional tasteful shitposting
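
Token streaming, mentioned in the responsibilities above, can be sketched with a generator. This is an assumed minimal illustration (function names are made up for the example): the server yields each token to the client as soon as it is decoded instead of buffering the full completion.

```python
# Hypothetical token-streaming sketch: decode_step stands in for one
# model forward pass; stream_tokens yields tokens as they are produced.

def decode_step(prompt, step):
    return f"tok{step}"  # placeholder for the real decoded token

def stream_tokens(prompt, max_tokens=3):
    for step in range(max_tokens):
        # In a real server this yield would flush a chunk over SSE/gRPC,
        # so the client sees output before decoding finishes.
        yield decode_step(prompt, step)

tokens = list(stream_tokens("hello"))
print(tokens)  # ['tok0', 'tok1', 'tok2']
```

The appeal for serving latency is that time-to-first-token is one decode step, not the whole generation.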

Benefits

Health insurance