Cloud Inference Engineer

Luminal | San Francisco, California, United States | 2mo ago
This role has closed. The original posting follows.
$150,000 – $350,000/yr | full-time | on-site | lead | 0+ years | visa sponsorship
skills: cuda, gpu inference optimization, vllm, sglang, tensorrt-llm, kv caching, paged attention, batching, token streaming, distributed compute, gpu

Qualifications

  • CUDA + GPU inference optimization
  • vLLM, SGLang, or TensorRT-LLM experience
  • KV caching, paged attention, batching, token streaming, etc.
  • Distributed compute (GPU experience is a super plus)
  • No degree required

Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line.

Role

Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day to day responsibilities:

  • Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
  • Conduct model performance reviews
  • Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
  • Sometimes write kernels and, yes, occasional tasteful shitposting

Benefits

  • Health insurance