Cloud Inference Engineer

Luminal | San Francisco, California, United States | 2mo ago

This role has closed. The original posting follows.

skills: cuda, gpu inference optimization, vllm, sglang, tensorrt-llm, kv caching, paged attention, batching, token streaming, distributed compute, gpu

Qualifications

CUDA + GPU inference optimization
vLLM, SGLang, or TensorRT-LLM experience
KV caching, paged attention, batching, token streaming, etc.
Distributed compute (with GPUs is a super plus)
No degree required

Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line.

Role

Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day to day responsibilities:

Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
Conduct model performance reviews
Improve scheduler, batcher, autoscaling; profile latency, cost, utilization
Sometimes write kernels and, yes, occasional tasteful shitposting

Benefits

Health insurance
