Staff Software Engineer - Backend & AI Infra - Trading


locations: United States · United Kingdom

full-time | remote | senior

skills: go, python, typescript, node.js, postgres, redis, clickhouse, kubernetes, aws, eks, websocket, api, distributed systems, state management, job scheduling, real-time systems, infra as code, ci/cd, model serving, llm inference, vllm, tgi, tensorrt-llm, fintech, trading systems, exchange api, onchain infrastructure, wallet operations, rpc nodes, transaction monitoring, dex integration, multi-agent platforms


Career Renew is recruiting a Staff Software Engineer (Backend & AI Infra, Trading) for one of its clients. This is a fully remote role for candidates based in the US or UK.

We are building the Hyperliquid Agent Runtime.

We’re hiring a Staff Software Engineer to own two critical workstreams: the agent runtime and backend infrastructure that power every trade in our fleet, and the migration of model hosting and agent deployment in-house, which moves us off third-party LLM providers and hosted agent platforms onto Senpi-owned infrastructure.

This is a building role. You’ll write the backend services, runtime engine, and deployment systems that our entire agent fleet runs on. When you ship, every agent in the fleet immediately gets faster, more reliable, and more autonomous.

What You’ll Build

Agent Runtime & Backend (~50%)

The runtime is the engine that makes every agent work. You’ll own the core systems:

Plugin Runtime — the per-agent process that runs position tracking (10s polling), the RatchetStop exit engine (tiered trailing stops with sub-second evaluation), and DSL state management. Currently Go + Python; migrating to a centralized Go service with Postgres state and real-time websocket price feeds

Scanner Gateway / Rules Engine — a YAML-configurable evaluation layer that sits between scanners and execution. Scanners produce raw signal variables; the rules engine applies gates, scoring, and filters defined in YAML. Users customize trading behavior without touching Python. This is the next major runtime feature

RatchetStop Backend — centralized profit-trailing service that protects positions even when the agent is offline. Evaluates tier upgrades and places stop-loss orders on Hyperliquid via websocket, replacing per-agent polling with condition-based evaluation across all positions

Execution Layer — the MCP (Model Context Protocol) server that bridges agents to 48+ Senpi platform tools: position creation, clearinghouse state, market data, Smart Money intelligence. You’ll own auth, rate limiting, and the contract between agents and the exchange

Data Layer — enriched Hyperfeed pipeline (top 1K trader positions, momentum events, market concentration) flowing through Redis, Postgres, and ClickHouse. Real-time ingestion, 4-hour rolling windows, and the APIs that every scanner calls
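To make the Scanner Gateway / Rules Engine item above more concrete, here is a minimal sketch of how gates, scoring, and a final filter might be applied to one scanner signal. It is an illustration under stated assumptions, not the actual Senpi implementation: the rule schema, the variable names (volume_24h, funding_rate, momentum, smart_money_bias), and the evaluate function are all hypothetical. The rules are written as the Python dict a YAML loader would produce, so the example has no third-party dependencies.

```python
import operator

# Hypothetical rule set, written as the dict a YAML loader would return.
# Every key and variable name here is an assumption made for illustration.
RULES = {
    "gates": [
        {"var": "volume_24h", "op": ">=", "value": 1_000_000},
        {"var": "funding_rate", "op": "<", "value": 0.01},
    ],
    "scoring": {"momentum": 0.6, "smart_money_bias": 0.4},
    "min_score": 0.5,
}

# Comparison operators that a gate is allowed to name.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def evaluate(signal, rules):
    """Return (passed, score) for one scanner signal."""
    # Gates are hard requirements: any failing gate rejects the signal.
    for gate in rules["gates"]:
        compare = OPS[gate["op"]]
        if not compare(signal[gate["var"]], gate["value"]):
            return False, 0.0
    # The score is a weighted sum of the named signal variables.
    score = sum(weight * signal[name] for name, weight in rules["scoring"].items())
    # The final filter drops signals that score below the configured floor.
    return score >= rules["min_score"], score


# Raw signal variables as a scanner might emit them (hypothetical values).
signal = {
    "volume_24h": 2_500_000,
    "funding_rate": 0.002,
    "momentum": 0.8,
    "smart_money_bias": 0.5,
}
passed, score = evaluate(signal, RULES)
print(passed, round(score, 2))
```

In a design like this, changing a threshold or a weight is a config edit rather than a code change, which is the property the posting describes.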

Model & Agent Hosting Migration (~30%)

We’re moving off third-party hosted agents and external LLM inference to Senpi-owned infrastructure. You’ll lead the technical execution:

Agent deployment platform — migrate agents from Railway/OpenClaw to Senpi-hosted infrastructure. Each agent needs isolated workspace, cron scheduling, state persistence, MCP connectivity, and Telegram notifications. Target: deploy any skill from a GitHub repo with one command

Model hosting — evaluate and implement the path from external LLM APIs (Anthropic, Google) to self-hosted inference. Options range from proxied external models with full telemetry capture, to fine-tuned models running on Senpi GPUs. You’ll own the decision and execution

Agent telemetry — capture every scanner evaluation, every trade decision, every signal score across all agents. This data feeds the self-reinforcing loop: agents learn from fleet-wide performance, fork winning strategies, and improve autonomously

Deployment pipeline — CI/CD for shipping scanner updates, runtime patches, and skill configs to 50+ live agents without interrupting open positions. Zero-downtime rollouts where downtime = unprotected capital

Infrastructure & Operations (~20%)

Build monitoring and alerting that catches agent failures, orphaned positions, state corruption, and auth expiration before they cost money

Manage cloud infrastructure (AWS/EKS) with infrastructure-as-code

Own incident response — in a trading system, every minute of downtime is real dollars at risk

Health monitoring for the agent fleet: which agents are scanning, which are stuck, which have the midnight rollover bug

What We’re Looking For

Must Have

Strong backend engineering — you write production code daily in at least two of: Go, Python, Node.js/TypeScript. Go preferred for the runtime services

Experience building backend services from scratch at a startup: APIs, job scheduling, state management, distributed systems

Solid understanding of real-time systems where latency matters: websocket connections, condition-based evaluation, sub-second response requirements

Production experience with Postgres, Redis, and at least one analytics DB (ClickHouse, TimescaleDB, BigQuery)

Kubernetes experience — deploying, scaling, and debugging production workloads on AWS EKS

You’ve owned a system end-to-end: designed it, built it, deployed it, operated it, fixed it at 3am

Strong Plus

Experience with model serving / LLM infrastructure — deploying, scaling, and optimizing inference (vLLM, TGI, TensorRT-LLM, or managed endpoints)

Background in trading systems, exchange APIs, or fintech where uptime has direct financial consequences

Experience with onchain infrastructure: wallet operations, RPC nodes, transaction monitoring, DEX integration

Familiarity with MCP (Model Context Protocol) or similar agent-to-tool connectivity patterns

Experience building multi-agent platforms — orchestrating many independent processes sharing infrastructure but operating autonomously

Experience with CI/CD for systems where “deploy” means updating live trading agents, not just web servers

What This Role Is Not

This is not a pure DevOps role. You’ll spend 80% of your time writing Go, Python, and TypeScript that ships to production. The infrastructure you manage is the infrastructure you built — because at our stage the best person to operate a system is the person who designed it.

You’re building the backend for autonomous AI agents that manage real money in real time. The runtime you build determines whether positions are protected. The model hosting you stand up determines whether agents can think. The deployment pipeline you create determines whether the fleet can evolve. This is foundational infrastructure for a new category of software.

Benefits

Health insurance
