
Staff Software Engineer - Backend & AI Infra - Trading

Career Renew | United States, United Kingdom | Today
locations: United States · United Kingdom
full-time | remote | senior
skills: go, python, typescript, node.js, postgres, redis, clickhouse, kubernetes, aws, eks, websocket, api, distributed systems, state management, job scheduling, real-time systems, infra as code, ci/cd, model serving, llm inference, vllm, tgi, tensorrt-llm, fintech, trading systems, exchange api, onchain infrastructure, wallet operations, rpc nodes, transaction monitoring, dex integration, multi-agent platforms
Career Renew is recruiting a Staff Software Engineer - Backend & AI Infra - Trading for one of its clients. This is a fully remote role for US- or UK-based candidates.

We are building the Hyperliquid Agent Runtime.


We’re hiring a Staff Software Engineer to own two critical workstreams: the agent runtime and backend infrastructure that powers every trade in our fleet, and the migration of model hosting and agent deployment in-house — moving us off third-party LLM providers and hosted agent platforms to Senpi-owned infrastructure.


This is a building role. You’ll write the backend services, runtime engine, and deployment systems that our entire agent fleet runs on. When you ship, every agent in the fleet immediately gets faster, more reliable, and more autonomous.


What You’ll Build


Agent Runtime & Backend (~50%)


The runtime is the engine that makes every agent work. You’ll own the core systems:

  • Plugin Runtime: the per-agent process that runs position tracking (10-second polling), the RatchetStop exit engine (tiered trailing stops with sub-second evaluation), and DSL state management. It is currently written in Go and Python and is being migrated to a centralized Go service with Postgres-backed state and real-time websocket price feeds.
  • Scanner Gateway / Rules Engine: a YAML-configurable evaluation layer that sits between scanners and execution. Scanners produce raw signal variables; the rules engine applies the gates, scoring, and filters defined in YAML, so users can customize trading behavior without touching Python. This is the next major runtime feature.
  • RatchetStop Backend: a centralized profit-trailing service that protects positions even when the agent is offline. It evaluates tier upgrades and places stop-loss orders on Hyperliquid via websocket, replacing per-agent polling with condition-based evaluation across all positions.
  • Execution Layer: the MCP (Model Context Protocol) server that bridges agents to 48+ Senpi platform tools, including position creation, clearinghouse state, market data, and Smart Money intelligence. You’ll own auth, rate limiting, and the contract between agents and the exchange.
  • Data Layer: the enriched Hyperfeed pipeline (top 1K trader positions, momentum events, market concentration) flowing through Redis, Postgres, and ClickHouse, covering real-time ingestion, 4-hour rolling windows, and the APIs that every scanner calls.

Model & Agent Hosting Migration (~30%)


We’re moving off third-party hosted agents and external LLM inference to Senpi-owned infrastructure. You’ll lead the technical execution:

  • Agent deployment platform: migrate agents from Railway/OpenClaw to Senpi-hosted infrastructure. Each agent needs an isolated workspace, cron scheduling, state persistence, MCP connectivity, and Telegram notifications. The target is to deploy any skill from a GitHub repo with a single command.
  • Model hosting: evaluate and implement the path from external LLM APIs (Anthropic, Google) to self-hosted inference. Options range from proxied external models with full telemetry capture to fine-tuned models running on Senpi GPUs. You’ll own both the decision and its execution.
  • Agent telemetry: capture every scanner evaluation, trade decision, and signal score across all agents. This data feeds a self-reinforcing loop in which agents learn from fleet-wide performance, fork winning strategies, and improve autonomously.
  • Deployment pipeline: CI/CD for shipping scanner updates, runtime patches, and skill configs to 50+ live agents without interrupting open positions. Rollouts must be zero-downtime, because downtime means unprotected capital.

Infrastructure & Operations (~20%)

  • Build monitoring and alerting that catch agent failures, orphaned positions, state corruption, and expired auth before they cost money.
  • Manage cloud infrastructure (AWS/EKS) as code.
  • Own incident response: in a trading system, every minute of downtime puts real money at risk.
  • Track the health of the agent fleet: which agents are scanning, which are stuck, and which are affected by the midnight rollover bug.

What We’re Looking For


Must Have

  • Strong backend engineering: you write production code daily in at least two of Go, Python, and Node.js/TypeScript. Go is preferred for the runtime services.
  • Experience building backend services from scratch at a startup, including APIs, job scheduling, state management, and distributed systems.
  • Solid understanding of real-time systems where latency matters: websocket connections, condition-based evaluation, and sub-second response requirements.
  • Production experience with Postgres, Redis, and at least one analytics database (e.g., ClickHouse, TimescaleDB, or BigQuery).
  • Kubernetes experience: deploying, scaling, and debugging production workloads on AWS EKS.
  • You’ve owned a system end-to-end: you designed it, built it, deployed it, operated it, and fixed it at 3 a.m.

Strong Plus

  • Experience with model serving and LLM infrastructure: deploying, scaling, and optimizing inference (vLLM, TGI, TensorRT-LLM, or managed endpoints).
  • Background in trading systems, exchange APIs, or fintech, where uptime has direct financial consequences.
  • Experience with onchain infrastructure: wallet operations, RPC nodes, transaction monitoring, and DEX integration.
  • Familiarity with MCP (Model Context Protocol) or similar agent-to-tool connectivity patterns.
  • Experience building multi-agent platforms: orchestrating many independent processes that share infrastructure but operate autonomously.
  • Experience with CI/CD for systems where “deploy” means updating live trading agents, not just web servers.

What This Role Is Not


This is not a pure DevOps role. You’ll spend 80% of your time writing Go, Python, and TypeScript that ships to production. The infrastructure you manage is the infrastructure you built, because at our stage the best person to operate a system is the person who designed it.


You’re building the backend for autonomous AI agents that manage real money in real time. The runtime you build determines whether positions are protected. The model hosting you stand up determines whether agents can think. The deployment pipeline you create determines whether the fleet can evolve. This is foundational infrastructure for a new category of software.

Benefits

Health insurance