Senior Infrastructure Engineer, Applied AI Engineer
We're currently building MVS (Mixpeek Vector Store), a distributed vector database built on Ray + S3 and designed to handle 100B+ vectors at a fraction of the cost of existing solutions. Architecture highlights: a shard-level write-ahead log (WAL), LIRE-based adaptive search, replica sets, and agent-native query primitives. If you've ever wanted to rethink how vector search works from the storage layer up, this is that project.
Some things we're shipping right now:
- IP safety for media & sports: our copyright detection platform (https://copyright.mixpeek.com) helps brands and leagues detect unauthorized use of their visual IP at scale. We're working with partners across the media/sports ecosystem, including Backblaze on a storage-native integration.
- Healthcare pipelines: multimodal extraction for clinical trial recruitment and SNF/MDS coding workflows, built with enterprise partners in the space.
- Ad verification: we contribute to the IAB Tech Lab ARTF working group and power contextual intelligence for ad safety.
Our core primitives are feature extractors, retrievers, taxonomies, and clusters: decompose with extractors, recompose with retrievers. Docs: https://docs.mixpeek.com
Stack: Python, Ray, S3, FastAPI, React/TypeScript. We also maintain amux (https://github.com/mixpeek/amux), an open-source tmux-based tool for running parallel Claude Code agent sessions; if you're into agentic dev workflows, check it out.
I'm Ethan (founder/CEO, previously led search at MongoDB). Small team, high ownership, real problems. We're preparing for NAB Show next week and scaling enterprise pipeline work across healthcare, adtech, and media.
Reach out: ethan [at] mixpeek [dot] com — mention HN.