Machine Learning Engineer
NYC (Brooklyn) – onsite
Up to $300K + Equity
My client is a VC-backed startup building the data layer for real-world AI training. We work with frontier labs to turn messy, multimodal enterprise data into the highest-quality training data on the market, sourced from the hundreds of venture-backed startups we help wind down.
We're a fast-growing team working in person in Dumbo, Brooklyn. We're backed by Floodgate, Afore Capital, Hustle Fund, and incredible entrepreneurs.
The Role
As an ML Engineer, you'll take the cleaned, resolved data coming out of our pipeline and figure out what to build with it. The raw material is unique — real codebases, tickets, messages, docs, and decisions from real companies, with every linkage preserved. The open question is how to turn that into the most valuable training data on the market. You'll have wide latitude and direct access to the CEO and CTO on direction. All of this happens on deeply sensitive data, so everything we build is designed with security and privacy at the core.
Requirements:
- 3-8 years of experience in applied machine learning, including hands-on work training or fine-tuning models
- Experience training machine learning models on loosely defined or abstract data sources
- Experience with training data curation and evaluations
- Background at an RL environment, data labeling, or data-for-AI company
- Experience working at an early-stage startup (sub-50 people)
- BS+ in CS, ML, or related quantitative field
- Strong Python skills and proficiency with relevant ML libraries
- Experience developing or fine-tuning transformer-based models for applied AI (huge bonus)
- Overlap with sensitive data processing: NER, NLP, entity resolution
- High agency and comfort with ambiguity; would rather pick the right problem than be handed one
What You'll Work On
You'll own problems end-to-end. Some examples of what you might tackle in your first 90 days:
- Extracting realistic, verifiable agent tasks from linked repos, tickets, and PRs
- Building environments from real company snapshots where the reward signal comes from how work actually got done
- Augmenting datasets with synthetic variants without losing the realism that makes them valuable
- Running experiments to understand which enrichments actually move the needle for the labs buying from us
You Might Be a Fit If
- You've trained or fine-tuned models and shipped applied ML work
- You're creative and high-agency — you'd rather pick the right problem than be handed one
- You're excited about applied work with real data
- AI is deeply integrated into your workflow and life
Why candidates should join
We recently started our data business, which is already doing close to 5x our entire revenue from last year. We acquire and license data from venture-backed startups that are winding down, then clean and enrich that multimodal data (email, Slack, code, images, video, financial records: a company's entire digital footprint).
- Working with a majority of the large AI labs and data-layer companies.
- Projecting 10x on last year's ARR.
- Have raised $5.5M from Floodgate, Afore Capital, Hustle Fund, and notable entrepreneurs.
You will have a lot of freedom to help us decide what data to use and how to shape it into a trainable format for labs and other data providers.