Founding Engineer – Full Stack ML DevTools & Systems
David Joseph & Company · San Jose, US
Job description
Location: San Francisco, CA
Type: Full-Time
Base Compensation: $150,000 – $250,000
Equity: Competitive Series A Equity Package
Overview
This is a founding-level engineering role within a Series A AI infrastructure company building core developer tools and platform primitives for post-training, evaluation, and reinforcement learning workflows.
The platform enables ML engineers and researchers to:
- Create structured training data
- Run reinforcement fine-tuning workflows
- Evaluate model performance reliably and reproducibly at scale
This is a high-ownership role at the center of the product. You will work across the Python SDK, backend systems, infrastructure, and developer experience, partnering directly with frontier labs, enterprise AI teams, and AI-native startups.
This is not a narrow feature role. You will shape foundational platform architecture and developer workflows that power advanced model training systems.
Core Responsibilities
Platform & Backend Systems
- Design and implement backend systems supporting post-training workflows, dataset primitives, run tracking, and artifact management
- Build reliable execution and orchestration systems with strong isolation and reproducibility
- Improve observability, debugging capabilities, and performance across job execution and distributed data pipelines
- Contribute to containerized infrastructure and Kubernetes-based deployment patterns
Python SDK & Developer Experience
- Own and evolve the Python SDK with clean APIs, strong documentation, intuitive defaults, and extensibility
- Design developer-friendly abstractions for reinforcement learning, evaluation loops, and training workflows
- Develop evaluation-native workflows connecting capability measurement, data creation, training, and re-evaluation loops
- Improve CLI tools, developer interfaces, and local-to-cloud workflows
Infrastructure & Cloud Systems
- Work across compute, networking, storage, and IAM configurations
- Design systems that are scalable, reproducible, and secure
- Collaborate on distributed systems design and execution infrastructure
Customer & Research Collaboration
- Partner directly with ML engineers and researchers to translate real-world workflows into platform improvements
- Incorporate structured customer feedback into roadmap decisions
- Operate at the intersection of research needs and production reliability
Requirements
- Strong production experience in Python
- Comfort operating across the stack, including APIs, backend systems, data systems, and frontend integration
- Deep understanding of Docker and Linux environments
- Cloud fundamentals: compute, networking, storage, IAM
- Strong product instincts with a bias toward shipping
- Demonstrated end-to-end ownership of production systems
Required Candidate Q&A
- LinkedIn Profile
- GitHub URL
- Publications URL (Google Scholar or similar, if applicable)
Interview Process
- Initial Screen
- Technical Evaluation
- Work Trial
- Final Discussion
- Offer Decision