Founding Engineer – Full Stack ML DevTools & Systems

David Joseph & Company · San Jose, US

Job description

Location: San Francisco, CA

Type: Full-Time

Base Compensation: $150,000 – $250,000

Equity: Competitive Series A Equity Package

Overview

This is a founding-level engineering role within a Series A AI infrastructure company building core developer tools and platform primitives for post-training, evaluation, and reinforcement learning workflows.

The platform enables ML engineers and researchers to:

  • Create structured training data
  • Run reinforcement fine-tuning workflows
  • Evaluate model performance reliably and reproducibly at scale

This is a high-ownership role at the center of the product. You will work across the Python SDK, backend systems, infrastructure, and developer experience, partnering directly with frontier labs, enterprise AI teams, and AI-native startups.

This is not a narrow feature role. You will shape foundational platform architecture and developer workflows that power advanced model training systems.

Core Responsibilities

Platform & Backend Systems

  • Design and implement backend systems supporting post-training workflows, dataset primitives, run tracking, and artifact management
  • Build reliable execution and orchestration systems with strong isolation and reproducibility
  • Improve observability, debugging capabilities, and performance across job execution and distributed data pipelines
  • Contribute to containerized infrastructure and Kubernetes-based deployment patterns

Python SDK & Developer Experience

  • Own and evolve the Python SDK with clean APIs, strong documentation, intuitive defaults, and extensibility
  • Design developer-friendly abstractions for reinforcement learning, evaluation loops, and training workflows
  • Develop evaluation-native workflows connecting capability measurement, data creation, training, and re-evaluation loops
  • Improve CLI tools, developer interfaces, and local-to-cloud workflows

Infrastructure & Cloud Systems

  • Work across compute, networking, storage, and IAM configurations
  • Design systems that are scalable, reproducible, and secure
  • Collaborate on distributed systems design and execution infrastructure

Customer & Research Collaboration

  • Partner directly with ML engineers and researchers to translate real-world workflows into platform improvements
  • Incorporate structured customer feedback into roadmap decisions
  • Operate at the intersection of research needs and production reliability

Requirements

  • Strong production experience in Python
  • Comfort operating across the stack, including APIs, backend systems, data systems, and frontend integration
  • Deep understanding of Docker and Linux environments
  • Cloud fundamentals: compute, networking, storage, IAM
  • Strong product instincts with a bias toward shipping
  • Demonstrated end-to-end ownership of production systems

Required Candidate Q&A

  • LinkedIn Profile
  • GitHub URL
  • Publications URL (Google Scholar or similar, if applicable)

Interview Process

  • Initial Screen
  • Technical Evaluation
  • Work Trial
  • Final Discussion
  • Offer Decision
