
ML ENGINEER (GENERAL)

San Jose, US

Job description

ABOUT THE COMPANY

We're building autonomous research agents for recursive self-improvement: multi-agent systems that propose, run, and analyze machine learning experiments. We're a small team based in San Francisco, working on-site.

ABOUT THE ROLE

You'll build and maintain the ML systems and pipelines that our research runs on: data pipelines, training infrastructure, evaluation tooling, deployment, and observability. The work bridges research and production, and you'll be the person who makes "we ran an experiment" actually mean "we ran it correctly, at scale, with results we trust."

This is a senior ML engineering role. You'll own systems end-to-end. You'll work with researchers daily and translate research code into infrastructure that the team can rely on. You'll move fast and you'll be measured on whether your systems make the team faster.

WHAT YOU'LL DO

  • Build and maintain the training, evaluation, and deployment pipelines that our research runs on
  • Take research code from prototype to production: refactor, harden, instrument, test
  • Design observability into our ML systems (metrics, logs, traces, eval dashboards) so failures surface fast
  • Own data pipelines for training and evaluation: ingest, dedup, version, validate
  • Work closely with researchers to understand what they need, what's slow, and what's brittle
  • Set engineering standards across our ML stack (testing, reviews, runbooks) so the team scales
  • Contribute to architectural decisions that shape how research and production interact

WHAT WE'RE LOOKING FOR

  • Senior ML engineer with 6+ years building production-grade ML systems
  • Track record across the full lifecycle: data, training, evaluation, deployment, monitoring
  • Strong distributed systems experience; you've shipped systems that have to be on
  • Fluent in Python and in at least one of PyTorch or JAX; comfortable at the systems level when needed
  • Comfortable with experimentation infrastructure (Ray, Slurm, Kubernetes, or similar)
  • Bias toward shipping; you prefer working code over working diagrams
  • Strong written communication

NICE TO HAVE

  • Experience building experimentation platforms or research infrastructure at a frontier ML lab
  • Background in distributed training systems
  • Open-source contributions to ML infrastructure
  • History of working effectively with small senior teams

THIS ROLE IS PROBABLY NOT FOR YOU IF

  • You want to do research with engineering as a side activity: this is engineering as the main thing
  • Cross-functional work with researchers (translation, scoping, education) doesn't appeal to you
  • Long-term ownership of running systems doesn't appeal to you: this role has it

