ML/AI Work

AI Platform Engineer

San Jose, US

Job description

The role

At Applied Compute, our applied researchers work directly with enterprises to design, deploy, and continuously improve AI agents that solve real operational problems. As an AI Platform Engineer, you'll build the infrastructure that makes this possible.

You'll own the foundational systems that power Applied Compute's post-training and agent infrastructure: large-scale evaluation pipelines, model serving systems, training orchestration, secure execution environments, and the deployment platform that brings continuously improving AI systems into customer environments. Your work will enable researchers to rapidly build, evaluate, and deploy production AI systems while meeting the security, reliability, and compliance requirements of large enterprises.

What you'll do

  • Build orchestration systems for post-training, evaluation, data generation, and continuous improvement workflows
  • Build large-scale evaluation infrastructure that measures model and agent performance across customer deployments and research workflows
  • Design and operate model serving systems that deliver low-latency, reliable inference for production AI applications
  • Architect the data infrastructure that powers training, evaluation, observability, and model improvement across customer environments
  • Develop secure execution environments for agents, evaluations, and training workloads using microVMs, containers, and modern sandboxing technologies
  • Design authentication, authorization, audit logging, and security controls that enable AI systems to operate safely within enterprise environments
  • Build deployment and provisioning systems that allow continuously improving models and agents to run inside customer VPCs and cloud environments
  • Improve reliability, scalability, observability, and operational efficiency across serving, evaluation, and training infrastructure
  • Partner closely with applied researchers to build the infrastructure that turns production data into better models, evaluations, and AI systems

What we're looking for

  • 5+ years of experience building distributed systems, infrastructure platforms, ML infrastructure, or large-scale backend services
  • Strong systems engineering fundamentals, including distributed systems, networking, operating systems, and cloud infrastructure
  • Experience designing and operating production systems with high reliability, scalability, and availability requirements
  • Experience building or operating orchestration systems, data pipelines, model serving infrastructure, or other large-scale platform services
  • Familiarity with containers, Kubernetes, infrastructure-as-code, and modern deployment workflows
  • Strong understanding of security fundamentals, including isolation, identity, secrets management, and auditing
  • Ability to reason about performance, scalability, fault tolerance, and operational tradeoffs in complex distributed systems
  • Excitement about partnering closely with applied researchers to build infrastructure for evaluation, post-training, and production AI systems

Strong candidates also have

  • Experience with sandboxing or isolation technologies such as Firecracker, gVisor, or Kata Containers
  • Experience with Temporal or similar workflow orchestration systems
  • Experience building platforms deployed into customer-controlled cloud environments
  • Experience with ML infrastructure, including model serving, distributed training, evaluation systems, or GPU scheduling
  • Experience building developer platforms, internal tooling, or systems that accelerate the productivity of technical teams

About us

Applied Compute builds Specific Intelligence for the enterprise. We provide the continual learning infrastructure for companies to build agent workforces trained on proprietary data and institutional expertise. Our researchers and platform embed directly within customer environments to build custom evals, train models, and deploy agents that get better with use.

  • Why we’re excited: We get to work at a rare intersection. Our product team builds the platform powering a new generation of digital coworkers. Our research team pushes the frontier of post-training and reinforcement learning. Our applied AI team sits side-by-side with customers as they ship agents into production. This combination of strong product, deep research, and boots on the ground is what we believe it takes to bring AI to the enterprise. We are product-led, research-enabled, and forward-deployed.
  • Who we are: We’re a team of engineers, researchers, and operators. Many of us are former founders. We've built RL infrastructure at OpenAI, data foundations at Scale AI, and systems at Together, Two Sigma, Watershed, and others. We work with Fortune 50 customers and are fortunate to be backed by partners like Kleiner Perkins, Benchmark, Sequoia, Lux, and Greenoaks.
  • Who thrives here: We're looking for people who are excited about applying novel research and complex systems to real-world problems. Our team genuinely enjoys working with customers: listening, empathizing, and understanding how work actually gets done in their organizations. Former founders, people who've built a lot of side projects, and anyone who has shown they can own something end-to-end tend to do well here.

Benefits & Logistics

This role is based in San Francisco. We work from our office in the Mission. We offer:

  • Competitive compensation and equity
  • Generous health benefits
  • Unlimited PTO
  • Paid parental leave
  • Daily lunches and dinners
  • Transportation and relocation support
  • Retirement plans

We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the process with you. We encourage you to apply even if you do not believe you meet every single qualification. As set forth in Applied Compute’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
