Job description

Job Title: ML Platform Engineer - GPU Infrastructure

Job Summary

Support the team by designing, implementing, and maintaining the automation and ML workload enablement layer of the GPU cluster platform. This role focuses on optimizing GPU compute environments for AI/ML training and Isaac Sim simulation workloads, integrating GPU jobs into CI/CD pipelines, standardizing runtime environments, and supporting reliable storage and artifact management.

Required Experience

3+ years of experience in ML Platform Engineering, DevOps, Infrastructure Engineering, or a related field

Bachelor's or Master's degree in Systems Engineering, Computer Science, Computer Engineering, or a related discipline

Responsibilities

Support GPU cluster platforms for AI/ML and simulation workloads

Optimize GPU compute environments for ML training and Isaac Sim execution

Integrate GPU workload execution into CI/CD pipelines

Standardize runtime environments using containers and automation tools

Manage storage, artifacts, and workload outputs

Troubleshoot and improve platform reliability, scalability, and performance

Collaborate with ML, infrastructure, and engineering teams

Required Skills

Experience with Linux, Kubernetes, Docker, and GPU infrastructure

Knowledge of CI/CD tools and automation scripting (Python/Bash)

Experience supporting AI/ML workloads and distributed systems

Familiarity with NVIDIA GPU technologies and containerized environments

Strong troubleshooting and performance optimization skills

Preferred Skills

Experience with Isaac Sim or simulation workloads

Exposure to cloud platforms (AWS, Azure, or GCP)

Knowledge of monitoring and observability tools such as Grafana or Prometheus
