ML Platform Engineer - GPU Infrastructure

Optimal Inc. · Detroit, US

Job description

Job Summary

Support the team by designing, implementing, and maintaining the automation and ML workload enablement layer of the GPU cluster platform. This role focuses on optimizing GPU compute environments for AI/ML training and Isaac Sim simulation workloads, integrating GPU jobs into CI/CD pipelines, standardizing runtime environments, and supporting reliable storage and artifact management.

Required Experience

3+ years of experience in ML Platform Engineering, DevOps, Infrastructure Engineering, or a related field

Bachelor's or Master's degree in Systems Engineering, Computer Science, Computer Engineering, or a related discipline

Responsibilities

Support GPU cluster platforms for AI/ML and simulation workloads

Optimize GPU compute environments for ML training and Isaac Sim execution

Integrate GPU workload execution into CI/CD pipelines

Standardize runtime environments using containers and automation tools

Manage storage, artifacts, and workload outputs

Troubleshoot and improve platform reliability, scalability, and performance

Collaborate with ML, infrastructure, and engineering teams

Required Skills

Experience with Linux, Kubernetes, Docker, and GPU infrastructure

Knowledge of CI/CD tools and automation scripting (Python/Bash)

Experience supporting AI/ML workloads and distributed systems

Familiarity with NVIDIA GPU technologies and containerized environments

Strong troubleshooting and performance optimization skills

Preferred Skills

Experience with Isaac Sim or simulation workloads

Exposure to cloud platforms (AWS, Azure, or GCP)

Knowledge of monitoring and observability tools such as Grafana or Prometheus
