Position Summary
Richtech Robotics is looking for a Cloud / DevOps Engineer to support our AI compute infrastructure services. This role will help deploy, manage, and support cloud-based GPU environments for customers building AI models, robotics applications, simulation workflows, and Physical AI systems. The ideal candidate has strong Linux, networking, cloud infrastructure, and DevOps experience, with a willingness to learn GPU computing, CUDA environments, and AI workload deployment.

Responsibilities

Deploy and manage cloud-based GPU compute environments for customer workloads.
Configure virtual networks, VPNs, firewalls, security groups, SSH access, storage, and user permissions.
Build and maintain Linux-based environments for AI development, including Docker containers, CUDA drivers, Python environments, and Jupyter notebooks.
Work with AI engineers to deploy required runtime environments for model training, fine-tuning, simulation, and inference.
Monitor GPU usage, system performance, uptime, storage, and network connectivity.
Troubleshoot customer issues related to access, environment setup, networking, storage, and compute availability.
Create reusable deployment scripts, images, templates, and technical documentation.
Coordinate with cloud infrastructure partners and internal teams to resolve technical issues.

Required Qualifications

2+ years of experience in cloud infrastructure, DevOps, systems administration, or network engineering.
Strong Linux administration skills.
Solid understanding of networking, including TCP/IP, VPN, DNS, firewalls, routing, security groups, and private networks.
Experience with Docker and containerized environments.
Experience with at least one major cloud platform or private cloud environment.
Familiarity with monitoring, logging, automation, and scripting.
Ability to troubleshoot infrastructure issues independently.
Strong communication skills and willingness to support customer-facing technical requests.
Interest in learning GPU computing, CUDA environments, and AI infrastructure.

Preferred Qualifications

Experience deploying NVIDIA GPU drivers, CUDA, cuDNN, or NVIDIA Container Toolkit.
Familiarity with PyTorch, TensorFlow, Hugging Face, Jupyter, or vLLM.
Experience with Slurm or distributed compute environments.
Experience with Prometheus, Grafana, ELK, or similar monitoring tools.
Prior experience supporting AI/ML, data science, robotics, or simulation workloads.

Pay: $80,000.00 - $120,000.00 per year

Benefits:

Dental insurance
Health insurance
Paid time off
Vision insurance

Work Location: In person


Cloud / DevOps Engineer, AI Compute Infrastructure
