Cloud / DevOps Engineer, AI Compute Infrastructure
Richtech Creative Displays · Las Vegas, US
Job description
Position Summary
Richtech Robotics is looking for a Cloud / DevOps Engineer to support our AI compute infrastructure services. This role will help deploy, manage, and support cloud-based GPU environments for customers building AI models, robotics applications, simulation workflows, and Physical AI systems. The ideal candidate has strong Linux, networking, cloud infrastructure, and DevOps experience, with a willingness to learn GPU computing, CUDA environments, and AI workload deployment.
Responsibilities
- Deploy and manage cloud-based GPU compute environments for customer workloads.
- Configure virtual networks, VPNs, firewalls, security groups, SSH access, storage, and user permissions.
- Build and maintain Linux-based environments for AI development, including Docker containers, CUDA drivers, Python environments, and Jupyter notebooks.
- Work with AI engineers to deploy required runtime environments for model training, fine-tuning, simulation, and inference.
- Monitor GPU usage, system performance, uptime, storage, and network connectivity.
- Troubleshoot customer issues related to access, environment setup, networking, storage, and compute availability.
- Create reusable deployment scripts, images, templates, and technical documentation.
- Coordinate with cloud infrastructure partners and internal teams to resolve technical issues.
Required Qualifications
- 2+ years of experience in cloud infrastructure, DevOps, systems administration, or network engineering.
- Strong Linux administration skills.
- Solid understanding of networking, including TCP/IP, VPN, DNS, firewalls, routing, security groups, and private networks.
- Experience with Docker and containerized environments.
- Experience with at least one major cloud platform or private cloud environment.
- Familiarity with monitoring, logging, automation, and scripting.
- Ability to troubleshoot infrastructure issues independently.
- Strong communication skills and willingness to support customer-facing technical requests.
- Interest in learning GPU computing, CUDA environments, and AI infrastructure.
Preferred Qualifications
- Experience deploying NVIDIA GPU drivers, CUDA, cuDNN, or NVIDIA Container Toolkit.
- Familiarity with PyTorch, TensorFlow, Hugging Face, Jupyter, or vLLM.
- Experience with Slurm or distributed compute environments.
- Experience with Prometheus, Grafana, ELK, or similar monitoring tools.
- Prior experience supporting AI/ML, data science, robotics, or simulation workloads.
Pay: $80,000.00 - $120,000.00 per year
Benefits:
- Dental insurance
- Health insurance
- Paid time off
- Vision insurance
Work Location: In person