AI Infrastructure Engineer
Richtech Creative Displays · Las Vegas, US
Job description
Responsibilities
1. NVIDIA GPU & Hardware Infrastructure Deployment
- Hardware Provisioning: Rack, stack, configure, and maintain high-performance bare-metal GPU servers (e.g., NVIDIA H200, B300 or equivalent Supermicro/Dell/HGX architectures).
- AI Software Stack: Install, update, and optimize NVIDIA drivers, the CUDA Toolkit, cuDNN, and the NVIDIA Container Toolkit on physical host machines.
- Containerization & Orchestration: Manage GPU-accelerated environments using Docker, including configuring GPU partitioning (MIG/vGPU) for optimal resource allocation.
2. Network & Systems Engineering
- High-Performance Networks: Configure and optimize InfiniBand (IB) switches and RoCE (RDMA over Converged Ethernet) fabrics to deliver ultra-low latency and maximum throughput for multi-GPU training workloads.
- Core Infrastructure: Manage enterprise firewalls, core switches, VLANs, and local network routing to keep the data center network secure and stable.
- Linux Administration: Administer Linux servers (Ubuntu, RHEL, or Rocky Linux), including automated OS provisioning and the management of local storage clusters.
3. Metering & Billing System Integration
- Resource Metering: Implement and configure telemetry tools to accurately monitor and log GPU time, CPU utilization, storage usage, and network traffic.
- Billing System Management: Maintain and integrate usage-based billing/metering engines to track infrastructure costs or client usage.
- Automation: Write robust scripts (Python, Go, or Bash) to link data center resource telemetry with the billing platform for precise invoicing and automated usage reporting.
Qualifications & Skills
Required Qualifications:
- Experience: 3-5+ years of experience in Network Engineering, Linux Systems Administration, or DevOps, with hands-on experience in GPU infrastructure deployment.
- Linux & Automation: Expert-level knowledge of Linux environments and infrastructure-as-code/automation tools (Ansible, Terraform, or SaltStack).
- NVIDIA Ecosystem: Deep technical understanding of the NVIDIA AI Enterprise stack (CUDA, NCCL, NVLink).
- Billing/Metering Awareness: Practical experience working with usage-based tracking, billing APIs, or internal chargeback tools.
Pay: $80,000.00 - $120,000.00 per year
Benefits:
- Dental insurance
- Health insurance
- Paid time off
- Vision insurance
Work Location: In person