Build and deploy multimodal perception systems that fuse vision, tactile, force, and audio sensing. You'll work at the frontier of robotics perception and machine learning to create systems that enable robots to understand and interact with the world more like humans do. From foundation models to real-time deployment, you will shape the future of sensor-driven intelligence.

Requirements: MS or PhD in Computer Science, Robotics, Machine Learning, or a related field, or equivalent industry experience
Experience integrating vision, tactile, force, or audio sensors in robotic systems
Deep understanding of sensor fusion, time synchronization, calibration, and sensor failure modes
Strong grasp of probability, optimization, signal processing, and linear algebra
Proficient in Python and PyTorch for model development
Experience with C++ for high-performance or embedded deployment
Comfortable working in Linux/Unix environments and with embedded hardware
(+) Experience training or fine-tuning Vision Transformers (ViTs), masked autoencoders (MAE), or other transformer-based models
(+) Experience with diffusion models for generative modeling or representation learning
(+) Experience with camera calibration, stereo/depth sensors, or event-based vision systems
(+) Experience deploying perception models to real-time or resource-constrained platforms
(+) Experience designing or managing sensor data collection pipelines for ML model training
Benefits: Competitive salary and meaningful equity
Comprehensive health, dental, and vision coverage
Work with world-class researchers and engineers in AI and robotics
Backed by top investors (YC, leading VCs)
High-ownership role with opportunity to lead multimodal perception efforts
Help define and build next-generation embodied intelligence systems
Responsibilities: Design and implement multimodal sensor fusion systems combining vision, touch, force, and audio
Develop and fine-tune transformer and diffusion models for robotic perception tasks
Adapt foundation models for real-time inference on physical robots
Fuse high-frequency sensor data to estimate object pose, contact events, and material properties
Deploy models to real-world robots with performance and latency constraints
Collaborate with reinforcement learning, control, and hardware teams to close the perception-action loop
Take technical ownership of perception systems and help define future direction


Multimodal Perception Engineer (Vision + Tactile + Audio)
