Multimodal Perception Engineer (Vision + Tactile + Audio)
PROception · San Jose, US
Job description
Build and deploy multimodal perception systems that fuse vision, tactile, force, and audio sensing. You'll work at the frontier of robotics perception and machine learning to create systems that enable robots to understand and interact with the world more like humans do. From foundation models to real-time deployment, you will shape the future of sensor-driven intelligence.
Requirements
MS or PhD in Computer Science, Robotics, Machine Learning, or a related field, or equivalent industry experience
Experience integrating vision, tactile, force, or audio sensors in robotic systems
Deep understanding of sensor fusion, time sync, calibration, and failure modes
Strong grasp of probability, optimization, signal processing, and linear algebra
Proficient in Python and PyTorch for model development
Experience with C++ for high-performance or embedded deployment
Comfortable working in Linux/Unix environments and with embedded hardware
(+) Experience training or fine-tuning Vision Transformers (ViTs), masked autoencoders (MAE), or other transformer-based models
(+) Experience with diffusion models for generative or representation learning
(+) Experience with camera calibration, stereo/depth sensors, or event-based vision systems
(+) Experience deploying perception models to real-time or resource-constrained platforms
(+) Experience designing or managing sensor data collection pipelines for ML model training
Benefits
Competitive salary and meaningful equity
Comprehensive health, dental, and vision coverage
Work with world-class researchers and engineers in AI and robotics
Backed by top investors (YC, leading VCs)
High-ownership role with opportunity to lead multimodal perception efforts
Help define and build next-generation embodied intelligence systems
Responsibilities
Design and implement multimodal sensor fusion systems combining vision, touch, force, and audio
Develop and fine-tune transformer and diffusion models for robotic perception tasks
Adapt foundation models for real-time inference on physical robots
Fuse high-frequency sensor data to estimate object pose, contact events, and material properties
Deploy models to real-world robots with performance and latency constraints
Collaborate with reinforcement learning, control, and hardware teams to close the perception-action loop
Take technical ownership of perception systems and help define future direction