Data Scientist (AI Quality & Evaluation)
Boston, US
Job description
About the Role
We're looking for a Data Scientist to own the quality, reliability, and trustworthiness of our clinical AI outputs. You'll build the systems that ensure our AI "knows what it doesn't know" — developing evaluation frameworks, calibrated confidence scoring, and automated quality assurance that physicians can actually trust.
What You'll Do
- Design and implement automated evaluation pipelines that assess AI output quality, accuracy, and safety at scale
- Develop uncertainty quantification systems whose confidence scores are calibrated, so that stated confidence tracks observed accuracy
- Build comprehensive evaluation frameworks combining automated assessment with clinician-validated test cases
- Implement feedback loops that continuously improve model outputs based on validation signals
- Establish scalable quality gates that catch errors before they reach end users
- Contribute to model alignment and fine-tuning efforts
Qualifications
Required
- Strong foundation in deep learning frameworks (PyTorch) and LLM architectures
- Experience with model evaluation, benchmarking, and quality metrics
- Proficiency in Python and modern ML development tools
- Strong statistical foundations
- Ability to read, implement, and extend research papers
- Excellent communication skills
Preferred
- Master's degree in Computer Science, Machine Learning, Statistics, or related quantitative field (PhD preferred)
- Publications in top ML/AI venues (NeurIPS, ICML, ICLR, ACL)
- Experience with RLHF, DPO, or preference optimization techniques
- Background in healthcare AI or regulated industries
- Experience building evaluation systems for production LLM applications