Synthetic Data Engineer (AI Data/Training)
Hyphen Connect Limited · Boston, US
Job description
We are seeking a talented and innovative Synthetic Data Engineer. In this role, you will design and implement domain-specific synthetic data generation pipelines and manage the data that feeds model training loops. Your expertise will drive data processing and model training across the organization.
Responsibilities:
- Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
- Implement automated quality scoring and de-duplication systems.
- Manage data pipelines that feed directly into supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) training loops.
Qualifications:
- Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
- Deep knowledge of prompt engineering for data generation.
- Familiarity with dataset distillation and bias mitigation.