Data Engineer - Multimodal Systems

San Jose, US

Job description

Zyphra is an artificial intelligence company based in San Francisco, California.

The Role:

As a Data Engineer - Multimodal Systems, you will be a core contributor to creating, collecting, and improving Zyphra’s datasets and data pipelines across a variety of modalities. Your work will intersect with almost every team at Zyphra. You will be involved in collecting large-scale datasets and implementing and optimizing highly parallel data pipelines.

You’ll Work Across:

  • Large-scale data collection across a variety of modalities (text, audio, image)
  • Designing and working with highly efficient, parallelized data processing pipelines across modalities
  • Designing and running rigorous experimental ablations to demonstrate the impact of new data improvements

What We're Looking For / Requirements:

  • Strong implementation and prototyping ability
  • Ability to take an idea from conception to experimentation quickly
  • Ability to work well with others in a fast-paced research setting
  • Ability to learn new fields rapidly and enthusiasm for implementing new ideas
  • Excellent communication and collaboration skills, and the ability to work effectively on both research and engineering implementation at scale

Qualifications / Additional Skills:

  • Experience collecting, handling, and processing large datasets
  • Experience with parallel Python programming frameworks such as Dask
  • Understanding of the state of the art in dataset curation across modalities
  • A generally meticulous nature and a strong interest in actually looking at data and sanity checking things
  • Strong grasp of proper experimental methodology for running rigorous ablations and other hypothesis tests
  • Understanding of and interest in large-scale, highly parallel data processing pipelines
  • Proficiency with PyTorch and Python
  • Experience contributing to large pre-existing codebases and rapidly getting up to speed
  • Previously published machine learning research in well-respected venues
  • Postgraduate degree in a scientific subject (Computer Science, EE/EECS, Mathematics, Physics, Machine Learning)

Why Work at Zyphra:

  • Our research is grounded in methodical, step-by-step approaches to ambitious goals. Deep research and engineering excellence are valued equally
  • We strongly value new and crazy ideas and are very willing to bet big on them
  • We move as quickly as we can; we aim to keep the bar to impact as low as possible
  • We all enjoy what we do and love discussing AI

Benefits and Perks:

  • Comprehensive medical, dental, vision, and FSA plans
  • Competitive compensation and 401(k) plan
  • Relocation and immigration support on a case-by-case basis
  • In-office snacks and meals provided
  • Unlimited PTO and company holidays
  • In-person team in San Francisco with a collaborative, high-energy environment
