ML Research Scientist - Deep Learning & Transformer Architectures
Millennium Management · New York, US
Job description
Please direct all resume submissions to QuantTalentUS@mlp.com and reference REQ-29605 in the subject.
Overview
As part of a long-term research agenda within a newly formed systematic equities pod, we are building a proprietary Transformer-based model trained on tokenized intraday market data for next-token prediction of price movements.
We are seeking an exceptional ML research scientist with deep expertise in Transformer architectures and large-scale model training. You will design, implement, and train a custom decoder-only Transformer from scratch: the goal is not to fine-tune an existing LLM, but to build a purpose-built architecture for financial time-series.
This is a long-term research project with significant computational resources. The successful candidate will have a PhD in machine learning or a related field and demonstrated ability to implement Transformer architectures from first principles.
Principal Responsibilities
- Design and implement a custom decoder-only Transformer architecture optimized for tokenized financial time-series data
- Develop a novel tokenization scheme for intraday market data: price movements, volume, order flow, and cross-sectional features
- Implement efficient training pipelines using PyTorch with mixed-precision training, gradient checkpointing, and multi-GPU parallelism
- Design attention mechanisms adapted to financial data: temporal attention patterns, cross-asset attention, and multi-scale representations
- Build evaluation frameworks for next-token prediction accuracy, signal quality, and trading performance
- Implement inference optimization for low-latency production deployment: model quantization, KV-cache, speculative decoding
- Conduct rigorous ablation studies to validate architecture choices and training methodology
- Collaborate with the team to integrate model predictions into the live trading pipeline
- Document research methodology, experimental results, and architectural decisions
Required Skills / Qualifications
- PhD in Machine Learning, Computer Science, Statistics, Applied Mathematics, or a related field with a focus on deep learning
- Demonstrated ability to implement Transformer architectures from scratch (not just fine-tuning pre-trained models)
- Deep understanding of attention mechanisms, positional encodings, tokenization strategies, and training dynamics
- Expert-level PyTorch skills including custom modules, training loops, mixed-precision, and multi-GPU training
- Strong mathematical foundations: linear algebra, probability theory, optimization, information theory
- Experience training models at scale (100M+ parameters)
- Strong programming skills in Python and C++ for performance-critical components
- Self-directed researcher capable of defining and executing a multi-month research agenda
- Familiarity with AI-assisted development tools (Cursor, Claude Code)
Preferred Skills / Experience
- Experience applying deep learning to financial data or time-series forecasting
- Familiarity with tokenization approaches for continuous or non-text data
- Published research in top ML venues (NeurIPS, ICML, ICLR) or equivalent industry experience
- Knowledge of market microstructure and intraday trading dynamics
- Experience with model compression, quantization, and inference optimization
Millennium offers a total compensation package which includes a base salary, discretionary performance bonus, and comprehensive benefits. The estimated base salary range for this position is $150,000 to $200,000, which is specific to New York and may change in the future. When finalizing an offer, we take into consideration an individual’s experience level and the qualifications they bring to the role to formulate a competitive total compensation package.