
Senior Software Engineer - AI Inference

Bloomberg · New York, US

Job description

Our team: Join the team that is building the core infrastructure for AI at Bloomberg. The Bloomberg AI Inference Platform provides production-grade managed infrastructure for hosting, deploying, and serving all machine learning models, both predictive and cutting-edge generative models. We abstract away infrastructure complexity, empowering engineering teams to focus on creating intelligent applications with guaranteed scalability, performance, and governance. Our platform is built on the open-source KServe project, and the CNCS AI Inference team is a primary contributor to its development.

We'll trust you to:

  • Design and build scalable infrastructure for both online and offline inference workloads.
  • Lead integration of high-performance inference runtimes and serving frameworks, including TensorRT, vLLM, ONNX, and Triton.
  • Drive architecture and technical decisions across Bloomberg’s inference platform, balancing latency, throughput, reliability, and cost.
  • Partner across engineering teams to improve model deployment, observability, and production performance.
  • Mentor junior engineers on system design, debugging, and performance optimization.

You'll need to have:

  • 5+ years of professional software engineering experience.
  • Experience designing, building, and operating production distributed systems.
  • Strong systems intuition and a track record of debugging and optimizing performance-critical services.
  • Ability to own problems end-to-end and quickly ramp up in unfamiliar technical areas.
  • 4+ years of demonstrated experience working with an object-oriented programming language.
  • A degree in Computer Science, Electrical Engineering, or equivalent practical experience.

We'd love to see:

  • Experience deploying and operating machine learning systems at scale.
  • Experience with inference optimization techniques such as batching, caching, request scheduling, or memory-aware serving.
  • Familiarity with PyTorch and GPU software stacks such as CUDA and NCCL.
  • Exposure to high-performance interconnects and distributed computing technologies such as NVLink, InfiniBand, or MPI.
  • Experience with Kubernetes and cloud-native infrastructure.
  • Experience with load balancing, request routing, or traffic management systems.

Representative projects:

  • Autoscaling a heterogeneous compute fleet to match supply and demand across diverse inference workloads.
  • Building production-grade deployment pipelines to safely roll out new models to millions of users.
  • Developing new inference capabilities such as structured sampling, prompt caching, and advanced serving optimizations.
  • Analyzing observability data from real production workloads to improve latency, throughput, and resource efficiency.

Salary Range = 160,000 - 240,000 USD Annual + Benefits + Bonus

The referenced salary range is based on the Company's good faith belief at the time of posting. Actual compensation may vary based on factors such as geographic location, work experience, market conditions, education/training and skill level.

We offer one of the most comprehensive and generous benefits plans available and a range of total rewards that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, vision, short- and long-term disability benefits, 401(k) + match, life insurance, and various wellness programs, among others. The Company does not provide benefits directly to contingent workers/contractors and interns.

Discover what makes Bloomberg unique - watch our podcast series for an inside look at our culture, values, and the people behind our success.

Accommodations

Bloomberg provides reasonable adjustment/accommodation to individuals with disabilities. Please tell us if you require a reasonable adjustment/accommodation to apply for a job. Examples of reasonable adjustment/accommodation include but are not limited to making a change to the application process or work procedures, providing documents in an alternate format or using specialized equipment. To request an adjustment/accommodation to apply for a job, please email AMER_recruit@bloomberg.net (Americas), EMEA_recruit@bloomberg.net (Europe, the Middle East and Africa), or APAC_recruit@bloomberg.net (Asia-Pacific), based on the region you are submitting an application for. We may share your information with a third party provider of accommodations services who may use this information to reach out to you for the purposes of accommodating your application.

Equal Opportunity

Bloomberg is an equal opportunity employer and prohibits discrimination in employment. It is Bloomberg’s policy to provide equal opportunity and access for all persons, and the Company is committed to attracting, retaining, developing, and promoting the most qualified individuals without regard to age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, self-identified or perceived sex, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy, childbirth or related medical conditions, or parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law (each, a “Protected Characteristic”). Bloomberg prohibits treating applicants or employees less favorably in connection with the terms and conditions of employment, in all phases of the employment process, because of one or more Protected Characteristics.

