Software Engineer, TPU Host Networking

Google · San Jose, US

Job description

Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 2 years of experience with software development or 1 year of experience with an advanced degree in an industry setting.
  • 2 years of experience with developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage or hardware architecture.
  • 2 years of experience with networking protocols and troubleshooting.
  • 2 years of experience working in C++.

Preferred qualifications:

  • 2 years of experience with data structures and algorithms.
  • 2 years of experience with performance optimization.
  • 2 years of experience with networking protocols.
  • Experience in network infrastructure.
  • Experience with machine learning infrastructure.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

Tensor Processing Units (TPUs) are Google’s custom-built Application-Specific Integrated Circuits used to accelerate machine learning (ML) workloads. TPUs are designed from the ground up, drawing on Google’s deep experience and leadership in ML.

As a team member in Tensor Processing Unit Host Networking, you will play a leading role in the design, development, testing, deployment, and debugging of the TPU networking stack, from hardware (Tensor Processing Unit, Network Interface Controller) all the way up to ML frameworks (JAX, PyTorch) to enable both large-scale training and low-latency inference applications.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: $147,000 - $211,000 (USD) + 15% bonus target + bonus + equity + benefits

Learn more about benefits at Google.

Responsibilities

  • Write product or system development code.
  • Design, develop, test and deploy the TPU networking stack.
  • Perform full-stack cross-layer optimization of TPU networking performance for a variety of ML workloads.
  • Analyze and debug TPU networking performance issues in production.
  • Develop and enhance telemetry to provide deep visibility into network behavior and accelerate troubleshooting.

Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
