Senior SW Engineer – AI Infrastructure & Optimization

NeuReality · Katowice, PL

Job description

We are looking for a Senior Software Engineer to help build and optimize large-scale, high-performance GenAI infrastructure and inference systems on Kubernetes.

As AI workloads increasingly move toward Kubernetes-native infrastructure, we are building systems that support distributed inference, performance optimization, reliability, observability, and production-grade deployment at scale.

This role is ideal for an engineer who can reason deeply about systems, performance, tradeoffs, and reliability, and who is comfortable owning difficult technical decisions end-to-end.

You will work across inference serving, distributed systems, optimization, and Kubernetes-native AI infrastructure.

What You’ll Do

  • Build and optimize high-performance Kubernetes-native GenAI inference systems
  • Work with modern inference stacks such as vLLM, SGLang, TensorRT-LLM, and related tooling
  • Work with Kubernetes-native distributed LLM inference frameworks such as llm-d and NVIDIA Dynamo
  • Design and implement optimization algorithms and performance improvements
  • Improve reliability, observability, deployment, and operational maturity of AI systems
  • Make architectural decisions and take ownership of technical outcomes
  • Collaborate with a small, senior engineering team focused on performance and production quality

Requirements

Required Qualifications

  • Minimum 5 years of experience as a software engineer, with strong system design skills
  • Programming experience in Go and Python
  • Hands-on experience with the Kubernetes ecosystem, including Operators, service meshes, GitOps, Gateway API, and OpenTelemetry
  • Experience with cloud platforms
  • Strong understanding of optimization algorithms and performance engineering
  • Ability to independently drive technical initiatives from concept to production
  • Strong systems thinking and debugging skills
  • Comfort operating in environments with high autonomy and responsibility

Nice to Have

  • Experience with modern LLM inference frameworks such as vLLM, SGLang, or TensorRT-LLM
  • Experience with distributed LLM inference frameworks such as llm-d or NVIDIA Dynamo
  • Contributions to open-source Kubernetes or ML infrastructure projects
  • GPU performance optimization and profiling experience
  • Familiarity with CUDA, NCCL, or Triton kernels
  • Experience running GenAI systems at scale in production
