Senior DevOps Engineer (AI & Platform Operations)

Diverse Consulting Group · Warsaw, PL

Job description

As a recruitment company, DCG understands that every business is powered by experienced professionals. Our management style and partnership approach enable us to meet your needs and provide continuous support. Due to our ongoing growth and the large number of recruitment projects we undertake for our partners, we are currently looking for:

Senior DevOps Engineer (AI & Platform Operations)

Responsibilities:

  • Incident & Problem Management: Own the RCA process for production incidents — diagnose, resolve, and put preventive measures in place so issues don't recur
  • Production Monitoring & Support: Continuously monitor service health, detect anomalies early, and act before they become incidents
  • Deployment Execution: Trigger and oversee release deployments through existing CI/CD pipelines; troubleshoot failed deployments and coordinate rollbacks when needed
  • Environment Oversight: Keep Pre-Production and Production environments stable and aligned — not building them from scratch, but ensuring they behave as expected day to day
  • Runbook & Knowledge Management: Document operational procedures, known issues, and resolution steps to build a reliable knowledge base for the team
  • Cross-team Collaboration: Work shoulder-to-shoulder with development and platform teams to triage issues, clarify operational requirements, and close the feedback loop between prod and dev
  • Identify recurring pain points and propose automation or tooling to reduce toil
  • Improve observability coverage — dashboards, alerts, log queries — to catch issues faster
  • Contribute to service continuity initiatives and disaster recovery drills

Requirements:

  • 5+ years in IT operations, application support (2nd/3rd line), or a similar production-facing role
  • Proven track record of owning incidents end-to-end — from alert to RCA to prevention
  • 2+ years working within an ITIL framework (incident, problem, change management)
  • Experience working in Agile delivery environments alongside development teams
  • Excellent English communication skills — able to explain technical issues clearly to both engineers and non-technical stakeholders
  • Proficiency with log analysis and alerting tools: Splunk, Apica, Sysdig
  • Observability tooling: Prometheus, Grafana — reading dashboards, tuning alerts
  • Comfortable operating services running on Kubernetes (checking pod health, reading logs, triggering restarts — not cluster administration)
  • Familiarity with Jenkins pipelines to execute and troubleshoot deployments
  • Experience with relational databases (Oracle, DB2): querying, interpreting execution plans, and identifying data-related incidents
  • Working knowledge of Spring/Hibernate application behavior, Kafka message flows, XML/JSON payloads — enough to trace an issue through the stack

Nice to have:

  • Java/J2EE development background (helps enormously when reading stack traces and working with dev teams)
  • IBM DataStage operational experience
  • Scripting (Bash, Python) for automation of repetitive operational tasks
  • Ansible for applying configuration changes in controlled operational scenarios

Offer:

  • Private medical care
  • Co-financing for a sports card
  • Ongoing support from a dedicated consultant
  • Employee referral program
