ML Ops Support Engineer
Vibotek LLC · Philadelphia, US
Job description
- Mandatory Skills:
The MLOps L2 Support Engineer provides 24/7 production support for machine learning (ML) and data pipelines. The role requires on-call support, including weekends, to ensure high availability and reliability of ML workflows. The candidate will work with Dataiku, AWS, CI/CD pipelines, and containerized deployments to maintain and troubleshoot ML models in production.
Responsibilities:
Incident Management & Support:
+ Provide L2 support for MLOps production environments, ensuring uptime and reliability.
+ Troubleshoot ML pipelines, data processing jobs, and API issues.
+ Monitor logs, alerts, and performance metrics using Dataiku, Prometheus, Grafana, or AWS tools such as CloudWatch.
+ Perform root cause analysis (RCA) and resolve incidents within SLAs.
+ Escalate unresolved issues to L3 engineering teams when needed.
Dataiku Platform Management:
+ Manage Dataiku DSS workflows, troubleshoot job failures, and optimize performance.
+ Monitor and support Dataiku plugins, APIs, and automation scenarios.
+ Collaborate with Data Scientists and Data Engineers to debug ML model deployments.
+ Perform version control and CI/CD integration for Dataiku projects.
Deployment & Automation:
+ Support CI/CD pipelines for ML model deployment (Bamboo, Bitbucket, etc.).
+ Deploy ML models and data pipelines using Docker, Kubernetes, or Dataiku Flow.
+ Automate monitoring and alerting for ML model drift, data quality, and performance.
Cloud & Infrastructure Support:
+ Monitor AWS-based ML workloads (SageMaker, Lambda, ECS, S3, RDS).
+ Manage storage and compute resources for ML workflows.
+ Support database connections, data ingestion, and ETL pipelines (SQL, Spark, Kafka).
Security & Compliance:
+ Ensure secure access control for ML models and data pipelines.
+ Support audit, compliance, and governance for Dataiku and MLOps workflows.
+ Respond to security incidents related to ML models and data access.
Required Skills & Experience:
Experience: 5+ years in MLOps, Data Engineering, or Production Support.
Dataiku DSS: Strong experience in Dataiku workflows, scenarios, plugins, and APIs.
Cloud Platforms: Hands-on experience with AWS ML services (SageMaker, Lambda, S3, RDS, ECS, IAM).
CI/CD & Automation: Familiarity with GitHub Actions, Jenkins, or Terraform.
Scripting & Debugging: Proficiency in Python, Bash, and SQL for automation and debugging.
Monitoring & Logging: Experience with Prometheus, Grafana, CloudWatch, or ELK Stack.
Incident Response: Ability to handle on-call support, weekend shifts, and SLA-based issue resolution.
Preferred Qualifications:
Containerization: Experience with Docker, Kubernetes, or OpenShift.
ML Model Deployment: Familiarity with TensorFlow Serving, MLflow, or Dataiku Model API.
Data Engineering: Experience with Spark, Databricks, Kafka, or Snowflake.
ITIL/DevOps Certifications: ITIL Foundation, AWS ML certifications, Dataiku certification.
Work Schedule & On-Call Requirements:
Rotational on-call support (including weekends and nights).
Shift-based monitoring for ML workflows and Dataiku jobs.
Flexible work schedule to handle production incidents and critical ML model failures.