MLOps Engineer

Strategic Healthcare Programs · Remote · Oxnard

Job description

Strategic Healthcare Programs (SHP) is a leading provider of analytics and performance management solutions for the post-acute healthcare market. We are an industry leader in helping Home Health, Hospice, and Skilled Nursing providers improve their financial and quality performance while complying with many regulatory requirements. Additionally, we connect the post-acute world to the broader provider markets to allow for optimal management across the continuum of care.

Role Overview

We're hiring a strong Python engineer to build and operate our production ML platform end-to-end. You'll productionalize data science work by building robust on-premises infrastructure, establishing software engineering best practices, and creating the tooling that enables our data scientists to ship faster. All infrastructure is self-hosted.

This is a remote or hybrid position within the United States. Employees living within 75 miles of the Santa Barbara office are required to work in the office every Wednesday.

ML experience is welcome but not required. We care most about your software engineering foundation: production Python, OOP, testing, and async/parallel performance. Our existing ML engineers will get you up to speed on the ML side — frameworks, LLMs, vector stores, vLLM, and the rest.

Team: You'll join a small, tight-knit ML team where every engineer owns meaningful surface area and their code end-to-end. We value people who deeply understand how the systems they build work, not just that they run.

What You'll Do Day-to-Day

Production ML Systems (40%)

  • Build automated ML pipelines: data ingestion → training → evaluation → deployment → retraining
  • Deploy and serve models (batch + real-time) via FastAPI/Flask APIs with auto-scaling and rollback
  • Implement CI/CD for ML: model packaging, versioning, automated deployments
  • Optimize workflows using async, parallelism, Ray, and Dask

ML Platform & Tooling (35%)

  • Design reusable internal Python packages for preprocessing, training, inference, and evaluation
  • Refactor data science notebooks into maintainable OOP modules
  • Build workflow orchestration for training and inference pipelines
  • Create standardized templates for model development

Observability & Reliability (15%)

  • Monitor latency, drift, data quality, and model performance
  • Build alerting for degradation and anomalies (Prometheus, Grafana)
  • Create dashboards for production model health
  • Set up automated retraining triggers

Code Quality & Collaboration (10%)

  • Coach data scientists on production-grade Python: testing, OOP, async/parallel patterns
  • Establish and enforce software best practices across the ML codebase
  • Partner with data scientists to translate pain points into engineering solutions

Required Skills

Must Have:

  • 5+ years of production Python engineering
  • Strong OOP fundamentals: classes, inheritance, composition, design patterns
  • Testing discipline: unit, integration, fixtures, mocking
  • Demonstrated async and parallel optimization (asyncio, multiprocessing, threading)
  • Building and operating production Python services (APIs, workers, background jobs)
  • Familiarity with FastAPI or Flask
  • Experience deploying to self-hosted/on-prem environments

Soft Skills:

  • Translate engineering needs into clean, maintainable code
  • Comfortable coaching peers on production engineering practices
  • Curious about ML and motivated to ramp into it

Nice-to-Have

  • Prior MLOps or ML platform experience
  • ML frameworks: scikit-learn, XGBoost, PyTorch
  • Observability stack: Prometheus, Grafana, structured logging/tracing
  • RAG pipelines: vector stores, semantic search
  • LLM serving: vLLM, Text Generation Inference
  • GenAI/agentic frameworks: LangChain, LlamaIndex, DSPy
  • Orchestration: Prefect, Kubeflow, Airflow, or similar
  • Kubernetes and containerization in on-prem environments
  • Experiment tracking: MLflow
  • LLM observability: Phoenix, Langfuse, OpenLIT
  • On-prem GPU infrastructure management

Pay

$140,000 - $175,000 annually, depending on experience.

Benefits

We value work/life balance. We offer comprehensive health benefits, a 401(k) plan with a company match, an employee stock purchase plan, vacation time, sick time, and paid holidays.

This position is not eligible for immigration sponsorship.

Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities
This employer is required to notify all applicants of their rights pursuant to federal employment laws. For further information, please review the Know Your Rights (https://www.eeoc.gov/poster) notice from the Department of Labor.