Overview

We are seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance in a mixed technology stack.

Responsibilities

Architect the ML Solver Platform:
- Define modular architecture for data preprocessing, model execution, and post-processing.
- Establish clear API contracts between Python/TensorFlow and C# services.
Productionize ML Workflows:
- Convert research code into robust, testable, and observable services.
- Implement CI/CD pipelines, automated testing, and reproducibility standards.
Integration & Interoperability:
- Design REST/gRPC endpoints for cross-language communication.
- Ensure compatibility with C#/.NET services.
Performance & Scalability:
- Optimize GPU/CPU utilization, batching strategies, and memory management.
- Plan for multi-model and multi-tenant scenarios.
MLOps & Lifecycle Management:
- Implement model versioning, artifact registries, and deployment workflows.
- Set up monitoring, logging, and alerting for solver performance.
Security & Compliance:
- Apply best practices for secrets management, dependency scanning, and secure artifact storage.

Required Skills & Experience

ML Frameworks: Expert in TensorFlow (TF2/Keras); experience with ONNX Runtime for inference.
Programming: Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.
Architecture: Proven experience designing scalable ML systems for production.
APIs: Proficiency in gRPC/Protobuf and REST for cross-language integration.
MLOps: CI/CD pipelines, containerization (Docker/Kubernetes), model registries, reproducibility.
Performance Optimization: GPU acceleration (CUDA/cuDNN), mixed precision, XLA, profiling.
Observability: Metrics, tracing, structured logging, dashboards.
Security: SBOM, image signing, role-based access, vulnerability scanning.

Preferred Qualifications

Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
Familiarity with distributed training strategies and multi-GPU setups.
Knowledge of feature stores and data validation frameworks.
Exposure to regulated environments and compliance frameworks.

Tools & Technologies

ML: TensorFlow, ONNX Runtime, tf2onnx.
APIs: FastAPI, gRPC.
DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.
Monitoring: Prometheus, Grafana, OpenTelemetry.
Security: HashiCorp Vault, Sigstore.

Why Join Us?

Work on cutting-edge ML solutions integrated into commercial engineering software.
Define architecture that scales across global deployments.
Collaborate with a team of experts in ML, software engineering, and UI development.

To apply: Send your resume and a brief cover letter to HR@softinway.com


Principal Machine Learning Engineer – Production Systems
