AI Engineer Clinical Data Science
Katalyst Healthcares & Life Sciences · New York, US
Job Description:
- We are looking for an AI Engineer to join our Data Science team, building AI-powered solutions for clinical data processing and analysis within a major pharmaceutical organization. You will design, develop and deploy generative AI systems that automate clinical reporting workflows, extract intelligence from documents, and accelerate data-driven decision making.
- This is a hands-on engineering role: you'll be writing production code, not just building prototypes.
Responsibilities:
Generative AI & Automation:
- Develop LLM-powered automation tools for clinical reporting and document generation workflows.
- Build AI-driven code generation pipelines and quality assessment frameworks.
- Design and implement human-in-the-loop review workflows with feedback loops to continuously improve output quality.
Research & Evaluation:
- Research and evaluate emerging AI methods, frameworks, and techniques for specific tasks, e.g. comparing fine-tuning vs zero-shot approaches, assessing new document extraction tools, or trialling new agentic frameworks.
- Prototype and benchmark new approaches before recommending adoption.
- Stay current with a rapidly evolving field and bring new ideas to the team.
Agentic AI & Orchestration:
- Design and build multi-agent systems for data workflows: agents that retrieve, generate, validate, and iterate autonomously.
- Implement agent orchestration using frameworks such as Google ADK, LangGraph, or LangChain.
- Deploy and manage agents on Google Vertex AI.
Document Understanding & RAG:
- Build document processing pipelines (PDFs, Word/DOCX): extraction, parsing, table detection, structure recognition.
- Design and build RAG pipelines grounded in source documents.
- Process, extract and transform data from unstructured and semi-structured sources.
Code Quality & Engineering Practices:
- Write clean, well-tested, maintainable Python code following SOLID principles and recognised design patterns.
- Apply single responsibility, dependency inversion, and interface segregation in real codebases, not just in theory.
- Write meaningful tests and maintain high standards across the team.
- Refactor and improve existing code as part of normal development workflow.
AI-Assisted Development:
- Use AI coding tools (e.g. Gemini CLI, GitHub Copilot) as a core part of your development workflow.
- Critically review and validate AI-generated code: understand what it produces, why, and when it's wrong.
- Write effective prompts to direct AI tools toward correct, secure, well-structured output.
- Know when to use AI and when to write code manually: judgement over speed.
Platform & Infrastructure:
- Integrate and orchestrate LLM providers available through Google Vertex AI (Gemini, etc.).
- Build internal tools and applications using Streamlit and FastAPI.
- Containerize and deploy services using Docker.
Required Skills & Experience:
- MSc in Data Science, Computer Science, Bioinformatics, or related field (or equivalent practical experience).
- Strong Python skills.
- Hands-on experience building RAG systems or LLM-powered applications (using LangChain, LlamaIndex, or similar frameworks).
- Experience integrating LLM APIs (Google Gemini, OpenAI, or similar); we work primarily through Google Vertex AI.
- Working knowledge of vector databases (ChromaDB, Weaviate, Qdrant, Pinecone, or similar).
- Cloud platform experience (GCP preferred, especially Vertex AI).
- Docker and containerized deployments.
- Strong software engineering fundamentals: SOLID principles, clean code practices, design patterns, testing, version control (Git), code review.
- Comfortable using AI-assisted development tools (e.g. Gemini CLI, GitHub Copilot) and critically evaluating what they produce.
Strongly Preferred:
- Experience with agentic AI patterns: multi-agent orchestration, tool use, autonomous workflows (LangGraph, Google ADK, or similar).
- Document processing experience: extracting and parsing data from PDFs and Word/DOCX files programmatically.
- Understanding of LLM evaluation principles and output quality assessment (BLEU, ROUGE, code execution metrics, or similar).
- Data science fundamentals: Pandas, NumPy, scikit-learn, statistical analysis, data visualization.
- Prompt engineering and optimisation techniques.
- Streamlit application development.
Domain Knowledge:
- Clinical trials or pharmaceutical industry experience.
- Familiarity with clinical data standards.
- Awareness of regulatory and data privacy requirements in life sciences.
Infrastructure & DevOps:
- Terraform or infrastructure-as-code experience.
- CI/CD pipeline design (GitHub Actions or similar).
Knowledge Graphs:
- Neo4j, Cypher query language.
- NetworkX for graph analytics.
- Graph-based RAG or knowledge extraction.
AI/ML:
- Experience with LLM-driven code generation.
- LLM fine-tuning experience (e.g. LoRA, PEFT, RLHF, Vertex AI model tuning, or similar approaches).
- NLP and text processing (HuggingFace Transformers, Sentence-Transformers).
- PyTorch or TensorFlow (for custom model work if needed).
- Google ADK (Agent Development Kit) or Vertex AI Agent Builder.
- Model Context Protocol (MCP) for tool integration and interoperability.
Other:
- Frontend experience (React, TypeScript).
- FastAPI or Flask REST API development.
- PostgreSQL or similar relational databases.
What You'll Work With:
- Languages: Python (primary), SQL, some TypeScript/R.
- AI/ML: LangChain, LlamaIndex, LangGraph, Google ADK, MCP, Hugging Face Transformers, Sentence-Transformers, Google Gemini (via Vertex AI).
- Document Processing: PyMuPDF, python-docx, pdfplumber, OCR tools.
- Data: Pandas, NumPy, SciPy, scikit-learn, Plotly.
- Databases: Vector databases, graph databases, relational databases.
- Infrastructure: Docker, Google Cloud Platform (Vertex AI, GCS), Terraform, GitHub Actions.
- Applications: Streamlit, FastAPI, Flask.
- Tools: Python packaging, testing frameworks, linting, Git.
About You:
- You care about code quality: not just making things work, but making them maintainable.
- You're comfortable working across the full stack of an AI application, from data ingestion to user-facing tools.
- You can context-switch between multiple projects and work autonomously.
- You're curious about the clinical/pharmaceutical domain and motivated to learn it.
- You see AI-assisted development as a force multiplier, not a replacement for engineering judgment.
- You're a self-directed learner who researches new methods and tools, evaluates them critically, and knows when to adopt them and when to stick with what works.