AI Architect
Storyful · Dublin, IE
Job description
Storyful is an equal opportunity employer
Reporting to: Chief Product & Technology Officer (CPTO)
Location: Dublin (Hybrid, 3 days/week in office)
Team: Product & Engineering (foundational hire for Data Science / AI function)
Mission: Build a next-generation Risk & Insights Intelligence Platform that disrupts media monitoring, social listening, and LLM monitoring—from early prototypes to commercially successful, market-leading products.
This role is for someone who can architect and build (hands-on) agentic LLM systems in production, partner deeply with Data Scientists, and obsess over evaluation, quality, and cost, all while thriving in the ambiguity of zero-to-one product creation.
Why this role exists:
We’re building an AI-native platform that detects, explains, and helps teams respond to reputational and narrative risk. You’ll shape the technical direction early, including network science and explainability, across agent ecosystems, information retrieval (e.g. RAG + Graph RAG), multi-document reasoning, classification, scoring, evaluation, and LLMOps, and turn them into reliable product experiences.
What you’ll do (Responsibilities)
1) Architect and ship agentic GenAI systems
- Design and implement agent ecosystems (multi-agent architectures) that deliver real product outcomes (not demos).
- Build specialized agents for workflows like adverse media / risk detection, entity investigation, source authenticity, classification, and summarization—and orchestrate them reliably.
- Own the translation from research/prototypes into production-grade features (latency, reliability, observability, cost).
2) Build RAG + Graph RAG for multi-doc intelligence
- Deliver RAG chatbots for investigation and exploration across large document sets.
- Implement multi-document summarization, including Graph RAG patterns (graph extraction, linking entities/claims, narrative threads).
- Implement semantic chunking / paragraph splitting, retrieval strategies, and citation/grounding patterns suitable for risk/comms teams.
- Explore deep agents / deep research, graph traversal strategies (network science), and agentic RAG.
3) Multi-document classification + scoring (risk-focused)
- Build instruction-based and ML-assisted classification pipelines for multi-document inputs (themes, narratives, risk taxonomy). Explore generating synthetic data to fine-tune small models.
- Create scoring methodologies (e.g., risk score, severity, momentum/growth, confidence, exposure) with a clear rationale and calibration approach.
- Bonus: experience building “risk detection” classifiers and adverse-media-style pipelines.
4) Context engineering + automatic prompt improvement
- Lead prompt engineering practices across the product: reusable prompt assets, versioning, guardrails, and domain adaptation.
- Implement prompt evolution techniques (e.g., automated prompt iteration / prompt improvement loops) where it makes commercial sense.
- Understand how the wording of a prompt shapes the probability distribution of the LLM's outputs, and manage context through graphs and information retrieval.
5) Evaluation: make quality measurable and repeatable
- Build robust evaluation methodologies for prompts, RAG, summarization, and classification.
- Apply multiple evaluation techniques, including:
  - offline metrics (precision/recall/F1 where appropriate)
  - retrieval metrics and ablations
  - LLM-as-a-judge style evaluations with rubrics, controls, and drift detection
- Define quality gates that allow the team to move fast without breaking trust.
- Understand an LLM as a neural network, not only as something that can be prompted and observed from the outside. For example, understand how entropy can serve as a signal for detecting hallucinations as they unfold through the layers of the model.
6) LLMOps + cost control
- Implement LLMOps: experiment tracking, model/prompt versioning, dataset management, observability, and release practices.
- Build monitoring for quality + safety + cost, and actively optimize infrastructure spend in cloud environments.
- Deploy and maintain open-source models.
7) Lead by influence (and occasionally by direct leadership)
- Bring “Senior/Lead Engineer” judgement: clean architecture, pragmatic decisions, mentoring, and unblocking teams.
- Partner tightly with Product, Design, Data Science, and Engineering—while also being able to execute independently.
What success looks like (first 6–12 months)
- A production-grade agentic architecture powering key workflows (investigate → summarize → classify → score → recommend action).
- A measurable evaluation framework where quality improves release over release.
- A Graph RAG (or equivalent) capability that materially improves multi-doc summarization accuracy and defensibility.
- Clear cost/performance tradeoffs and observability that make the system operable at scale.
- A team around you that’s leveled up in GenAI engineering practices.
Required experience (Must-have)
- Proven background as a Senior / Lead Engineer (or equivalent staff-level scope), owning architecture and delivery.
- Demonstrated experience building agentic GenAI architecture for commercially successful product features (not only internal prototypes).
- Strong experience working with Data Scientists on ML algorithms, NLP, evaluation design, and productionization.
- Hands-on experience in AWS and GCP (Azure acceptable as additional).
- Production experience with:
  - RAG chatbots
  - multi-document summarization (ideally Graph RAG)
  - multi-document classification
  - scoring methodologies (risk scoring is a strong bonus)
- Deep expertise in prompt engineering and evaluation, including both classical metrics (e.g., precision/recall) and LLM-as-a-judge approaches.
- Strong LLMOps and GenAI product design experience: experimentation → deployment → monitoring → iteration.
Nice-to-have (Strong bonuses)
- Experience in risk/compliance domains (e.g., adverse media, AML, entity investigation workflows).
- Knowledge graphs in production (e.g., Neo4j) and graph extraction pipelines.
- Experience running annotation programs / building labeled datasets for NLP tasks.
Skills & tools (examples)
We don’t require exact matches, but we do expect you to be fluent in this class of tooling and able to choose pragmatically.
GenAI frameworks & LLMs
- LangChain, LlamaIndex
- OpenAI / Gemini / Claude
- Vector RAG + Graph RAG patterns
LLMOps / experimentation / observability
- MLflow (experiments, tracking)
- Langfuse (prompt & trace observability)
Data & retrieval
- Neo4j (graph), Elasticsearch
- Vector stores (Pinecone-style capability), embeddings, semantic chunking
Cloud / infrastructure (examples)
- AWS: Lambda, SQS/SNS, Kinesis, Glue, Athena, Redshift, DynamoDB, RDS, API Gateway, CloudFront, SageMaker, Comprehend, Kendra, Lex
- GCP (plus Azure exposure helpful)
Languages
- Python (primary), TypeScript, Java (Ruby on Rails experience welcome)
Job Category:
Storyful - Product & Technology