AI Architect

Storyful · Dublin, IE

Job description

Storyful is an equal opportunity employer

Reporting to: Chief Product & Technology Officer (CPTO)

Dublin (Hybrid — 3 days/week in office)

Team: Product & Engineering (foundational hire for Data Science / AI function)

Mission: Build a next-generation Risk & Insights Intelligence Platform that disrupts media monitoring, social listening, and LLM monitoring—from early prototypes to commercially successful, market-leading products.

This role is for someone who can architect and build (hands-on) agentic LLM systems in production, partner deeply with Data Scientists, and obsess over evaluation, quality, and cost, all while thriving in the ambiguity of zero-to-one product creation.

Why this role exists:

We’re building an AI-native platform that detects, explains, and helps teams respond to reputational and narrative risk. You’ll shape the technical direction early, including network science and explainability, across agent ecosystems, information retrieval (e.g. RAG + Graph RAG), multi-document reasoning, classification, scoring, evaluation, and LLMOps, and turn them into reliable product experiences.

What you’ll do (Responsibilities)

1) Architect and ship agentic GenAI systems

  • Design and implement agent ecosystems (multi-agent architectures) that deliver real product outcomes (not demos).
  • Build specialized agents for workflows like adverse media / risk detection, entity investigation, source authenticity, classification, and summarization—and orchestrate them reliably.
  • Own the translation from research/prototypes into production-grade features (latency, reliability, observability, cost).

2) Build RAG + Graph RAG for multi-doc intelligence

  • Deliver RAG chatbots for investigation and exploration across large document sets.
  • Implement multi-document summarization, including Graph RAG patterns (graph extraction, linking entities/claims, narrative threads).
  • Implement semantic chunking / paragraph splitting, retrieval strategies, and citation/grounding patterns suitable for risk/comms teams.
  • Explore deep agents / deep research, graph traversal strategies (network science), and agentic RAG.

3) Multi-document classification + scoring (risk-focused)

  • Build instruction-based and ML-assisted classification pipelines for multi-document inputs (themes, narratives, risk taxonomy). Explore generating data to fine-tune small models.
  • Create scoring methodologies (e.g., risk score, severity, momentum/growth, confidence, exposure) with a clear rationale and calibration approach.
  • Bonus: experience building “risk detection” classifiers and adverse-media-style pipelines.

4) Context engineering + automatic prompt improvement

  • Lead prompt engineering practices across the product: reusable prompt assets, versioning, guardrails, and domain adaptation.
  • Implement prompt evolution techniques (e.g., automated prompt iteration / prompt improvement loops) where it makes commercial sense.
  • Understand how the wording of a prompt shifts the probability distribution over the LLM's outputs, and manage context through graphs and information retrieval.

5) Evaluation: make quality measurable and repeatable

  • Build robust evaluation methodologies for prompts, RAG, summarization, and classification.

  • Apply multiple evaluation techniques, including:

    • offline metrics (precision/recall/F1 where appropriate)
    • retrieval metrics and ablations
    • LLM-as-a-judge style evaluations with rubrics, controls, and drift detection
  • Define quality gates that allow the team to move fast without breaking trust.

  • Understand an LLM as a neural network, not only as something that can be prompted and observed from the outside. For example, understand how entropy can serve as a signal for detecting hallucinations as they unfold through the layers of the model.
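
To make the entropy point concrete, here is a minimal sketch of treating next-token entropy as an uncertainty signal. It is an illustration under stated assumptions, not a description of Storyful's method: the function names, the toy logits, and the threshold are all hypothetical, and it reads entropy from final output logits per generation step, whereas a layer-wise analysis would apply the same calculation to logits projected from intermediate layers.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def flag_uncertain_steps(step_logits, threshold):
    """Return indices of generation steps whose entropy exceeds the threshold.

    The threshold is a hypothetical tuning parameter; in practice it would
    have to be calibrated against labeled examples.
    """
    return [
        i for i, logits in enumerate(step_logits)
        if token_entropy(logits) > threshold
    ]


# Toy logits over a 4-token vocabulary (illustrative values only).
steps = [
    [10.0, 0.0, 0.0, 0.0],  # sharply peaked: the model is confident
    [1.0, 1.0, 1.0, 1.0],   # uniform: maximal uncertainty, entropy = ln(4)
]
uncertain = flag_uncertain_steps(steps, threshold=1.0)
```

High entropy alone does not prove a hallucination; it is one signal that would normally be combined with grounding checks and the evaluation methods listed above.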

6) LLMOps + cost control

  • Implement LLMOps: experiment tracking, model/prompt versioning, dataset management, observability, and release practices.

  • Build monitoring for quality + safety + cost, and actively optimize infrastructure spend in cloud environments.

  • Deploy and maintain open-source models.

7) Lead by influence (and occasionally by direct leadership)

  • Bring “Senior/Lead Engineer” judgement: clean architecture, pragmatic decisions, mentoring, and unblocking teams.
  • Partner tightly with Product, Design, Data Science, and Engineering—while also being able to execute independently.

What success looks like (first 6–12 months)

  • A production-grade agentic architecture powering key workflows (investigate → summarize → classify → score → recommend action).
  • A measurable evaluation framework where quality improves release over release.
  • A Graph RAG (or equivalent) capability that materially improves multi-doc summarization accuracy and defensibility.
  • Clear cost/performance tradeoffs and observability that make the system operable at scale.
  • A team around you that’s leveled up in GenAI engineering practices.

Required experience (Must-have)

  • Proven background as a Senior / Lead Engineer (or equivalent staff-level scope), owning architecture and delivery.

  • Demonstrated experience building agentic GenAI architecture for commercially successful product features (not only internal prototypes).

  • Strong experience working with Data Scientists on ML algorithms, NLP, evaluation design, and productionization.

  • Hands-on experience in AWS and GCP (Azure acceptable as additional).

  • Production experience with:

    • RAG chatbots
    • multi-document summarization (ideally Graph RAG)
    • multi-document classification
    • scoring methodologies (risk scoring is a strong bonus)
  • Deep expertise in prompt engineering and evaluation, including both classical metrics (e.g., precision/recall) and LLM-as-a-judge approaches.

  • Strong LLMOps and GenAI product design experience: experimentation → deployment → monitoring → iteration.

Nice-to-have (Strong bonuses)

  • Experience in risk/compliance domains (e.g., adverse media, AML, entity investigation workflows).
  • Knowledge graphs in production (e.g., Neo4j) and graph extraction pipelines.
  • Experience running annotation programs / building labeled datasets for NLP tasks.

Skills & tools (examples)

We don’t require exact matches, but we do expect you to be fluent in this class of tooling and able to choose pragmatically.

GenAI frameworks & LLMs

  • LangChain, LlamaIndex
  • OpenAI / Gemini / Claude
  • Vector RAG + Graph RAG patterns

LLMOps / experimentation / observability

  • MLflow (experiments, tracking)
  • Langfuse (prompt & trace observability)

Data & retrieval

  • Neo4j (graph), Elasticsearch
  • Vector stores (Pinecone-style capability), embeddings, semantic chunking

Cloud / infrastructure (examples)

  • AWS: Lambda, SQS/SNS, Kinesis, Glue, Athena, Redshift, DynamoDB, RDS, API Gateway, CloudFront, SageMaker, Comprehend, Kendra, Lex
  • GCP (plus Azure exposure helpful)

Languages

  • Python (primary), TypeScript, Java (Ruby on Rails experience welcome)

Job Category:

Storyful - Product & Technology
