
PhD Position F/M Defending deployed AI models: manipulation as a countermeasure

Inria · Rennes, FR

Job description

The job description below is in English.

Contract type: Fixed-term contract (CDD)

Required degree level: Master's degree or equivalent (Bac + 5)

Position: PhD student

About the center or functional department

The Inria center at the University of Rennes is one of eight Inria centers and has more than thirty research teams. The Inria center is a major and recognized player in the field of digital sciences. It is at the heart of a rich ecosystem of R&D and innovation, including highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education institutions, centers of excellence, and technological research institutes.

Context and advantages of the position

Deployed AI models on platforms are of interest to at least two different groups: users and attackers. In the first case, it is increasingly clear that the impact of these models on users' everyday lives must be audited to prevent abuse or bias [LMPT24]. In the second case, the cost of training these models calls for proper defenses against malicious entities and offensive competitors [MGW+25]. The ambition of the Cluster SequoIA's FANG chair is to bridge the gap between these two critical setups, legal auditing and offensive security, in the domain of modern deployed AI models. From this unique standpoint, and building on the body of work we have contributed to in the field of AI auditing (e.g., [BGDV+25, GLMT+24, GLMP+25, Ric26]), we expect to find new insights and novel angles for attacking and defending deployed AI models.

A key observation from this body of work is that platforms hosting AI models are not passive actors. We have shown that platforms are incentivized to maintain the utility of their model despite regulation, and may actively manipulate audit outcomes to their advantage [GLMT+24]. Indeed, audit manipulation, where a platform returns strategically altered responses to an auditor's queries, can severely disrupt the reliability of black-box audits [LMT20]. This manipulative capability, currently studied as a threat to auditors, constitutes, when viewed from the security standpoint, a powerful and largely unexplored defensive tool for model owners facing attackers.

This Ph.D. thesis proposes to bring the concepts and techniques of audit manipulation [GLMT+24, Fuk20, Yan22] to the field of AI security, in order to design novel defenses for deployed AI models.

The central insight is the following: when a platform detects an ongoing attack (e.g., model extraction, adversarial example crafting, or fingerprinting-based reconnaissance [Ric26]), rather than simply blocking the attacker (which signals detection and incentivizes the attacker to adapt), a more effective strategy is to manipulate the responses returned to the attacker. By returning strategically biased results, the platform can degrade the quality of the information the attacker extracts, poison the surrogate models the attacker is building, or feed misleading signals that waste the attacker's resources. This is conceptually analogous to honeypots and deception-based defenses in classical cybersecurity, but instantiated in the specific context of machine learning model APIs.
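As a rough illustration of "manipulate instead of block", the sketch below keeps answering a suspected attacker but silently alters some labels. It is a toy under stated assumptions, not the method the thesis will develop: `true_model`, `serve`, the `SECRET` key, and the rule of flipping about half of the labels are all hypothetical, and the detection flag is simply passed in rather than computed.

```python
import hashlib

SECRET = b"hypothetical-platform-key"  # assumption: a key known only to the platform


def true_model(x: float) -> int:
    """Toy stand-in for the deployed model: a 1-D threshold classifier."""
    return int(x >= 0.5)


def manipulated_answer(x: float) -> int:
    """Flip the true label for a keyed, pseudo-random half of the inputs.

    The flip depends only on (SECRET, x), so repeating a query returns the
    same misleading label instead of revealing noise through inconsistency.
    """
    digest = hashlib.sha256(SECRET + repr(x).encode()).digest()
    flip = digest[0] < 128
    y = true_model(x)
    return 1 - y if flip else y


def serve(x: float, suspected_attacker: bool) -> int:
    """Always answer; a suspected attacker silently gets manipulated labels."""
    return manipulated_answer(x) if suspected_attacker else true_model(x)


if __name__ == "__main__":
    queries = [i / 1000 for i in range(1000)]
    flipped = sum(serve(x, True) != true_model(x) for x in queries)
    print(f"{flipped} of {len(queries)} attacker-facing labels were altered")
```

Making the manipulation a deterministic function of the query is one possible design choice: an attacker who repeats queries cannot average the perturbation away or notice it from inconsistent answers.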
A critical challenge arises when the platform cannot reliably distinguish attackers from legitimate users or regulators. In this regime of uncertain detection, the platform must navigate a fundamental tension: manipulated responses, if served to legitimate users, degrade the model's utility [Kur25]. Randomized defenses [MFL22] offer a principled framework for this setting: by injecting controlled noise or perturbations into a fraction of responses, the platform can probabilistically disrupt attacks while bounding the impact on legitimate users. We will study how to calibrate such randomized manipulation strategies, drawing on the trade-offs between attack disruption rate and model utility loss.
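The calibration question can be made concrete with a deliberately simple model, sketched below. The assumptions are ours and not from the posting: an imperfect detector flags attackers with rate `tpr` and legitimate users with rate `fpr`, each flagged response is manipulated with probability `q`, and utility loss is approximated (pessimistically) by the fraction of legitimate responses that get manipulated. Under these assumptions the attacker sees a manipulated fraction `tpr * q` and legitimate users see `fpr * q`.

```python
import random


def calibrate_q(fpr: float, max_utility_loss: float) -> float:
    """Largest manipulation probability q keeping fpr * q <= max_utility_loss."""
    if fpr <= 0.0:
        return 1.0  # no legitimate user is ever flagged
    return min(1.0, max_utility_loss / fpr)


def manipulated_fraction(flag_rate: float, q: float) -> float:
    """Expected fraction of a population's responses that get manipulated."""
    return flag_rate * q


def simulate(flag_rate: float, q: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of manipulated_fraction for one population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        flagged = rng.random() < flag_rate
        if flagged and rng.random() < q:
            hits += 1
    return hits / n


if __name__ == "__main__":
    tpr, fpr = 0.9, 0.05  # hypothetical detector quality
    q = calibrate_q(fpr, max_utility_loss=0.01)
    print("q =", q)
    print("attacker responses manipulated:", manipulated_fraction(tpr, q))
    print("legitimate responses manipulated:", manipulated_fraction(fpr, q))
    print("simulated attacker fraction:", simulate(tpr, q))
```

In this toy model the disruption-to-utility-loss ratio is fixed at `tpr / fpr` whatever the value of `q`, which suggests that detector quality, rather than noise amplitude alone, drives the trade-off; a realistic analysis would need a proper model of attacker information gain.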
This thesis will leverage the formal understanding of what information different attacks extract, and at what query cost, to design defenses that are targeted: manipulating precisely the dimensions of the model's output that are most valuable to attackers, while preserving the dimensions that matter for legitimate use and regulatory audits. This cat-and-mouse (or platform-and-regulator) defense/audit game might improve our understanding of the limits of what is achievable by both parties in this black-box scenario.
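One hypothetical instance of a targeted manipulation, for a classifier API that returns confidence scores: randomize the scores (which an extraction attack might rely on) while keeping the top-1 label that a typical user reads. The function name `perturb_preserving_top1` and the `noise` and `margin` parameters are illustrative assumptions; the posting does not prescribe this scheme.

```python
import random
from typing import List, Optional


def perturb_preserving_top1(
    probs: List[float],
    noise: float = 0.3,
    margin: float = 1e-6,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a noisy probability vector with the same top-1 class.

    Assumes len(probs) >= 2. Uniform noise in [-noise, noise] is added to each
    score, scores are clipped to stay positive, the original top class is
    forced to remain on top, and the vector is renormalized to sum to 1.
    """
    rng = rng or random.Random(0)
    top = max(range(len(probs)), key=probs.__getitem__)
    noisy = [max(p + rng.uniform(-noise, noise), 1e-9) for p in probs]
    best_other = max(v for i, v in enumerate(noisy) if i != top)
    noisy[top] = max(noisy[top], best_other + margin)
    total = sum(noisy)
    return [v / total for v in noisy]


if __name__ == "__main__":
    original = [0.5, 0.3, 0.2]
    served = perturb_preserving_top1(original, rng=random.Random(1))
    print("original:", original)
    print("served:  ", served)
```

The predicted label is unchanged for every query, so top-1 utility is preserved by construction, while the fine-grained scores no longer reflect the model's true output distribution.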

Assigned mission

Research questions
  • Can the concepts of audit manipulation, where platforms return strategically altered responses to auditors, be transposed to defend models against attackers? What are the formal conditions under which manipulation-based defenses provably degrade an attacker's information gain?
  • When a platform cannot reliably distinguish an attacker from a legitimate user, what is the optimal trade-off between the amplitude of response manipulation and the resulting loss of model utility for legitimate users?
  • Can randomized defenses be designed so that they selectively disrupt attack-relevant dimensions of model outputs (e.g., decision boundaries exploited by adversarial attacks on classifiers, or output distributions leveraged for LLM extraction) while preserving the dimensions relevant for standard use?
  • How does the effectiveness of manipulation-based defenses depend on the type of attack being countered? In particular, are extraction attacks, adversarial example crafting, and fingerprinting-based reconnaissance equally vulnerable to response manipulation, or do some attack classes require fundamentally different defensive strategies?
  • On the regulatory side, can manipulation-based defenses coexist with legitimate auditing by regulators? That is, can a platform deploy active defenses against attackers without simultaneously disrupting the stealthy audits that regulators rely on to assess fairness and compliance?

Main activities

Envisioned planning
  • t0 + 6 months: Production of a state-of-the-art review of manipulation-based defenses for LLMs, covering audit manipulation, adversarial perturbation defenses (e.g., randomized smoothing [MFL22] for classifiers), and detection-then-response paradigms for LLMs/agents. Formal problem statement and threat model definition.
  • t0 + 12 months: Design and theoretical analysis of manipulation-based defense strategies against model extraction attacks, or other more subtle attacks.
  • t0 + 20 months: Extension to multi-attack defense: studying how a single manipulation strategy can simultaneously counter extraction, adversarial, and reconnaissance attacks. Analysis of the utility-defense trade-off under uncertain attacker detection.
  • t0 + 30 months: Study of the coexistence of active defenses and legitimate regulatory audits. Formal characterization of when and how manipulation-based defenses can discriminate between attackers and auditors.
  • t0 + 36 months: Thesis manuscript completed and defense scheduled.

Skills

  • Solid theoretical background in maths and/or machine learning
  • Python coding skills for experimental evaluations

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

Monthly gross salary: 2300 euros

General information

  • Theme/Domain: Optimization, machine learning and statistical methods

    Statistics (Big data) (BAP E)

  • City: Rennes

  • Inria center: Centre Inria de l'Université de Rennes

  • Desired start date: 2026-10-01

  • Contract duration: 3 years

  • Application deadline: 2026-09-30

Warning: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.

Instructions to apply

Please submit online: your resume, cover letter and, if available, letters of recommendation.

Defense security:
This position is likely to be assigned to a restricted-access zone (zone à régime restrictif, ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorization to access such a zone is granted by the head of the institution, following a favorable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavorable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.

Recruitment policy:

As part of its diversity policy, all Inria positions are accessible to people with disabilities.

Contacts

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects with worldwide impact to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.
