PhD Thesis: Multi-Agent Reinforcement Learning, Dialogue Grounding, Reasoning
Orange

Lannion (22), fixed-term contract (CDD)

The company: Orange

The ambition of the Innovation Division is to take Orange's innovation further and strengthen its technological leadership, by mobilizing our research capabilities to fuel responsible, human-centered innovation, inform the Group's long-term strategic choices, and influence the global digital ecosystem.
We train the experts in today's and tomorrow's technologies, and we ensure the continuous improvement of our services' performance and of our efficiency. Worldwide, the Innovation division brings together 6,000 employees dedicated to research and innovation, including 740 researchers. With a global vision and a wide diversity of profiles (researchers, engineers, designers, developers, data scientists, sociologists, graphic designers, marketers, cybersecurity experts, etc.), the women and men of Innovation listen to and serve the countries, regions, and business units to make Orange a trusted multi-service operator.
Within Innovation, you will join the Data & AI department. Its main mission is to make Orange a "data-driven" company that sets the Group's standards for data and artificial intelligence and facilitates the development of use cases, data products, and data services. This department supports the entire Orange Group.

Job description

Since their breakthrough in 2022, Large Language Models (LLMs) have been transforming our daily lives. However, they still struggle with reliable reasoning and planning, and they often neglect grounding, the process by which interlocutors ensure mutual understanding. Prompting techniques alone (e.g., chain-of-thought, ReAct) cannot fully address these limitations. The limitations are even more pronounced in small language models (small LMs), whose limited number of parameters restricts their ability to generalize.
Reinforcement learning techniques such as Reinforcement Learning from Human Feedback (RLHF) have proven effective in reducing hallucinations and improving reasoning in LLMs (notably in DeepSeek's models), but they are less effective for small LMs. This has led to growing interest in Multi-agent Reinforcement Learning (MRL) as a promising alternative.
This thesis will study MRL for small LMs by decomposing complex conversational tasks into three sub-tasks: grounding, reasoning, and planning. The objectives are:
Training (i.e., adjusting the weights of) specialized LM agents that collaborate in a multi-agent environment, going beyond traditional prompting, retrieval-augmented generation (RAG), or fine-tuning.
Applying MRL to public benchmarks and to Orange's use cases (e.g., resolving network or product issues).
Key challenges include identifying the optimal task decomposition, designing effective reward functions, and evaluating the resulting agents' performance. By cooperating, specialized agents can overcome their individual limitations to solve complex tasks.

Candidate profile

Technical, scientific, and soft skills
You have experience in Artificial Intelligence and Machine Learning, particularly in deep learning.
You have a strong background in mathematics (numerical optimization, statistics, probability, etc.).
You are proficient in software development.
You are proficient in reading, writing, and speaking English.
You are curious, attracted by new technologies, and ready to keep up with their evolution.
You enjoy working in a team on multidisciplinary projects and contributing to a common goal, while remaining autonomous in your work.
You have good analytical and synthesis skills.
Proficiency in one of the following deep learning tools is desired: Torch, PyTorch, TensorFlow, MXNet.
You like to communicate the results of your work through written reports and oral presentations, preferably in English.
Required education (master's degree, engineering degree, PhD, scientific and technical field, etc.)
Engineering degree and/or research master's degree, with knowledge of machine learning and of at least one of the fields listed above.
Desired experience (internships, etc.)
Initial experience implementing deep learning algorithms (for example, during an internship) would be desirable.

Salary and benefits

Works council (CE)

Reference: 2026-51532