THE GENERATIVE FRICTION TRILOGY: LEARNING TO INTERVENE, EVALUATE, AND COLLABORATE IN MULTI-AGENT COLLABORATION
Abstract
Large language models are increasingly deployed as collaborative partners in multi-turn, multi-party tasks, where they must reason jointly with others rather than act in isolation. In such settings, collaboration often requires generating frictional interventions—strategic prompts that slow the group down, surface hidden disagreements, or encourage reflection—rather than direct answers. A key challenge is that these interventions do not directly determine how a collaboration unfolds: collaborators may ignore, reinterpret, or partially incorporate them, and the resulting dialogue trajectory depends on how all participants respond. This mediation breaks the assumptions underlying standard alignment and reinforcement learning methods, which implicitly treat an agent’s chosen action as directly determining outcomes. This dissertation approaches this challenge through a trilogy of complementary perspectives. First, we introduce the Frictional Agent Alignment Framework (FAAF), which provides the first principled formulation of friction intervention learning that generalizes to previously unseen collaborative settings. Second, we extend this framework to realistic collaborative settings where different intervention agents interact with collaborators over multiple turns, producing divergent dialogue trajectories. We formalize collaborative dialogue as a Modified-Action Markov Decision Process (MAMDP) where collaborators modify interventions before they affect the environment, and introduce a counterfactual roleplay framework that isolates causal effects while controlling for collaborator dynamics. We show that friction interventions can accelerate common-ground convergence, but that even optimal interventions fail without appropriate collaborator responses. 
Third, having shown that collaborator responses determine outcomes, we develop the Interruptible Collaborative Roleplayer (ICR) to train effective collaborators through a constrained optimization objective that combines task utility maximization with counterfactual invariance, that is, robustness to spurious features in interventions. Empirical results show ICR-trained collaborators outperform frontier models and achieve superior common-ground convergence without explicit team-level training signals. Practically, this work provides the first end-to-end framework for training and evaluating both sides of collaborative AI systems, moving beyond dyadic assistant-user interactions toward principled small-group collaboration where friction serves intelligence rather than impedes it. By bridging offline alignment that is robust to data bias, causal evaluation, and collaborator training, we establish friction-aware alignment as a fundamental challenge at the intersection of NLP and multi-agent RL, prove that standard single-agent methods are suboptimal for collaboration, and provide novel protocols for the realistic multi-party collaboration toward which LLM deployment is moving.
Subject
Large Language Models
Natural Language Processing
Artificial Intelligence
Reinforcement Learning
Machine Learning
