
THE GENERATIVE FRICTION TRILOGY: LEARNING TO INTERVENE, EVALUATE, AND COLLABORATE IN MULTI-AGENT COLLABORATION

Abstract

Large language models are increasingly deployed as collaborative partners in multi-turn, multi-party tasks, where they must reason jointly with others rather than act in isolation. In such settings, collaboration often requires generating frictional interventions—strategic prompts that slow the group down, surface hidden disagreements, or encourage reflection—rather than direct answers. A key challenge is that these interventions do not directly determine how a collaboration unfolds: collaborators may ignore, reinterpret, or partially incorporate them, and the resulting dialogue trajectory depends on how all participants respond. This mediation breaks the assumptions underlying standard alignment and reinforcement learning methods, which implicitly treat an agent’s chosen action as directly determining outcomes. This dissertation approaches this challenge through a trilogy of complementary perspectives. First, we introduce the Frictional Agent Alignment Framework (FAAF), which provides the first principled formulation of friction intervention learning that generalizes to previously unseen collaborative settings. Second, we extend this framework to realistic collaborative settings where different intervention agents interact with collaborators over multiple turns, producing divergent dialogue trajectories. We formalize collaborative dialogue as a Modified-Action Markov Decision Process (MAMDP) where collaborators modify interventions before they affect the environment, and introduce a counterfactual roleplay framework that isolates causal effects while controlling for collaborator dynamics. We show that friction interventions can accelerate common-ground convergence, but that even optimal interventions fail without appropriate collaborator responses. 
Third, having shown that collaborator responses determine outcomes, we develop the Interruptible Collaborative Roleplayer (ICR) to train effective collaborators through a constrained optimization objective that combines task utility maximization with counterfactual invariance, i.e., robustness to spurious features in interventions. Empirical results show that ICR-trained collaborators outperform frontier models and achieve superior common-ground convergence without explicit team-level training signals. Practically, this work provides the first end-to-end framework for training and evaluating both sides of collaborative AI systems, moving beyond dyadic assistant-user interactions toward principled small-group collaboration where friction serves intelligence rather than impedes it. By bridging offline alignment that is robust to data bias, causal evaluation, and collaborator training, we establish friction-aware alignment as a fundamental challenge at the intersection of NLP and multi-agent RL, prove that standard single-agent methods are suboptimal for collaboration, and provide novel protocols for the realistic multi-party collaboration toward which LLM deployment is moving.

Subject

Large Language Models

Natural Language Processing

Artificial Intelligence

Reinforcement Learning

Machine Learning
