Learning without training: The implicit dynamics of in-context learning
Jul 21, 2025
- Benoit Dherin* · Google Research
- Michael Munn* · Google Research
- Hanna Mazzawi* · Google Research
- Michael Wunder · Google Research
- Javier Gonzalvo · Google Research
Abstract
In-context learning (ICL) has emerged as a powerful paradigm in which language models learn new tasks simply by observing examples in their prompt, without any parameter updates. This paper explores the mechanisms behind ICL, examining how models leverage their pre-trained representations to adapt to new tasks through few-shot prompting. We investigate the relationship between pre-training objectives, model architecture, and ICL performance, revealing how models generalize from limited examples. Our analysis suggests that ICL effectiveness stems from the model's ability to recognize patterns in the prompt and apply learned representations in novel contexts. This work contributes to understanding the fundamental capabilities of large language models and their potential for flexible, sample-efficient learning across diverse domains.
Introduction
Traditional machine learning approaches require extensive training data and parameter updates to adapt to new tasks. In-context learning represents a paradigm shift, where models can learn new capabilities simply by observing a few examples in their input context. This ability has profound implications for AI systems, enabling rapid adaptation without retraining and opening new possibilities for flexible, sample-efficient learning.
What is In-Context Learning?
In-context learning occurs when a language model demonstrates improved performance on a task after seeing a few examples, without any parameter updates. The model essentially "learns" the task pattern from the examples provided in its input context.
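To make the setup concrete, here is a minimal sketch of few-shot prompting. The `build_few_shot_prompt` helper, the prompt format, and the `generate` callable are illustrative assumptions standing in for any prompt template and any text-completion model; nothing in this snippet updates model weights.

```python
# Hypothetical illustration of in-context learning via few-shot prompting.
# `generate` stands in for any text-completion model; no weights are updated.

def build_few_shot_prompt(examples, query, instruction="Translate English to French."):
    """Format (input, output) example pairs followed by the unanswered query."""
    lines = [instruction]
    for x, y in examples:
        lines.append(f"Input: {x}\nOutput: {y}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

examples = [
    ("cheese", "fromage"),
    ("dog", "chien"),
    ("house", "maison"),
]
prompt = build_few_shot_prompt(examples, query="bread")
# completion = generate(prompt)  # the model infers the task purely from the prompt
print(prompt)
```

Dropping the example pairs from the prompt turns the same call into a zero-shot query that relies only on knowledge acquired during pre-training.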
Key Characteristics
No Parameter Updates
ICL requires no gradient updates or fine-tuning; the model adapts purely through the information present in the input sequence.
Few-Shot Learning
Models can learn from just a handful of examples, making them highly sample-efficient compared to traditional approaches.
Task Flexibility
The same model can adapt to diverse tasks—from translation to reasoning—without architectural changes.
Zero-Shot Capability
Some models can even perform tasks without any examples, relying solely on their pre-trained knowledge.
Theoretical Framework
Our analysis suggests that ICL effectiveness stems from the model's ability to recognize and apply patterns learned during pre-training. The key insight is that language models develop rich internal representations that can be flexibly recombined to solve new tasks.
Pattern Recognition
Models learn to identify task-specific patterns in the input context and apply similar patterns to new examples.
Representation Reuse
Pre-trained representations encode knowledge that can be flexibly applied across different domains and tasks.
Attention Mechanisms
Transformer attention allows models to focus on relevant parts of the context and establish connections between examples and target outputs.
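As a simplified illustration (a toy sketch, not the architecture studied in the paper), the snippet below computes scaled dot-product attention for a single query token over a handful of context tokens; the attention weights show how much each in-context token contributes to the query's updated representation.

```python
import numpy as np

def scaled_dot_product_attention(q, K, V):
    """Single-query attention: weight context values by similarity to the query."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)              # similarity of each context token to the query
    weights = np.exp(scores - scores.max())  # softmax over context positions
    weights /= weights.sum()
    return weights @ V, weights              # context-dependent update for the query token

rng = np.random.default_rng(0)
d_model = 8
K = rng.normal(size=(5, d_model))   # keys for 5 context (example) tokens
V = rng.normal(size=(5, d_model))   # values for the same tokens
q = rng.normal(size=(d_model,))     # query token: the new input to be completed

out, weights = scaled_dot_product_attention(q, K, V)
print(weights.round(3))              # which in-context tokens the query attends to
```

Because the output is a weighted mixture of context values, swapping in different in-context examples changes the computation performed for the query without touching any model weights.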
Experimental Results
Our experiments demonstrate that ICL performance scales with model size and is influenced by the quality and quantity of pre-training data. We observe consistent improvements across diverse tasks, from language understanding to mathematical reasoning.
Scaling Laws
ICL performance improves predictably with model size, suggesting that larger models develop richer, more flexible representations.
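To illustrate how such a trend is typically summarized, the sketch below fits a simple power law relating model size to few-shot error in log-log space; the sizes and error rates are synthetic placeholders, not measurements from this work.

```python
import numpy as np

# Synthetic, hypothetical data: model sizes (parameters) and few-shot error rates.
sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
errors = np.array([0.42, 0.35, 0.28, 0.22, 0.17])

# Fit error ~ a * params^slope by linear regression in log-log space.
slope, log_a = np.polyfit(np.log(sizes), np.log(errors), deg=1)
a = np.exp(log_a)
print(f"error ≈ {a:.1f} * params^({slope:.3f})")  # negative exponent: error falls with scale
```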
Task Diversity
Models show strong performance across a wide range of tasks, indicating that ICL is a general capability rather than task-specific adaptation.
Applications and Implications
- Rapid prototyping and experimentation with new tasks
- Sample-efficient learning in data-scarce domains
- Flexible AI systems that can adapt without retraining
- Broader access to AI capabilities without task-specific training pipelines
Limitations and Future Work
- Performance variability across different task types
- Dependence on high-quality pre-training data
- Limited interpretability of ICL mechanisms
- Susceptibility to prompt injection and adversarial examples
Conclusion
In-context learning represents a fundamental shift in how we think about machine learning and AI adaptation. By enabling models to learn new tasks without parameter updates, ICL opens new possibilities for flexible, efficient AI systems. Understanding the mechanisms behind ICL is crucial for developing more capable and accessible AI technologies.
For the full details, see the original whitepaper: Learning without training: The implicit dynamics of in-context learning.