RM Full Stack & AI Engineer

What is a Context Window?

A context window is the maximum amount of text, measured in tokens, that a large language model (LLM) can process in a single request. That limit covers both the input you provide and the output the model generates.

The Basic Concept

LLMs do not read raw characters; they read text as a sequence of tokens. In English, a token averages roughly 3–4 characters, or about 0.75 words. The context window is the hard upper limit on how many tokens the model can 'see' at once. Anything outside that window is invisible to the model: it keeps no memory between requests, so it knows about earlier text only if you explicitly include it again.
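
For quick budgeting before calling a model, the rule of thumb above can be turned into a rough estimator. This is a minimal sketch, not a real tokenizer: the function names are hypothetical, and the ratios (about 4 characters or 0.75 words per token) hold only approximately for English. Exact counts require the tokenizer of the specific model you are targeting (for OpenAI models, for example, the tiktoken library).

```python
import math


def estimate_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count from character length (English rule of thumb)."""
    # Round up: a partial token still occupies a whole slot in the window.
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_from_words(text: str, words_per_token: float = 0.75) -> int:
    """Approximate token count from the number of whitespace-separated words."""
    word_count = len(text.split())
    # About 0.75 words per token means tokens are roughly words / 0.75.
    return math.ceil(word_count / words_per_token)


sample = "The context window limits how many tokens the model can see at once."
print(estimate_tokens_from_chars(sample))
print(estimate_tokens_from_words(sample))
```

The two estimates will usually disagree slightly; treat them as a sanity check and leave headroom rather than planning to fill the window exactly.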

What Counts Toward the Limit

The context window is consumed by everything in a single request: the system prompt, conversation history, user messages, retrieved documents, and the model's own generated reply. Because the reply shares the same budget, a longer prompt leaves less room for the answer. At about 0.75 words per token, a 128,000-token window holds roughly 96,000 words of English text, about the length of a full novel. Developers must therefore budget tokens carefully across all of these components.
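
The budgeting described above reduces to simple arithmetic: the input components plus the tokens reserved for the reply must not exceed the window. The sketch below assumes you already have a token count for each component (for example from a tokenizer); the class name, field names, and example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    context_window: int     # total tokens the model accepts per request (input + output)
    max_output_tokens: int  # tokens reserved up front for the model's reply

    def remaining(self, **input_tokens: int) -> int:
        """Tokens left after the given input components and the reserved reply."""
        used = sum(input_tokens.values()) + self.max_output_tokens
        return self.context_window - used

    def fits(self, **input_tokens: int) -> bool:
        """True if the input components plus the reserved reply fit in the window."""
        return self.remaining(**input_tokens) >= 0


# Hypothetical token counts for one request.
budget = TokenBudget(context_window=128_000, max_output_tokens=4_000)
request = dict(system_prompt=1_500, history=20_000, user_message=500, retrieved_docs=90_000)

print(budget.remaining(**request))  # 12000 tokens still free for more input
print(budget.fits(**request))       # True

# Rule of thumb: about 0.75 English words per token.
print(int(budget.context_window * 0.75))  # 96000 words, approximately
```

Reserving output tokens up front matters because the reply draws from the same window; depending on the API, a request that leaves too little room may be rejected or have its reply cut short.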

Why It Matters

The context window directly determines how much information a model can reason over in a single request. Longer windows make it possible to summarize large documents, maintain long conversations, or process entire codebases. Smaller windows force developers to fall back on chunking, retrieval-augmented generation (RAG), or conversation truncation. Choosing the right model for a task often comes down to weighing its context window size against its cost per token.
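
Of the strategies above, conversation truncation is the simplest to sketch: keep the system prompt, then keep the most recent messages that still fit, dropping the oldest. This minimal version assumes messages are (role, text) tuples and uses the rough 4-characters-per-token heuristic as a stand-in for a real tokenizer; the function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); swap in a real tokenizer in practice.
    return max(1, len(text) // 4)


def truncate_history(system: str, messages: list, max_input_tokens: int) -> list:
    """Return the most recent messages that fit alongside the system prompt.

    messages: list of (role, text) tuples, oldest first.
    """
    budget = max_input_tokens - count_tokens(system)
    if budget < 0:
        raise ValueError("system prompt alone exceeds the input budget")
    kept = []
    for role, text in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(text)
        if cost > budget:
            break  # stop at the first message that does not fit
        kept.append((role, text))
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept


history = [("user", "a" * 400), ("assistant", "b" * 200), ("user", "c" * 80)]
print(truncate_history("x" * 40, history, max_input_tokens=100))
```

Stopping at the first message that does not fit, rather than skipping it and continuing, keeps the surviving history contiguous, so the model never sees a reply without the message it answered.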

How Attention Mechanisms Use the Window

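Transformer-based LLMs use self-attention to relate each token in the context to the other tokens; in decoder-only models such as GPT-style LLMs, each token attends to every token that precedes it. This is what lets the model build a coherent understanding of the full input. Because the number of token pairs grows with the square of the sequence length (O(n²)), doubling the context roughly quadruples the attention computation, which makes very large context windows computationally expensive. Techniques such as sliding-window attention and sparse attention limit each token to a subset of positions to keep this cost manageable at extreme lengths.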
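
To make the quadratic cost concrete, the sketch below implements causal scaled dot-product attention in plain Python (a single head, no learned projections) and counts how many query-key scores it computes. Passing a `window` restricts each position to its most recent tokens, a simplified form of sliding-window attention. The function and parameter names are illustrative; real implementations run these operations as batched matrix multiplications on accelerators.

```python
import math


def causal_attention(queries, keys, values, window=None):
    """Single-head causal scaled dot-product attention over lists of vectors.

    Position i attends to positions 0..i, or only to the last `window`
    positions ending at i when `window` is set (sliding-window attention).
    Returns (outputs, number_of_scores_computed).
    """
    d = len(queries[0])
    outputs, scores_computed = [], 0
    for i, q in enumerate(queries):
        start = 0 if window is None else max(0, i - window + 1)
        visible = range(start, i + 1)
        # One dot-product score per visible (query, key) pair, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in visible]
        scores_computed += len(scores)
        # Softmax over the visible scores (subtract the max for numerical stability).
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the visible value vectors.
        out = [sum(w * values[j][k] for w, j in zip(weights, visible)) for k in range(len(values[0]))]
        outputs.append(out)
    return outputs, scores_computed


for n in (256, 512, 1024):
    x = [[math.sin(i), math.cos(i)] for i in range(n)]  # toy token vectors
    _, full = causal_attention(x, x, x)
    _, windowed = causal_attention(x, x, x, window=64)
    print(f"n={n}: full={full}, window=64: {windowed}")
```

Doubling n roughly quadruples the full-attention count, while the windowed count grows roughly linearly. That is the trade-off sliding-window attention makes: cheaper computation in exchange for each position seeing only nearby tokens directly.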

Key Gotcha: Long Does Not Mean Perfect Recall

A common misconception is that placing information anywhere inside a large context window guarantees the model will use it accurately. Research on the 'lost in the middle' effect shows that LLMs tend to use information best when it appears near the very start or very end of the context, and can miss critical details buried in the middle. Where possible, place the most important information near the beginning or end of your prompt.
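
One way to act on this when assembling a prompt from several retrieved documents is to put the instructions first, restate the question last, and order the documents so the most relevant ones sit nearest the edges while the least relevant end up in the middle. The sketch below assumes the documents arrive already sorted from most to least relevant (for example by a retriever's score); the function names and prompt layout are hypothetical, and whether this ordering helps should be checked against your own model and data.

```python
def order_for_edges(docs_most_relevant_first):
    """Place the most relevant documents at the start and end, least relevant in the middle."""
    front, back = [], []
    for rank, doc in enumerate(docs_most_relevant_first):
        # Alternate: rank 0 -> front, rank 1 -> back, rank 2 -> front, ...
        (front if rank % 2 == 0 else back).append(doc)
    # Reverse the back half so the second most relevant document comes last.
    return front + back[::-1]


def build_prompt(instructions, docs_most_relevant_first, question):
    """Instructions at the top, question repeated at the bottom, documents in between."""
    ordered = order_for_edges(docs_most_relevant_first)
    doc_block = "\n\n".join(f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(ordered))
    return f"{instructions}\n\nQuestion: {question}\n\n{doc_block}\n\nQuestion (repeated): {question}"


print(order_for_edges(["d1", "d2", "d3", "d4", "d5"]))  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

With the question at both the top and the bottom, the model encounters it before reading the documents and again right before it answers.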

© RM Full Stack & AI Engineer