
Context Window

Maximum amount of text (tokens) an AI model can process in a single query — a key LLM performance constraint.

What is a Context Window?

A context window is the maximum amount of text (measured in tokens) that an AI model can "see" simultaneously — including both input (prompt, documents, conversation history) and generated output.
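Because the window is measured in tokens rather than characters, it helps to estimate token counts before sending a request. A minimal sketch, assuming the common rule of thumb that English text averages roughly 4 characters per token (real tokenizers vary by model and language):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic.
    Actual BPE tokenizers produce different counts per model and language."""
    return max(1, round(len(text) / chars_per_token))

prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt))
```

For exact counts, use the tokenizer that matches your target model; the heuristic is only for quick capacity planning.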

Context window sizes

Models released in 2024-2026 offer ever-larger windows: GPT-4o — 128K tokens (~300 text pages), Claude — 200K tokens, Gemini — up to 2M tokens. Even so, the context window remains a practical constraint: more text means higher cost, longer latency, and potentially worse quality.
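Since the window covers input and output together, a request only fits if the prompt plus the reserved output budget stays under the limit. A minimal check (hypothetical helper, with illustrative numbers):

```python
def fits_in_window(input_tokens: int, max_output_tokens: int, window: int) -> bool:
    """The context window covers both the input and the generated output,
    so they must fit together under the model's limit."""
    return input_tokens + max_output_tokens <= window

# A 120K-token document plus a 4K-token answer fits a 128K window...
print(fits_in_window(120_000, 4_000, 128_000))   # True
# ...but reserving 16K tokens for the answer no longer fits.
print(fits_in_window(120_000, 16_000, 128_000))  # False
```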

Context management strategies

In enterprise systems, smart context window management is key: RAG (provide only relevant fragments, not entire documents), context compression (summarizing conversation history), agent hierarchy (each agent operates on its own smaller context), and memory systems (persistent memory outside the context window).
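The simplest form of the compression idea above is a sliding window over conversation history: drop the oldest turns until the rest fits a token budget. A minimal sketch with a hypothetical `trim_history` helper and a toy one-token-per-word counter, not tied to any specific framework:

```python
def trim_history(messages, budget, count_tokens):
    """Keep the most recent messages that fit within the token budget,
    dropping the oldest first (a simple sliding-window strategy)."""
    kept, total = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg["content"])
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))

# Toy counter for illustration: one token per whitespace-separated word.
count = lambda text: len(text.split())

history = [
    {"role": "user", "content": "first question about pricing"},
    {"role": "assistant", "content": "answer one"},
    {"role": "user", "content": "second question"},
]
print(trim_history(history, 6, count))
```

Production systems usually go further, summarizing the dropped turns instead of discarding them so that long-range context survives in compressed form.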