How does a chatbot work?
Understanding what's actually happening inside an AI chatbot helps demystify both its capabilities and its limitations.
What Is a Large Language Model?
At its core, a chatbot like ChatGPT, Claude, or Gemini is powered by a Large Language Model (LLM)—a sophisticated pattern-matching system trained on vast amounts of text. The "large" refers to both the training data (often hundreds of billions of words) and the model's parameters (the adjustable numerical values, learned during training, that encode relationships between words).
An LLM doesn't "know" things the way humans do. It predicts what text should come next based on patterns it learned during training. When you ask it a question, it's generating a statistically plausible response, not retrieving facts from a database.
What Does 'Training' Actually Mean?
Training is where human decisions fundamentally shape what the AI becomes. It happens in stages:
1. Data Curation: Human engineers decide what goes into the training dataset. This includes books, websites, academic papers, code repositories, and more. What is included, and what is excluded, shapes what the model can and cannot do. Biases in this data become biases in the model, and training data can also be reproduced verbatim in the model's outputs, without attribution.
2. Pre-training: The model learns to predict the next word in a sequence, billions of times over. Each prediction is compared against the actual next word, and the error is used to nudge the model's parameters. Through this process, it develops a mathematical representation of language, facts, reasoning patterns, and even some common-sense knowledge. But it's learning correlations, not truth.
3. Fine-tuning: Human engineers further train the model on curated examples of desired behavior: how to be helpful, how to refuse harmful requests, how to admit uncertainty. This is where the model learns to be a useful assistant rather than just a text predictor.
4. Safety Mechanisms: Humans design and implement guardrails: content filters, refusal behaviors, output monitoring. These aren't emergent properties of the AI; they're deliberate engineering choices made by people who anticipate potential harms.
At every stage, human judgment determines what the model learns and how it behaves.
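To build intuition for the pre-training stage, it can be caricatured as counting which words tend to follow which. A real LLM adjusts billions of parameters by gradient descent rather than keeping explicit counts, but the spirit is the same: it learns correlations from its data, not truth. A toy sketch (the corpus here is made up):

```python
from collections import Counter, defaultdict

# Toy "pre-training": learn next-word statistics from a tiny corpus.
corpus = "the cat sat on the mat . the cat ate .".split()

counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    counts[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the statistically most likely next word seen in training."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" (seen twice, vs "mat" once)
```

Notice that the model predicts "cat" after "the" not because it is true in any sense, but because that pairing was most frequent in the data it saw.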
From Input to Output: What Happens When You Send a Message
When you type a prompt and hit send, here's the pipeline your message travels through:
Step 1: Tokenisation
Your text gets broken into "tokens"—chunks that might be words, parts of words, or punctuation. The sentence "How are you today?" might become: ["How", " are", " you", " today", "?"]. The model doesn't see letters or words; it sees token IDs (numbers).
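A toy sketch of this step follows. Real tokenisers learn a subword vocabulary from data (e.g. byte-pair encoding) with tens of thousands of entries; the five-entry vocabulary here is purely hypothetical:

```python
# Toy tokeniser: text -> token pieces -> token IDs.
# The vocabulary is hypothetical; real tokenisers learn theirs from data.
vocab = {"How": 0, " are": 1, " you": 2, " today": 3, "?": 4}

def tokenise(text: str) -> list[int]:
    """Greedily match the longest known token piece at each position."""
    ids = []
    while text:
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(vocab[piece])
                text = text[len(piece):]
                break
        else:
            raise ValueError(f"no token for: {text!r}")
    return ids

print(tokenise("How are you today?"))  # → [0, 1, 2, 3, 4]
```

From the model's perspective, the conversation is just this stream of integers.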
Step 2: Embedding
Each token gets converted into a high-dimensional vector: a long list of numbers that represents the token's meaning in context. Words with similar meanings end up with similar vectors. This is how the model captures that "happy" and "joyful" are related.
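The idea of "similar meanings, similar vectors" can be made concrete with cosine similarity. The vectors below are made-up 4-dimensional values; real embeddings have hundreds or thousands of dimensions and are learned during training:

```python
import math

# Hypothetical embedding vectors for three words.
embeddings = {
    "happy":  [0.9, 0.8, 0.1, 0.0],
    "joyful": [0.85, 0.75, 0.2, 0.05],
    "table":  [0.0, 0.1, 0.9, 0.8],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Similarity of direction: 1.0 means identical, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# "happy" and "joyful" point in nearly the same direction...
print(cosine(embeddings["happy"], embeddings["joyful"]))
# ...while "happy" and "table" do not.
print(cosine(embeddings["happy"], embeddings["table"]))
```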
Step 3: Self-Attention
The model looks at all the tokens in your input simultaneously and calculates how much each token should "pay attention" to every other token. This is how it understands that in "The cat sat on the mat because it was tired," the word "it" refers to the cat, not the mat.
Self-attention happens across many "layers", with each layer building more abstract representations of meaning.
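The core operation can be sketched in a few lines. Real models first apply learned projection matrices to derive separate query, key, and value vectors, and run many attention heads in parallel; this is a bare single-head sketch of the underlying weighted-averaging step:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: for each query, score every key,
    softmax the scores into weights, and blend the values accordingly."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Three 2-dimensional token vectors standing in for a short sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Each output row is a blend of every token's vector, weighted by relevance; this is the mechanism that lets "it" pull in information from "cat".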
Step 4: Prediction
After processing through many attention layers, the model outputs a probability distribution over all possible next tokens. It might predict there's a 15% chance the next token is "Hello", a 12% chance it's "I", an 8% chance it's "The", and so on across its entire vocabulary.
The model then samples from this distribution (sometimes picking the most likely token, sometimes introducing randomness for variety) and repeats the process for each subsequent token until the response is complete.
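The sampling step can be sketched as follows. The tiny vocabulary and logit values are made up, standing in for the model's real distribution over roughly a hundred thousand tokens; the `temperature` knob controls how much randomness is introduced:

```python
import math
import random

# Hypothetical raw scores (logits) for the next token.
logits = {"Hello": 2.1, "I": 1.9, "The": 1.5, "Certainly": 1.2}

def sample(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Convert logits to probabilities and draw one token.

    Temperature near 0 approaches greedy decoding (always the top token);
    higher temperatures flatten the distribution, adding variety.
    """
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(s - m) for t, s in scaled.items()}
    total = sum(exps.values())
    probs = {t: e / total for t, e in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

print(sample(logits, temperature=0.7))
```

This loop runs once per generated token, which is why responses stream out word by word.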
Step 5: Output Filtering and Safety Checks
Before you see the response, it may pass through additional safety systems, such as classifiers that check for harmful content, filters that catch policy violations, or other guardrails. These are separate systems designed by humans to catch problems the base model might miss.
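A heavily simplified sketch of such a guardrail is below. Production systems use trained classifiers and layered policy checks, not keyword lists; the phrases here are hypothetical placeholders:

```python
# Minimal stand-in for a post-generation safety filter.
BLOCKED_PHRASES = ["example banned phrase", "another banned phrase"]
REFUSAL = "Sorry, I can't share that."

def filter_output(response: str) -> str:
    """Replace the response with a refusal if it trips the filter."""
    lowered = response.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return REFUSAL
    return response

print(filter_output("Here is an example banned phrase in context."))
# → Sorry, I can't share that.
```

The key point is architectural: this check lives outside the model, so it can be updated by engineers without retraining anything.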
Tool Use and Structured Outputs
LLMs have a fundamental limitation: they're trained on text, not on doing things. They can discuss mathematics, but they aren't actually performing calculations; they're predicting what a correct calculation would look like (and sometimes getting it wrong).
To address this, engineers have built tool use capabilities:
Code Execution: When you ask a chatbot to calculate something complex, it might write Python code, execute it in a sandboxed environment, and return the actual computed result. The LLM handles the reasoning; the code interpreter handles the math.
Web Search: The model can trigger a search query, receive results, and incorporate real-time information into its response, compensating for its static training data.
Structured Outputs: Engineers can constrain the model to output valid JSON, fill in forms, or follow specific formats. This makes AI outputs more reliable for integration with other software systems.
API Calls: The model can interact with calendars, databases, or other services through well-defined interfaces.
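The loop that makes all of these work follows the same pattern: the model emits a structured "tool call" instead of plain text, the host program executes the tool, and the result is fed back to the model. A minimal sketch, with all tool names and message formats hypothetical:

```python
import json

# Hypothetical tool registry the host program controls.
TOOLS = {
    # eval with empty builtins, for this demo only; real systems sandbox properly.
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}),
}

def handle_model_output(output: str) -> str:
    """Dispatch a tool call emitted by the model, or pass plain text through."""
    message = json.loads(output)
    if message.get("type") == "tool_call":
        result = TOOLS[message["tool"]](message["argument"])
        return json.dumps({"type": "tool_result", "value": result})
    return message.get("text", "")

# Asked "what is 37 * 491?", the model might emit:
model_output = '{"type": "tool_call", "tool": "calculator", "argument": "37 * 491"}'
print(handle_model_output(model_output))
# → {"type": "tool_result", "value": 18167}
```

Note that the LLM never computes 37 × 491 itself; it only decides *that* a calculator is needed and formats the request. The arithmetic happens in ordinary deterministic code.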
These aren't capabilities the model learned on its own. They're human-designed extensions that compensate for the model's inherent limitations, essentially giving a text predictor the ability to take actions in the world.
How Did We Get Here? Scale and Compute
The core architecture behind today's most capable AI chatbots, the Transformer (the attention-based design described above), was introduced in 2017.
The fundamental approach hasn't changed dramatically since then.
What changed was scale.
Researchers discovered "scaling laws": predictable relationships showing that model performance improves reliably as you increase three things:
The number of parameters (model size)
The amount of training data
The compute used for training
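These relationships take the form of simple power laws: loss falls smoothly and predictably as scale grows. The sketch below uses constants roughly in line with those fitted in early scaling-law studies, but they should be read as illustrative, not authoritative:

```python
# Illustrative power-law scaling curve: predicted training loss as a
# function of parameter count. Constants are illustrative approximations.
def predicted_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Loss ~ (N_c / N)^alpha: more parameters, lower loss."""
    return (n_c / n_params) ** alpha

for n in (1e6, 1e9, 1e12):
    print(f"{n:.0e} parameters -> predicted loss {predicted_loss(n):.2f}")
```

The practical consequence: once a curve like this is fitted on small models, labs can forecast the payoff of a vastly more expensive training run before committing to it.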
Once this was understood, progress became less about clever algorithmic innovations and more about:
Building bigger models — from millions to billions to hundreds of billions of parameters
Assembling larger datasets — scraping more of the internet, digitising more books, collecting more code
Investing in compute — training runs that cost tens or hundreds of millions of dollars
Expanding permissions — giving models access to tools, the web, and the ability to take actions
The jump from GPT-3 to GPT-5, or from earlier Claude versions to current ones, came primarily from scaling up resources and expanding what the models were allowed to do, not from fundamental architectural breakthroughs.
This has important implications: current AI capabilities are largely the result of industrial-scale investment, not secret innovations. And it means we may be approaching limits where simply adding more scale yields diminishing returns.
The Bottom Line
An AI chatbot is:
A statistical pattern matcher trained on human-generated text
Shaped at every stage by human decisions about data, training, and safety
Extended by human-designed tools to compensate for its limitations
The product of scale and investment more than novel technology
Understanding this helps set appropriate expectations: these systems are powerful and useful, but they're not magic, they're not infallible, and they're deeply shaped by the humans who build them.