Glossary

The vocabulary of this revolution.

A glossary of 50 terms to understand the language of generative AI — from token to scaling laws, jargon kept to a minimum. Written for those who want to follow the debates without feeling left out.

A

AGI

Artificial General Intelligence
Safety & ethics

An AI as versatile as a competent human.

Artificial General Intelligence refers to a hypothetical system able to match or surpass humans on most cognitive tasks, not just a few. The term is debated — its definition is not universally agreed upon, and the industry increasingly uses it as a marketing goal.

AI capex

AI infrastructure spending
Economy & strategy

Capital spending on AI equipment (GPUs, datacenters).

By late 2025, Microsoft, Google, Amazon and Meta were collectively spending over $300 billion per year on AI infrastructure. The current debate: is this capex justified by today's revenues, or does it rest on a promise?

API

Application Programming Interface
Economy & strategy

The technical interface that lets software call a model.

An API exposes a model's capabilities to developers. It's the main channel through which AI labs monetize their models for enterprise customers — token billing, contracts, quotas.

ARR

Annual Recurring Revenue
Economy & strategy

Standardized SaaS revenue metric, projected over 12 months.

Central indicator for comparing AI labs' commercial traction. OpenAI's and Anthropic's ARR is regularly reported in the financial press as a barometer of the race.

ASI

Superintelligence
Safety & ethics

An AI that would significantly surpass humans in every domain.

A hypothetical level beyond AGI, where a system surpasses the best human experts in every field. A central concept in debates on existential risk and AI governance.

Agent

AI agent
Capabilities

A model that can act — not just answer.

An agent is an AI model that doesn't just chat: it runs code, reads files, browses the web, calls other tools, and chains actions to reach a goal. Claude Code, Cursor and Aider are examples of coding agents.

Alignment

Alignment
Safety & ethics

Ensuring a model acts according to human intentions.

The discipline that groups techniques aimed at ensuring a model remains helpful, honest and harmless. It includes reinforcement learning from human feedback (RLHF), value charters (Constitutional AI), adversarial evaluations, and usage policies.

Attention

Self-attention
Architectures

A mechanism that lets a model weigh the importance of different elements in an input.

When a model reads a sentence, attention lets it connect each word to all the others and decide which ones matter for what comes next. It's the key innovation of *Attention Is All You Need* (2017), the root of the current revolution.
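The weighting described above can be sketched in a few lines. This is a toy version under stated assumptions: the three 2-dimensional word vectors are made up, and each vector acts as its own query, key and value, whereas a real Transformer learns separate projections for each role.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Toy self-attention: each vector serves as its own query, key and value."""
    dim = len(vectors[0])
    outputs, weight_rows = [], []
    for query in vectors:
        # Score the current word against every word (scaled dot product).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The new representation is a weighted mix of all word vectors.
        mixed = [sum(w * value[j] for w, value in zip(weights, vectors))
                 for j in range(dim)]
        outputs.append(mixed)
        weight_rows.append(weights)
    return outputs, weight_rows

# Hypothetical vectors: the first two words are similar, the third is not.
word_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, attention_weights = self_attention(word_vectors)
```

Each row of `attention_weights` says how much one word "looks at" every other word; here the first word gives more weight to the second (similar) word than to the third.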

B

Benchmark

Evaluation
Practices

Standardized test to measure a model's performance.

Suite of tests (MMLU, HumanEval, GPQA, SWE-bench…) on which labs compare their models. Benchmarks are essential for marketing but often criticized for saturation, leaking into training corpora, or weak ties to real-world utility.

Bias

Bias
Safety & ethics

Systematic distortion a model inherits from its training data.

Models reproduce and sometimes amplify biases present in their corpora: stereotypical representations, cultural gaps, over-representation of certain languages. Measuring and mitigating bias is an ongoing effort.

C

Chain of Thought

CoT
Capabilities

A technique where the model makes its reasoning explicit, step by step.

Inviting the model to "think aloud" before answering often improves performance markedly on reasoning tasks (math, logic, code). So-called *reasoning* models (o1, o3, Claude with extended thinking) industrialize this approach.

Computer vision

CV
Capabilities

The field of AI dealing with image and video.

Computer vision ranges from object recognition to medical image analysis and image generation. It is now integrated into the most advanced multimodal LLMs.

Constitutional AI

Constitutional AI
Practices

Alignment method using a charter of principles, developed by Anthropic.

Instead of learning only from human feedback, the model is trained to critique itself against a written charter (the "constitution"). This is Anthropic's signature approach to aligning Claude.

Context window

Context window
Capabilities

The maximum number of tokens a model can read at once.

Measures how much information a model can handle at once: a prompt, a document, a conversation history. Modern models range from 8,000 to 2 million tokens. Beyond that, the model "forgets" the beginning.
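The "forgetting" effect can be sketched as a simple budget. This is a minimal illustration, assuming that one whitespace-separated word counts as one token (real tokenizers split text differently) and that the application drops the oldest messages first; `fit_to_window` is a hypothetical helper, not any vendor's API.

```python
def fit_to_window(messages, max_tokens):
    """Keep the most recent messages that fit in the token budget."""
    kept, used = [], 0
    for message in reversed(messages):       # walk from newest to oldest
        cost = len(message.split())          # assumption: 1 word = 1 token
        if used + cost > max_tokens:
            break                            # everything older is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))              # restore chronological order

history = ["my name is Ada", "I like chess", "what is my name"]
window = fit_to_window(history, max_tokens=7)
```

With a budget of 7 "tokens", the first message no longer fits, so the model would never see the name it is being asked about.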

Corpus

Training data · Dataset
Fundamentals

The set of texts (or images, audio…) used to train a model.

Frontier LLMs are trained on corpora exceeding a petabyte: web crawls such as Common Crawl, books, code, scientific papers, conversations. The quality and composition of the corpus largely determine what the model knows, and its biases.

D

Deep learning

Deep learning
Fundamentals

Machine learning with multi-layer neural networks.

A subfield of machine learning based on deep neural networks (tens to thousands of layers). It has been the dominant paradigm in AI since 2012, enabling LLMs, modern computer vision and image generation.

Distillation

Knowledge distillation
Practices

Compress a large model into a smaller one that copies its outputs.

A smaller "student" is trained to imitate a larger "teacher". The result: a model that is cheaper to run and keeps most capabilities. Many "compact" models (Haiku, GPT-4o-mini) are widely believed to rely on distillation, although labs rarely publish their exact recipes.
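The imitation objective can be sketched as a loss that shrinks as the student's output probabilities approach the teacher's. This is a minimal sketch of classic soft-target distillation, assuming made-up logits over three candidate words and a temperature of 2; it does not describe how any particular commercial model was built.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between teacher and student output distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student))

# Hypothetical logits for three candidate next words.
teacher_logits = [4.0, 1.0, 0.5]
far_student = [0.5, 1.0, 4.0]      # disagrees with the teacher
close_student = [3.8, 1.1, 0.6]    # nearly copies the teacher

loss_far = distillation_loss(far_student, teacher_logits)
loss_close = distillation_loss(close_student, teacher_logits)
```

Training the student amounts to adjusting its parameters so that this loss goes down across many examples.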

E

Embedding

Semantic vector
Fundamentals

A numerical representation of the meaning of a word, sentence or document.

The model converts "cat" and "dog" into close vectors in a several-hundred-dimensional space because they share meaning. Embeddings are the foundation of semantic search and RAG.
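"Close vectors" is usually measured with cosine similarity. The sketch below uses hand-made 3-dimensional vectors purely for illustration; real embeddings come from a trained model and have hundreds of dimensions or more.

```python
import math

def cosine_similarity(a, b):
    """1.0 means same direction, 0.0 means unrelated directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

cat_dog = cosine_similarity(cat, dog)
cat_car = cosine_similarity(cat, car)
```

Semantic search ranks documents by this kind of score against the embedding of the query.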

Emergence

Emergent capabilities
Architectures

A capability that appears abruptly beyond a certain scale.

Some abilities (multi-step reasoning, zero-shot translation) aren't observed in small models, then appear abruptly past a size or training threshold. The phenomenon is partially contested: some researchers argue that the apparent jump depends on the metrics used.

F

FLOPS

Floating-Point Operations Per Second
Economy & strategy

Unit of computing power, in operations per second.

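Training a frontier model now takes on the order of 10²⁵ floating-point operations in total. That figure is a count of operations (often written FLOP), not a per-second rate. The European *AI Act* and some US export regimes set training-compute thresholds above which a model is considered "systemically risky".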
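The link between a rate in operations per second and a total training budget can be sketched with a common rule of thumb: about 6 operations per parameter per training token. Every number below (model size, token count, cluster size, per-GPU peak rate, utilization) is a hypothetical input chosen for illustration.

```python
def training_flop(n_params, n_tokens):
    """Rule-of-thumb total training operations: ~6 per parameter per token."""
    return 6 * n_params * n_tokens

def training_days(total_flop, n_gpus, peak_flops_per_gpu, utilization):
    """Wall-clock days given a cluster's sustained operations per second."""
    sustained_rate = n_gpus * peak_flops_per_gpu * utilization
    seconds = total_flop / sustained_rate
    return seconds / 86_400

# Hypothetical model: 70 billion parameters, 1.4 trillion training tokens.
total = training_flop(n_params=70e9, n_tokens=1.4e12)

# Hypothetical cluster: 1,000 GPUs at 1e15 operations/s peak, 40% utilized.
days = training_days(total, n_gpus=1_000, peak_flops_per_gpu=1e15, utilization=0.4)
```

Under these assumptions the run needs roughly 5.9 × 10²³ operations and about 17 days, still well below a 10²⁵ threshold.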

Few-shot

In-context learning
Practices

Learning a task from a few examples placed in the prompt.

Instead of fine-tuning a model for a new task, we give it two or three examples in the prompt — it generalizes from these demonstrations. A major advantage of LLMs over older models.
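A few-shot prompt is just text with worked examples placed before the real question. The sketch below assembles one for a sentiment task; the instruction wording, the `Review:`/`Sentiment:` layout and the sample reviews are all illustrative choices, not a required format.

```python
def build_few_shot_prompt(examples, query):
    """Place labeled demonstrations before the item we want classified."""
    lines = ["Classify the sentiment as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unlabeled for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("Loved every minute of it.", "positive"),
    ("A complete waste of time.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Surprisingly good, would watch again.")
```

Passing an empty `examples` list to the same function yields a zero-shot prompt: the instruction alone, with no demonstrations.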

Fine-tuning

Fine-tuning
Practices

Continuing the training of an existing model on specific data.

Take a pre-trained model and specialize it for a domain (legal, medical, internal vocabulary…) with a much smaller dedicated corpus. Cheaper than full training, more effective than a simple prompt.

Foundation model

Foundation model
Architectures

A pre-trained model that serves as the base for many applications.

Term popularized by Stanford in 2021 for very large versatile models (GPT-4, Claude, Gemini, Llama) that serve as a base for thousands of specialized applications via fine-tuning or prompting.

Frontier model

Frontier model
Economy & strategy

The most capable models at a given moment — the "cutting edge".

Subset of foundation models at the leading edge of capabilities, generally held by a handful of players (OpenAI, Anthropic, Google, Meta, xAI, DeepSeek). A category targeted by emerging regulations because of its potential impact.

G

GPU

Graphics Processing Unit
Economy & strategy

Processor designed for massive parallel computation — the workhorse of AI.

Nvidia's GPUs (H100, H200, B200) are the raw material of generative AI. Their availability conditions who can train what. Allocation among labs has become a geopolitical issue.

Generative AI

GenAI
Fundamentals

A family of models that produce new content rather than classify.

Refers to models that create (text, image, audio, video, code), as opposed to purely "discriminative" models that classify. Generative AI has driven the media wave that began in 2022.

H

Hallucination

Confabulation
Safety & ethics

When a model fabricates information with confidence.

LLMs, by design, predict the most probable word — not the true one. When information is missing, they invent plausibly. Fictional citations, made-up dates, nonexistent functions: the risk is real and must be taken seriously.

I

Inference

Inference
Fundamentals

Running an already-trained model to produce a response.

While a model is trained once, it is used billions of times. The inference cost (per request) determines an AI service's profitability. Competition increasingly hinges on inference efficiency.

J

Jailbreak

Safety & ethics

A technique for bypassing a model's safety guardrails.

A malicious prompt or exploit that makes a model break its own rules (generate prohibited content, leak its system prompt, etc.). Labs dedicate teams (red teams) to hunting for jailbreaks continuously, in a permanent adversarial mode.

L

LLM

Large Language Model
Fundamentals

An AI model trained to predict text at very large scale.

A Large Language Model is a neural network (usually a Transformer) trained on massive text corpora to predict the next word. Claude, ChatGPT, Gemini and Mistral are all LLMs. Their size is measured in parameters — from a few billion to several trillion.

M

MCP

Model Context Protocol
Capabilities

Open protocol for connecting models to tools and data.

A standard introduced by Anthropic in late 2024 to standardize how an AI agent accesses files, APIs, databases. Widely adopted across the ecosystem — Cursor, Continue, and most AI editors support it.

Machine learning

ML
Fundamentals

The discipline where machines learn from data rather than from hand-written rules.

Rather than writing instructions by hand, we show the machine many examples; it adjusts its internal parameters to reproduce the observed behaviour. It's the foundation of all modern AI.

MoE

Mixture of Experts
Architectures

Architecture where only a few of the model's "experts" activate per request.

Instead of activating every parameter for each token, an MoE routes each token to a few specialized sub-networks. This allows training huge models (several trillion parameters) while keeping inference cost reasonable. Mixtral and DeepSeek-V3 use this technique, and GPT-4 is widely reported to, although OpenAI has not confirmed it.
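The routing idea can be sketched with four toy "experts" and a top-2 router. This is a simplified illustration: the experts are plain functions, and the gate scores are fixed by hand, whereas in a real MoE layer both the experts and the router are learned neural networks and the scores are computed per token.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the k best-scored experts and blend their outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    active = ranked[:k]
    weights = softmax([gate_scores[i] for i in active])
    output = sum(w * experts[i](x) for w, i in zip(weights, active))
    return output, active

# Hypothetical experts and hand-picked router scores for one token.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 2.0, 1.5, -1.0]

output, active = moe_forward(3.0, experts, gate_scores, k=2)
```

Only experts 1 and 2 run for this token; the other two cost nothing at inference time, which is the source of the savings.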

Multimodal

Capabilities

A model that handles multiple input types: text, image, audio, video.

Modern models (GPT-4o, Claude 4, Gemini) are no longer limited to text: they can read images, hear audio, sometimes generate images or video. Multimodality considerably expands the range of use cases.

O

Open source

Economy & strategy

Source code and weights are both accessible, under a license permitting use and modification.

Strictly speaking, an open source model publishes its code, weights, data and training methods under a free license. This differs from "open weights", which is more restrictive. True examples: OLMo, Pythia. The press often conflates the two.

Open weights

Open weights
Economy & strategy

The model's weights are downloadable, but the data and training stay closed.

Llama, Mistral, Qwen, DeepSeek publish their weights — anyone can download and run the model. But they don't publish their corpora or all their methods. This is the most common definition of "openness" in AI today.

P

Parameter

Weight
Fundamentals

The model's internal numbers that get adjusted during training.

A modern LLM contains from a few billion to several trillion parameters — each a real number adjusted during training. More parameters = more capacity, but also more training and inference cost.

Prompt

Prompt
Practices

The input text given to a model to get a response.

Everything a model receives before answering: your question, context, system instructions, history. Prompt quality directly affects output quality — hence the rise of "prompt engineering" as a skill.

Prompt engineering

Prompt engineering
Practices

The art of crafting effective prompts to get the most out of a model.

A set of empirical techniques: task decomposition, inserted examples (few-shot), chains of thought, assigned roles. A skill in its own right, closer to editing than to classic engineering.

Q

Quantization

Quantization
Practices

Reducing the numerical precision of parameters to speed up inference.

Instead of storing each parameter in 32 or 16 bits, we bring them down to 8, 4 or even 2 bits. The model loses a bit of quality but becomes much faster and lighter — useful for running LLMs on a laptop or phone.
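The precision trade-off can be sketched with simple symmetric rounding. This is a minimal sketch on four made-up weights; production schemes typically quantize per block or per channel and handle outliers more carefully.

```python
def quantize(weights, bits=8):
    """Map floats to signed integers of the given width, sharing one scale."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for 8 bits
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.52, -1.27, 0.003, 0.9]                  # hypothetical parameters
q8, scale8 = quantize(weights, bits=8)
restored = dequantize(q8, scale8)

# Worst-case rounding error; it stays within half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of four. The small value 0.003 rounds to zero, which is the kind of quality loss the entry mentions.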

R

RAG

Retrieval-Augmented Generation
Capabilities

Combining an LLM with document retrieval to ground its answers.

Before answering, the system searches a document base for relevant passages and injects them into the prompt. The model then relies on verifiable sources — reduces hallucinations, lets you query internal documents.
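The retrieve-then-inject flow can be sketched end to end. This toy version ranks documents by shared words, as a stand-in for the embedding-based search real systems typically use; the documents, the question and the prompt wording are invented for the example.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, passages):
    """Inject the retrieved passages ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

documents = [
    "Refunds are accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes three to five business days.",
]
question = "Within how many days are refunds accepted?"

passages = retrieve(question, documents)
prompt = build_prompt(question, passages)
```

The resulting `prompt` is what gets sent to the LLM, so its answer can be checked against the quoted source.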

RLHF

Reinforcement Learning from Human Feedback
Practices

Fine-tuning a model from human preferences.

Humans compare two of the model's answers and indicate which is better. These comparisons typically train a reward model, which then guides reinforcement learning so that the LLM produces answers judged better. A standard technique since ChatGPT for aligning LLMs with human expectations.
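One common way to turn such pairwise comparisons into a training signal is a preference loss on a scoring ("reward") model: the loss is small when the human-preferred answer scores higher. The sketch below shows that loss only, with made-up scores; it assumes the widely used pairwise logistic form and leaves out the reinforcement learning step that follows.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise logistic loss: low when the chosen answer outscores the other."""
    margin = reward_chosen - reward_rejected
    probability_chosen = 1.0 / (1.0 + math.exp(-margin))
    return -math.log(probability_chosen)

# Hypothetical reward scores for two answers to the same prompt.
loss_agrees = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
loss_disagrees = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
loss_undecided = preference_loss(reward_chosen=1.0, reward_rejected=1.0)
```

Minimizing this loss over many human comparisons teaches the scoring model to rank answers the way annotators do.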

Reasoning

Reasoning
Capabilities

A model's ability to chain several logical steps before answering.

A new generation of models (o1, o3, Claude *thinking*, Gemini 2.5) takes time to "think" before producing an answer — decomposition, verification, backtracking. Increased performance on math, code, planning.

Red team

Safety & ethics

Internal team whose mission is to attack the model to find its flaws.

A practice borrowed from cybersecurity: before every major release, a team simulates malicious uses (jailbreaks, exploitable biases, data leaks) to identify and fix vulnerabilities.

S

Scaling laws

Scaling laws
Economy & strategy

Empirical relationship between a model's size, data, compute, and performance.

Documented by OpenAI (2020), then refined by DeepMind (Chinchilla, 2022): increasing parameters, data and compute improves the model predictably, with error falling along a smooth power-law curve. This regularity has justified the massive investments of the last five years, and its potential plateau is today's central question.
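"Predictably" here means a power law: each doubling of size multiplies the loss by the same factor. The sketch below illustrates that shape only; the constants `n_c` and `alpha` are hypothetical placeholders, not fitted values from any paper.

```python
def predicted_loss(n_params, n_c=1e14, alpha=0.08):
    """Power-law form: loss falls as (n_c / n_params) ** alpha."""
    return (n_c / n_params) ** alpha

# Doubling a 1-billion and a 10-billion parameter model.
ratio_small = predicted_loss(2e9) / predicted_loss(1e9)
ratio_large = predicted_loss(2e10) / predicted_loss(1e10)
```

Both ratios equal 2 to the power of minus `alpha` (about 0.946 with these placeholder constants): the same relative gain per doubling, whatever the starting size. A plateau would show up as real models drifting above this curve.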

T

Token

Fundamentals

The basic unit an LLM handles — a word, a fragment, a symbol.

An LLM doesn't see words but tokens — usually 3-5 character fragments. "Hello" may be 1 token, "antidisestablishmentarianism" takes 6 or 7. It's also the unit by which APIs are billed.
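How a word breaks into tokens, and how that drives the bill, can be sketched with a toy tokenizer. The vocabulary, the greedy longest-match rule and the price are all assumptions made for illustration; real tokenizers learn vocabularies of tens of thousands of fragments and each provider sets its own rates.

```python
def tokenize(text, vocab):
    """Greedy longest-match split; unknown characters become single tokens."""
    tokens, start = [], 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in vocab:
                tokens.append(text[start:end])
                start = end
                break
        else:
            tokens.append(text[start])
            start += 1
    return tokens

def cost_in_dollars(n_tokens, price_per_million_tokens):
    """Token-based billing: pay in proportion to tokens processed."""
    return n_tokens / 1_000_000 * price_per_million_tokens

# Hypothetical vocabulary of known fragments.
vocab = {"hello", "anti", "dis", "establish", "ment", "arian", "ism"}

short = tokenize("hello", vocab)
long = tokenize("antidisestablishmentarianism", vocab)
bill = cost_in_dollars(n_tokens=len(long), price_per_million_tokens=3.0)
```

With this vocabulary, "hello" is a single token while the long word splits into six fragments, so it costs six times as much to process.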

Tool

Tool use · Function calling
Capabilities

A model's ability to call external functions in order to act.

A tool-equipped model can decide to run code, read a file, search the web, call a weather API. This is the layer that turns a chatbot into an autonomous agent.
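In practice, the model emits a structured request and the surrounding application runs the matching function. The sketch below assumes a JSON request with `name` and `arguments` fields and a stubbed `get_weather` function; both are hypothetical, and each provider defines its own exact format.

```python
import json

def get_weather(city):
    """Stub standing in for a real weather API call."""
    return f"Sunny in {city}"

# Registry of functions the application allows the model to call.
TOOLS = {"get_weather": get_weather}

def run_tool_call(raw_request):
    """Parse the model's request and execute the named tool."""
    call = json.loads(raw_request)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Hypothetical output from a model that decided to check the weather.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = run_tool_call(model_output)
```

The result is then sent back to the model as new context, so it can continue reasoning or decide on the next action.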

Training

Pre-training
Fundamentals

The model's learning phase, where its parameters get adjusted.

We expose the model to massive amounts of data and progressively adjust its billions of parameters so it produces the expected outputs. It's the costliest step — several million to several hundred million dollars for frontier models.

Transformer

Architectures

The neural network architecture behind nearly all modern large language models.

Introduced in 2017 in *Attention Is All You Need* by Google, the Transformer replaces recurrent architectures with the attention mechanism. The entire LLM revolution (GPT, BERT, Claude, Gemini) flows from it.

Z

Zero-shot

Practices

Performing a task without any prior example, from the instruction alone.

An emblematic LLM capability: we ask them directly to translate, summarize, classify — without ever showing an example of the task in the prompt. The flip side of few-shot.

Still missing a term?

Suggest it to us, or ask for a clarification: we expand the glossary as questions come in.
contact@ryuzakilabs.com

For developers: lexicon.json at the root of the repository.