
AI Literacy for Students

A student guide to AI literacy that will help you gain foundational knowledge of AI concepts, apply generative AI appropriately in an educational setting, and think critically about generative AI systems and tools.

How Does Generative AI Work?

(the ELI-12 version¹)

Training – The Creation of a GenAI Model


When you use a Generative AI tool, you are using a tool that has been created by a complex and expensive process. To help you develop an understanding of this process, I'm going to rely on an analogy that isn't perfect but will help your understanding, at least at a high level.

Think about how a child learns language – through exposure to countless conversations, books, and interactions. Large Language Models (LLMs) learn in a similar way, but instead of years of real-world experience, they learn from vast amounts of text from the internet, books, and other sources. Here’s a basic overview of how these AI models are trained:

  1. Gathering and Preparing the Data:

    Companies collect enormous amounts of data from sources like books, websites, and articles. This data needs to be carefully cleaned and filtered – removing inappropriate content, errors, and low-quality information. Just like you wouldn’t want a child learning from incorrect or harmful material, the AI needs good quality data to learn from.

  2. Basic Training:

    The AI begins recognizing patterns in all this text – not just learning words, but understanding how they typically appear together and relate to each other. Unlike a child who learns strict grammar rules, the AI learns by spotting patterns in millions (or even billions) of examples. It starts recognizing which words often appear together, what typically follows certain phrases, and how ideas are usually expressed.

  3. Alignment:

    The model is then trained to be helpful and safe – like teaching good manners and proper behavior. The training involves guiding the AI towards providing useful, appropriate responses while limiting harmful or biased content. This process is called alignment.

  4. Testing and Release:

    The AI is thoroughly tested to make sure it works well and safely. Once ready, it’s made available for people to use, but continues to be monitored and improved over time.

These AI models are continuously being improved and refined. However, it’s important to remember that despite this sophisticated training process, these models are still pattern-matching tools – they don’t truly understand or think like humans do. They’re incredibly powerful assistants, but they need human guidance and oversight to be used effectively.
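The "pattern spotting" in step 2 can be illustrated with a toy sketch: counting which word tends to follow which in a body of text. Everything here is illustrative (the corpus is made up), and real LLMs learn far richer statistics using neural networks with billions of parameters, but the core idea of learning from co-occurrence is the same.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the vast training data a real model sees.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram statistics).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# After "training", the model "knows" what typically comes after "the".
print(follows["the"].most_common())  # [('cat', 2), ('mat', 1), ('fish', 1)]
```

Even this tiny table captures something real: given "the", the most statistically likely continuation is "cat". Scale the corpus up to much of the internet and replace the counting table with a neural network, and you have the rough shape of how an LLM's "knowledge" is formed.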


Inference – Using a GenAI Model

When you use a GenAI chatbot, you are interacting with a user interface, but behind the scenes a GenAI model is working hard to calculate a response to your prompts. This calculation process is called inference.

Here is what happens during inference:

  1. Processing Your Input:

    The model splits your text into small chunks called tokens (imagine splitting a sentence into individual words and parts of words – like breaking “playing” into “play” and “ing”). This helps the model understand your input piece by piece.

  2. Using Its “Knowledge”:

    The model then uses all the patterns it learned during its training – kind of like how you use everything you’ve learned in school to answer a question. But instead of actively searching through books, the model uses its “memory” (the patterns stored in its system) to figure out what might come next.

  3. Making Smart Predictions:

    The model looks at your input and what it has started writing, considering everything together (like how you consider the whole question before answering). It then figures out what would make the most sense to say next, one small piece at a time.

  4. Adding Some Variety:

    The model has special settings (like a creativity dial) that control how creative or focused its response should be. A higher setting means more creative and varied responses, while a lower setting means more focused and predictable ones.

  5. Building the Response:

    Finally, the model builds its response one small piece at a time. Each new piece it adds is influenced by everything that came before it – both in your prompt and in the response it’s creating. It’s like building with blocks, where each new block needs to fit with all the others.

Remember: The model isn’t thinking or understanding like a human does. It’s following patterns it learned during training to create responses that make sense based on your input. It’s more like a very sophisticated pattern-matching tool than a thinking brain.
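The steps above can be sketched as a toy calculation. Everything here is illustrative (the candidate words, scores, and temperature values are made up, and a real model scores tens of thousands of tokens using billions of parameters), but the "creativity dial" math shown is the standard softmax-with-temperature formula.

```python
import math
import random

# Step 1 (tokenization): a word like "playing" might be split into sub-words.
tokens = ["play", "ing"]  # illustrative; real tokenizers learn their own splits

# Steps 2-3: hypothetical raw scores ("logits") the model assigns to possible
# next tokens after the prompt "The sky is".
logits = {"blue": 4.0, "clear": 2.5, "falling": 0.5}

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; the 'creativity dial' from step 4.

    Lower temperature sharpens the distribution (more predictable);
    higher temperature flattens it (more varied).
    """
    scaled = {tok: s / temperature for tok, s in scores.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {tok: math.exp(v) / total for tok, v in scaled.items()}

focused = softmax_with_temperature(logits, temperature=0.5)
creative = softmax_with_temperature(logits, temperature=2.0)
print(f"T=0.5: blue gets {focused['blue']:.0%}")   # near-certain pick
print(f"T=2.0: blue gets {creative['blue']:.0%}")  # others get a real chance

# Step 5: the response is built one token at a time, each sampled from
# probabilities conditioned on everything generated so far.
probs = softmax_with_temperature(logits, temperature=1.0)
next_token = random.choices(list(probs), weights=list(probs.values()))[0]
```

Running this shows why the same prompt can produce different answers: at low temperature "blue" is chosen almost every time, while at high temperature "clear" or even "falling" get a meaningful chance.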


Going Deeper


Must-Read Articles

Videos on Generative AI and Large Language Models (LLMs)

The web has countless resources to help you learn more about the inner workings of GenAI. Here are some of my favourites, organized by complexity and approach:

Generative AI in a Nutshell

This is one of those “drawing and talking” videos that discusses how GenAI works and how to use it. It’s a fun video that doesn’t require any technical background.

LLMs for Curious Beginners (3Blue1Brown)

Grant Sanderson provides a brief overview of Large Language Models with his signature visual explanations. Perfect for those who want to understand the basics.

Transformers and Attention (3Blue1Brown)

A deeper dive into the transformer architecture and attention mechanism that powers modern LLMs. Requires some mathematical background.

What is a Neural Network? (3Blue1Brown)

First video in a comprehensive series on Deep Learning. This 2017 series was well ahead of the ChatGPT explosion and provides foundational knowledge.

The Busy Person’s Intro to LLMs

Andrej Karpathy, a founding member of OpenAI and former Director of AI at Tesla, provides a general-audience tutorial on LLMs with technical depth.

More Articles and Activities to Explore

¹ ELI-12: Explain it to me Like I'm 12

AI Literacy Glossary: An Introduction

This glossary is designed to provide clear, concise definitions of key concepts, models, and techniques in Artificial Intelligence (AI) and Generative AI. It serves as a reference for educators, learners, and anyone seeking to build a foundational understanding of this rapidly evolving field.

AI is transforming industries and reshaping how we learn, work, and create. From understanding the basics of machine learning to exploring the cutting-edge capabilities of generative models, this glossary covers essential terms that demystify AI and its applications.

These definitions are intended to support your understanding of AI at a broad level and provide clarity on important terms and ideas. While they are not exhaustive or technical explanations, they aim to give you a solid starting point for engaging with AI concepts and exploring their relevance to real-world applications. Whether you're navigating AI for the first time or deepening your knowledge, this glossary is here to make these complex ideas more accessible.

The glossary is broken into two sections:

  1. Commonly used AI terms - these are the terms you will encounter in everyday use of generative AI.
  2. (Moderately) technical terms - these are the terms you will encounter as a layperson if you want a deeper understanding of how generative AI works.

Commonly Used AI Terms

AI Agent

Computer programs or systems that are capable of performing autonomous actions, making decisions, and interacting with their environment to achieve specific goals or tasks. They can adapt, learn, and optimize their performance over time.


AI Literacy

The knowledge and skills required to understand, interact with, and critically evaluate AI tools and technologies in various contexts, including education.


Alignment

The process of ensuring AI systems align with human values, ethical considerations, and intended goals, particularly in educational contexts.


Artificial General Intelligence (AGI)

AGI refers to a (so far) hypothetical class of AI systems capable of understanding, learning, and applying knowledge across a broad range of tasks and domains, comparable to human cognitive abilities. Unlike Narrow AI, which excels at specific tasks, AGI would exhibit the versatility and adaptability needed to solve unfamiliar problems, reason abstractly, and transfer knowledge between domains.

The concept of AGI remains difficult to define precisely due to the complexity and breadth of human intelligence itself. AGI systems would not only need to achieve human-level performance in diverse areas but also demonstrate attributes such as common sense, self-awareness, and the ability to set and pursue goals autonomously.

Achieving AGI is widely regarded as the ultimate challenge in AI research. It would require breakthroughs in understanding cognition, developing highly scalable and flexible learning architectures, and addressing foundational issues such as alignment with human values and safety. While AGI remains a theoretical construct, its pursuit drives significant debate and innovation, with implications spanning scientific discovery, ethics, and the future of human-AI collaboration.


Artificial Intelligence

Artificial Intelligence (AI) encompasses a broad field of computer science focused on creating systems capable of performing tasks that typically require human intelligence. These tasks include problem-solving, decision-making, understanding natural language, recognizing patterns, and adapting to new information. AI systems range from simple algorithms powering recommendations to sophisticated models generating human-like text or identifying diseases in medical imaging.

The definition of AI is fluid and evolves alongside advancements in technology and our understanding of intelligence. Historically, AI has been divided into two categories: Narrow AI, designed to excel at specific tasks (e.g., language translation, facial recognition), and the aspirational Artificial General Intelligence (AGI), which would exhibit human-like cognitive versatility.


Bias

Bias in AI refers to systematic errors in the output of AI models, often resulting from unrepresentative training data or flawed algorithms, leading to unfair or unequal treatment of certain groups or individuals.


Chatbots

Chatbots are AI-powered programs designed to simulate human conversation, often used in customer service, information retrieval, and interactive communication online. This is the "chat" part of ChatGPT.


Data Privacy

Data privacy encompasses the practices and policies that ensure confidential and sensitive data, including personal information, is secure, protected, and used responsibly in AI and other digital systems.


Ethics in AI

This refers to the moral principles guiding the design, deployment, and use of AI technologies, ensuring fairness, transparency, and respect for human rights.


Fine-Tuning

The process of refining a pre-trained AI model to specialize in specific tasks or domains, often using a smaller dataset relevant to the intended application.


Foundational Models

Foundational Models are pre-trained machine learning models that serve as the base for customized applications across a diverse range of tasks and domains. These models are trained on extensive datasets to capture a broad spectrum of knowledge and skills, and they can be fine-tuned and adapted to specific tasks or industries. Foundational models are akin to generalists in the AI world, offering a versatile and robust starting point for specialized applications, including in natural language processing, computer vision, and beyond.


Frontier Models

Frontier Models are cutting-edge AI systems that push the boundaries of machine learning capabilities, representing the latest advancements in scale, performance, and innovation. These models are designed to tackle highly complex tasks, often exceeding the capabilities of foundational models in specialized or emergent domains. Frontier models are typically developed using state-of-the-art techniques, massive computational resources, and novel architectures. They pave the way for groundbreaking applications in areas such as advanced robotics, multi-modal AI, and scientific discovery, while also setting new benchmarks for safety, alignment, and ethical considerations in AI development.


Generative AI (GenAI)

Generative Artificial Intelligence refers to a class of AI systems designed to create new content, such as text, images, audio, video, or code, based on patterns and knowledge learned from existing data. Unlike traditional AI systems that focus on classifying, predicting, or analyzing data, generative AI produces outputs that resemble human-created content, opening new frontiers in creativity and automation.

Generative AI systems, such as Large Language Models (LLMs) like GPT or image generators like DALL·E, rely on sophisticated architectures, such as transformers, to learn and replicate the underlying structure of data. These models are trained on vast datasets, enabling them to generate coherent, contextually relevant, and high-quality content in response to prompts or inputs.


Hallucinations

In the context of AI, hallucinations refer to the instances where models generate incorrect, nonsensical, or unrealistic information. These mistakes can be very obvious or very subtle, so always review the output from a generative AI if you need it to be true and correct.


Large Language Models (LLMs)

Large Language Models are complex AI systems that have been trained on enormous volumes of text data to comprehend, generate, and manipulate human-like text effectively. These models, equipped with millions or even billions of parameters, excel in tasks ranging from content creation and translation to question answering and beyond. LLMs can understand context, generate coherent and contextually relevant content, and are integral in various applications including conversational AI, automated content generation, and aiding in problem-solving scenarios.


Multi-Modal AI

AI systems capable of processing and generating content across multiple data types (e.g., text, images, audio, video) to enable richer and more interactive educational experiences.


Prompt

The input or query provided to the chatbot by the user. It's the message or question given to the chatbot to elicit a response or information. The chatbot then processes the prompt and generates a relevant reply based on its programming and the data it has been trained on.


Moderately Technical Terms and Definitions

Alignment

The process of ensuring that the objectives and outcomes of an AI system align with human values and intentions.


Artificial Neural Networks

Computational models inspired by the human brain, designed for pattern recognition and decision-making.


Data Annotation

The process of labeling data, such as adding descriptions to images or categories to text, to make it usable for machine learning.


Deep Learning

A subset of machine learning involving neural networks with multiple layers, enabling complex pattern recognition.


Explainability

The extent to which an AI system's decisions or outputs can be understood and articulated by humans. This involves the system providing information about how it arrived at a result, enabling humans to interpret, explain, and trust its behavior and outcomes.


Generative Pre-Trained Transformer (GPT)

A machine learning model that's pre-trained to generate coherent and contextually relevant text based on a given prompt. This is the GPT part of ChatGPT. These models are designed to understand and generate human-like text. Here's a simplified breakdown of the name and function:
Generative: This refers to the model's ability to create or generate new content. In the case of GPT, it can generate human-like text based on the input it receives. For instance, it can help in writing essays, answering questions, or even creating poetry.
Pre-trained: Before a model like GPT can be used, it needs to be trained on a large dataset of text. "Pre-trained" refers to the fact that this training occurs before the model is fine-tuned for specific tasks. The pre-training helps the model learn grammar, facts about the world, reasoning abilities, and some common sense.
Transformer: The Transformer is the type of neural network architecture that GPT is based on. It's particularly effective for handling sequential data like text or time series. The Transformer architecture allows GPT to pay selective attention to different parts of the input text, which helps in understanding context and generating coherent responses.
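The "selective attention" described above can be sketched in a few lines of plain Python. This is only a toy illustration of the scaled dot-product attention idea; real Transformers add learned projection matrices, multiple attention heads, and many stacked layers, all omitted here, and the vectors below are made up.

```python
import math

def attention(queries, keys, values):
    """Minimal scaled dot-product attention, the core Transformer operation.

    Each position 'looks at' every position: the similarity of its query to
    all keys decides how much of each value to blend into its output.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Softmax turns scores into attention weights that sum to 1.
        exps = [math.exp(s) for s in scores]
        weights = [e / sum(exps) for e in exps]
        # The output is a weighted blend of all the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Three toy 2-dimensional vectors standing in for token representations.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(x, x, x)  # self-attention: q, k, v all from the same input
```

Because the weights always sum to 1, each output is a blend of the inputs, weighted by relevance; this is how a token's representation comes to reflect the context around it.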


Inference

Inference is the process where a trained AI model uses its learned knowledge to make predictions or generate outputs based on new input data. In the context of an LLM (Large Language Model), inference happens when the model takes a user’s input (prompt) and processes it to produce a meaningful response.

During inference, the model applies the parameters (like weights and biases) it learned during training to evaluate the input, recognize patterns, and generate an appropriate output. This process is what allows the model to "understand" and respond to prompts, answer questions, or generate creative content.

Inference is essentially the application phase of an AI model, turning training into practical use. It's how the model uses what it has learned to interact with the world.


Interpretability

The degree to which the outputs and operations of a machine learning model can be understood by humans.


Machine Learning

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed to do so.


Natural Language Processing (NLP)

A domain of AI enabling machines to understand, interpret, and generate human language.


Parameters

The rules or settings that an LLM uses to "understand" and generate language. There are two main types of parameters in an LLM: weights and biases. Weights are numbers that show how strong the connections are between different parts of the LLM. Biases are numbers that help adjust the output of the LLM. Together, weights and biases control how the LLM processes and produces language. Generally, the more parameters an LLM has, the more capable it tends to be.

Parameters are the internal variables that the model learns through training. They help the model make predictions or decisions based on input data.
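At the smallest scale, a weight and a bias are just two numbers applied to an input. The sketch below shows a single artificial "neuron" with made-up values; an LLM applies billions of such learned numbers, arranged in layers, but each one works like this.

```python
# A single "neuron": the smallest unit that uses a weight and a bias.
weight = 0.8   # strength of a connection (learned during training)
bias = 0.1     # learned offset that shifts the output

def neuron(x):
    # The neuron's output is just: weight * input + bias.
    return weight * x + bias

print(neuron(2.0))  # 1.7
```

Training is the process of nudging numbers like `weight` and `bias`, across billions of parameters at once, until the model's outputs match the patterns in its training data.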


Training Data

The dataset used to train machine learning models, containing examples to learn from.


Transformer

A Transformer is a model architecture in machine learning that excels in handling sequential data, such as text. This is the breakthrough technology that has allowed chatbots like ChatGPT to become so powerful.

This site is maintained by the librarians of Okanagan College Library.
If you wish to comment on an individual page, please contact that page's author.
If you have a question or comment about Okanagan College Library's LibGuides site as a whole, please contact the site administrator.