AI-generated Transcript
In this video, we take a look at how ChatGPT works. We learned a lot from making this video. We hope you will learn something too.
Let’s dive right in. ChatGPT was released on November 30, 2022. It reached 100 million monthly active users in just two months.
It took Instagram two and a half years to reach the same milestone. That makes ChatGPT the fastest-growing app in history.
Now, how does ChatGPT work? The heart of ChatGPT is an LLM, or large language model. The default LLM for ChatGPT is GPT-3.5. ChatGPT can also use the latest GPT-4 model, but there are not yet many technical details about GPT-4 for us to talk about.
Now, what is a large language model? A large language model is a type of neural-network-based model that is trained on massive amounts of text data to understand and generate human language. The model uses the training data to learn the statistical patterns and relationships between words in the language, and then utilizes this knowledge to predict subsequent words, one word at a time. An LLM is often characterized by its size and the number of parameters it contains.
The largest model of GPT-3.5 has 175 billion parameters spread across 96 layers in the neural network, making it one of the largest deep learning models ever created. The input to and output from the model are organized as tokens. Now, tokens are numerical representations of words, or more accurately, parts of words.
Numbers are used for tokens rather than words because they can be processed more efficiently. GPT-3.5 was trained on a large chunk of Internet data. The source data contains 500 billion tokens.
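To make tokens concrete, here is a minimal sketch using OpenAI’s open-source tiktoken library (assuming it is installed with pip install tiktoken); the exact integer values depend on the encoding, so treat the output as illustrative.

```python
# A minimal tokenization sketch, assuming the open-source `tiktoken` library is installed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-3.5-era chat models
tokens = enc.encode("ChatGPT predicts one token at a time.")

print(tokens)                               # a list of integers, one per token
print([enc.decode([t]) for t in tokens])    # each integer maps back to a piece of text
```

Each integer maps back to a short piece of text, which is why a single word is sometimes split into more than one token.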
Looking at it another way, the model was trained on hundreds of billions of words. Now, the model was trained to predict the next token. Given a sequence of input tokens, it is able to generate text that is structured in a way that is grammatically correct and semantically similar to the Internet data it was trained on.
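Here is a toy sketch of that next-token loop. The “model” below just returns random scores over a made-up six-word vocabulary; a real LLM computes those scores with billions of learned parameters, but the predict-one-token-at-a-time loop is the same idea.

```python
# A toy next-token generation loop. The "model" here is a stand-in that returns random
# scores (logits); a real LLM would return learned scores over a ~100k-token vocabulary.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(token_ids):
    # Stand-in for the neural network: one score per vocabulary entry.
    return rng.normal(size=len(vocab))

def generate(prompt_ids, steps=5):
    ids = list(prompt_ids)
    for _ in range(steps):
        logits = toy_model(ids)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                        # softmax over the vocabulary
        next_id = rng.choice(len(vocab), p=probs)   # sample the next token
        ids.append(int(next_id))
    return " ".join(vocab[i] for i in ids)

print(generate([0, 1]))  # start from "the cat" and keep predicting one token at a time
```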
But without proper guidance, the model can also generate outputs that are untruthful, toxic, or reflective of harmful sentiments. Even with that severe downside, the model itself is already useful, but only in a very structured way: it can be taught to perform natural language tasks using carefully engineered text instructions, or prompts.
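As an illustration, a carefully engineered prompt might look like the following; the wording and the sentiment-classification task are hypothetical, and the point is simply that the instructions and examples alone steer the base model toward the task.

```python
# A hypothetical engineered prompt: instructions plus a few examples, then the new input.
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day." Sentiment: Positive
Review: "It broke after a week." Sentiment: Negative
Review: "Absolutely love the screen." Sentiment:"""

# This text would be sent to the base model, which is expected to continue with " Positive".
```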
This is where the new field of prompt engineering came from. Now, to make the model safer and capable of question-and-answer in the style of a chatbot, the model is further fine-tuned to become the version that is used in ChatGPT. Fine-tuning is a process that turns a model that does not quite align with human values into a fine-tuned model that ChatGPT can use.
This process is called Reinforcement Learning from Human Feedback, or RLHF. OpenAI has explained how they ran RLHF on the model, but it is not easy for non-ML people to understand. Let’s try to understand it with an analogy.
Imagine GPT-3.5 as a highly skilled chef who can prepare a wide variety of dishes. Now, fine-tuning GPT-3.5 with RLHF is like refining this chef’s skills to make their dishes more delicious. Initially, the chef is trained with a large set of recipes and cooking techniques.
However, sometimes the chef doesn’t know which dish to make for a specific customer request. To help with this, we collect feedback from real people to create a new data set. The first step is to create a comparison data set.
We ask the chef to prepare multiple dishes for a given request and then have people rank the dishes based on taste and presentation. This helps the chef understand which dishes are preferred by the customers. The next step is reward modeling.
The chef uses this feedback to create a reward model, which is like a guide for understanding customer preferences. The higher the reward, the better the dish. Next, we train the model with PPO, or Proximal Policy Optimization.
In this analogy, the chef practices making dishes while following the reward model. They use a technique called Proximal Policy Optimization to improve their skills. This is like the chef comparing their current dish with a slightly different version and learning which one is better according to the reward model.
This process is repeated several times, with the chef refining their skills based on updated customer feedback. With each iteration, the chef becomes better at preparing dishes that satisfy customer preferences. To look at it another way, GPT-3.5 is fine-tuned with RLHF by gathering feedback from people, creating a reward model based on their preferences, and then iteratively improving the model’s performance using PPO.
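Dropping the analogy for a moment, here is a minimal sketch of the reward-modeling step. The random feature vectors and the tiny linear scorer are stand-ins for real responses and a large transformer; what matters is the pairwise loss, which pushes the reward of the human-preferred response above that of the rejected one.

```python
# A minimal reward-model training sketch in PyTorch, using made-up data.
# Real systems score full token sequences with a large transformer; the pairwise
# preference loss shown here is the key idea.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical comparison data: for each prompt, features of a human-preferred
# response and a less-preferred one.
preferred = torch.randn(32, 16)   # 32 comparisons, 16 features each
rejected = torch.randn(32, 16)

reward_model = nn.Linear(16, 1)   # stand-in for a transformer with a scalar reward head
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

for step in range(100):
    r_pref = reward_model(preferred)   # scalar reward for each preferred response
    r_rej = reward_model(rejected)     # scalar reward for each rejected response
    # Pairwise preference loss: make the preferred reward higher than the rejected one.
    loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The fine-tuned model is then updated with PPO to produce responses that score highly under this learned reward, with a penalty that keeps it from drifting too far from the original model.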
This fine-tuning allows GPT-3.5 to generate better responses tailored to specific user requests. Now that we understand how the model is trained and fine-tuned, let’s take a look at how the model is used in ChatGPT to answer a prompt. Conceptually, it is as simple as feeding the prompt into the ChatGPT model and returning the output.
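That single conceptual step might look like the following minimal sketch, assuming the OpenAI Python SDK (v1 or later); the model name and prompt are placeholders.

```python
# A minimal "prompt in, completion out" sketch, assuming the OpenAI Python SDK (v1+)
# and an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": "Explain tokens in one sentence."}],
)
print(response.choices[0].message.content)
```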
In reality, it is a bit more complicated. First, ChatGPT knows the context of the chat conversation. This is done by the ChatGPT UI feeding the model the entire past conversation every time a new prompt is entered.
This is called conversational prompt injection, and it is why ChatGPT appears to be context aware.
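A hypothetical chat UI might rebuild the model input like this on every turn; the function name and transcript format are made up for illustration.

```python
# A sketch of conversational prompt injection: the UI re-sends the whole transcript
# together with the newest prompt, so the model can "remember" earlier turns.
def build_model_input(history, new_prompt):
    """history: list of (speaker, text) pairs from earlier in the conversation."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"User: {new_prompt}")
    lines.append("Assistant:")  # the model continues the transcript from here
    return "\n".join(lines)

history = [
    ("User", "Who wrote The Hobbit?"),
    ("Assistant", "J.R.R. Tolkien."),
]
print(build_model_input(history, "When was it published?"))
# Because the earlier turns are included, the model can resolve "it" to The Hobbit.
```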
Second, ChatGPT includes primary prompt engineering: pieces of instructions injected before and after the user’s prompt to guide the model toward a conversational tone. These prompts are invisible to the user.
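A rough sketch of that idea, with hypothetical instruction text:

```python
# A sketch of primary prompt engineering: hidden instructions wrapped around the
# user's prompt. The wording here is invented purely for illustration.
HIDDEN_PREFIX = "You are a helpful assistant. Answer in a friendly, conversational tone.\n\n"
HIDDEN_SUFFIX = "\n\nIf the request is unclear, ask a clarifying question."

def wrap_user_prompt(user_prompt):
    # The user only ever sees user_prompt; the surrounding instructions stay invisible.
    return HIDDEN_PREFIX + user_prompt + HIDDEN_SUFFIX

print(wrap_user_prompt("Summarize the plot of Hamlet."))
```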
Third, the prompt is passed to the moderation API to warn about or block certain types of unsafe content. The generated result is also likely to be passed to the moderation API before it is returned to the user.
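A minimal sketch of such a moderation check, assuming the OpenAI Python SDK (v1 or later) and its hosted moderation endpoint:

```python
# A sketch of screening text with the moderation endpoint before (or after) the
# main model call, assuming the OpenAI Python SDK (v1+).
from openai import OpenAI

client = OpenAI()

result = client.moderations.create(input="text to screen for unsafe content")
flagged = result.results[0].flagged      # True if any unsafe category is triggered
print(flagged, result.results[0].categories)
```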
And that wraps up our journey into the fascinating world of ChatGPT. There was a lot of engineering that went into creating the models used by ChatGPT. The technology behind it is constantly evolving, opening doors to new possibilities and reshaping the way we communicate. Now tighten your seatbelt and enjoy the ride.