Transcript Auto-Generated by AssemblyAI
All right, well, this is the CS50 Tech Talk. Thank you all so much for coming.
So, about a week ago, we circulated the Google Form, as you might have seen, at 10:52 a.m., and by like 11:52 a.m. we had 100 RSVPs, which I think is a testament to just how much interest there is in this world of AI and OpenAI and GPT, ChatGPT, and the like.
And in fact, if you’re sort of generally familiar with what everyone’s talking about, but you haven’t tried it yourself, this is the URL at which you can try out this tool that you’ve probably heard about, ChatGPT. You can sign up for a free account there and start tinkering with what everyone else has been tinkering with. And then if you’re more of the app-minded type, which you probably are if you’re here with us today, OpenAI in particular has its own low-level APIs via which you can integrate AI into your own software.
But of course, as is the case in computer science, there are all the more abstractions and services that have been built on top of these technologies. And we’re so happy today to be joined by our friends from McGill University and Steamship, Sil and Ted, from whom you’ll hear in just a moment about how they are making it easier to build, to deploy, and to share applications using some of these very same technologies. So our thanks to them for hosting today.
Thanks, too, to our friends at Plympton, Jenny Lee, an alumna, who’s here with us today. But without further ado, allow me to turn things over to Ted and Sil, and pizza will be served shortly after 1:00 p.m. outside.
All right, over to you, Ted. Thanks a lot. Hey, everybody. It’s great to be here.
I think we’ve got a really good talk for you today. Sil is going to provide some research grounding into how it all works, what’s going on inside the brain of GPT, as well as other language models. And then I’ll show you some examples that we’re seeing on the ground of how people are building apps and what apps tend to work in the real world.
So our perspective is, we’re building AWS for AI apps, so we get to talk to a lot of the makers who are building and deploying their apps, and through that, see both the experimental end of the spectrum and also see what kinds of apps are getting pushed out there and turned into companies, turned into side projects. We did a cool hackathon yesterday. Many thanks to Nieman, to David Malan and CS50 for helping us put all of this together, and to Harvard for hosting it.
And there were two sessions; lots of folks built things. If you go to steamship.com/hackathon, you’ll find a lot of guides, a lot of projects that people built, and you can follow along.
We have a text guide as well, just as a quick plug, if you want to do it remotely or on your own. So, to tee up Sil, we’re going to talk about basically two things today that I hope you’ll walk away with and really know how to use as you develop and as you tinker. One is: what is GPT and how is it working? Get a good sense of what’s going on inside of it, other than as just this magical machine that predicts things.
And then two is: how are people building with it? And then importantly, how can I build with it too? If you’re a developer and you have a CS50 background, you should be able to pick things up and start building some great apps. I’ve already met some of the CS50 grads yesterday, and the things that they were doing were pretty amazing. So I hope this is useful.
I’m going to kick it over to Sil to talk about some of the theoretical background of GPT. Yeah. So thank you, Ted.
My name is Sil. I’m a graduate student in the digital humanities at McGill. I study literature and computer science and linguistics in the same breath, and I’ve published some research over the last couple of years exploring what is possible with language models and culture in particular.
My half of the presentation is to describe to you what GPT is. That’s really difficult to explain in 15 minutes, and there are even a lot of things that we don’t know. But a good way to approach it is to first consider all the names and descriptors that people call GPT by. So you can call them large language models; you can call them universal approximators.
From computer science, you can say that it is a generative AI. We know that they are neural networks, we know that it is an artificial intelligence. To some it’s a simulator of culture, to others it just predicts text.
It’s also a writing assistant. If you’ve ever used ChatGPT, you can plug in a bit of your essay and get some feedback. It’s amazing for that.
It’s a content generator. People use it to do copywriting: Jasper AI, Sudowrite, et cetera. It’s an agent.
So the really hot thing right now, as you might have seen on Twitter: AutoGPT, BabyAGI. People are giving these things tools and letting them run a little bit free in the wild to interact with the world, computers, et cetera. We use them as chatbots, obviously, and the actual architecture is the transformer.
So there are lots of ways to describe GPT, and any one of them is a perfectly adequate way to begin the conversation. But for our purposes, we can think of it as a large language model, and more specifically, a language model. And a language model is a model of language, if you allow me the tautology.
But really what it does is produce a probability distribution over some vocabulary. So let us imagine that we had the task of predicting the next word of the sequence "I am." If I give a neural network the words "I am," what, of all the words in English, is the most likely word to follow? That, at its very core, is what GPT is trained to answer.
And how it does it is: it has a vocabulary of 50,000 words, and it knows roughly, given the entire internet, which of those 50,000 words are likely to follow other words in some sequence, up to 2,000 words of context, up to 4,000, up to 8,000, and now up to 32,000 with GPT-4. So you give it a sequence, here "I am," and over the vocabulary of 50,000 words it gives you the likelihood of every single word that could follow.
Perhaps the word "happy" is fairly frequent across all utterances of English, so we’ll give that high probability. "I am sad," maybe that’s a little bit less probable. "I am school"? That really should be at the bottom, because I don’t think anybody would ever say that. "I am Björk"? That’s not very probable, less probable than "happy" or "sad," but there’s still some probability attached to it.
And when we say it’s probable, that’s literally a percentage: "happy" follows "I am" maybe 5% of the time, "sad" follows "I am" maybe 2% of the time, or whatever. So for every word that we give GPT, it tries to predict what the next word is across those 50,000 words, and it gives every single one of those 50,000 words a number that reflects how probable it is. And the really magical thing that happens is you can generate new text.
So if you give GPT "I am," and it predicts "happy" as being the most probable word of the 50,000, you can then append it: now you have "I am happy." You feed that into the model again, you sample another word, you feed it into the model again and again and again, and there are lots of different ways that "I am happy" or "I am sad" can go. You add a little bit of randomness, and all of a sudden you have a language model that can write essays, that can talk, and a whole lot of things, which is really unexpected and something that we didn’t predict even five years ago.
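To make that predict-append-repeat loop concrete, here is a minimal sketch in Python. It uses the openly available GPT-2 as a stand-in, since GPT-3 and GPT-4's weights aren't public, via the Hugging Face transformers library; the loop length here is just an illustrative choice.

```python
# pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # vocabulary of ~50,000 tokens
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("I am", return_tensors="pt")

for _ in range(10):
    with torch.no_grad():
        logits = model(input_ids).logits          # (1, sequence_length, vocab_size)
    probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token
    next_id = torch.multinomial(probs, num_samples=1)  # sample with some randomness
    input_ids = torch.cat([input_ids, next_id.unsqueeze(0)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Real GPT models predict tokens (word pieces) rather than whole words and are vastly larger, but the loop is the same idea.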
So this is all relevant as we scale up the model and give it more compute. In 2012, AlexNet came out and we figured out we can run these models on GPUs, so we can speed up the process.
We can give the model lots of information downloaded from the Internet and it learns more and more and more. And the probabilities that it gives you get better as it sees more examples of English on the Internet. So we have to train the model to be really large, really wide, and we have to train it for a really long time.
And as we do that, the model gets better and better, more expressive and capable, and it also gets a little bit intelligent, for reasons we don’t fully understand. But the issue is that because it learns to replicate the internet, it knows how to speak in a lot of different genres of text and a lot of different registers. Say you begin the conversation with something like: ChatGPT, can you explain the moon landing to a six-year-old in a few sentences? This is an example drawn from the InstructGPT paper from OpenAI.
GPT-3 would have just gone: okay, you’re giving me an example, "explain the moon landing to a six-year-old," so I’m going to give you a whole bunch of similar requests, because those seem very likely to come in a sequence. It doesn’t necessarily understand that it’s being asked a question and has to respond with an answer.
GPT-3 did not have that apparatus, that interface for responding to questions, and the scientists at OpenAI came up with a solution: let’s give it a whole bunch of examples of questions and answers, such that we first train it on the internet and then train it on questions and answers, so that it has the knowledge of the internet but really knows that it has to be answering questions. That is when ChatGPT was born, and that’s when it gained 100 million users in one month.
I think it beat TikTok’s record of 20 million in one month. It was a huge thing, and a lot of people went: oh, this thing is intelligent, I can ask it questions, it answers back, we can work together to come to a solution. And that’s because it’s still predicting words.
It’s still a language model, but it knows to predict words in the framework of a question and answer. So that’s what a prompt is; that’s what instruction tuning is, that’s a key word; and that’s what RLHF is, if you’ve ever seen that acronym: reinforcement learning from human feedback. All of those combined mean that the models coming out today, the types of language predictors coming out today, are built to operate in a Q&A form.
GPT-4 only has the aligned model available. And this is a really great, solid foundation to build on, because you can do all sorts of things. You can ask ChatGPT: can you do this for me? Can you do that for me? You might have seen that OpenAI has allowed plugin access to ChatGPT.
So it can access Wolfram, it can search the web, it can do Instacart for you, it can look up recipes. Once the model knows that it not only has to predict language, but that it has to solve a problem (the problem here being: give me a good answer to my question), it’s suddenly able to interface with the world in a really solid way. And from there on, there have been all sorts of tools that build on this Q&A form that ChatGPT uses.
You have AutoGPT, you have LangChain, you have ReAct; there was a ReAct paper where a lot of these ideas come from. And turning the model into an agent with which to achieve any ambiguous goal is where the future is going.
And this is all thanks to instruction tuning. And with that, I think I will hand it off to Ted, who will be giving a demo or something along those lines for how to use GPT as an agent. All right, so I’m a super applied guy.
I kind of look at things and think: okay, how can I take this Lego and that Lego and clip them together and build something with it? And right now, if you look back in computer science history, at the kinds of things that were being done in 1970, right after microprocessors were invented, people were doing research like: how do I sort a list of numbers? And that was meaningful work. And importantly, it was work that was accessible to everybody, because nobody knew what we could build with this new kind of oil, this new kind of electricity, this new unit of computation we’d created.
And anything was game, and anybody could participate in that game to figure it out. And I think one of the really exciting things about GPT right now is, yes, in and of itself, it’s amazing. But then what could we do with it if we call it over and over again, if we build it into our algorithms and start to build it into broader software.
So the world really is yours to figure out those fundamental questions: what could you do if you could script computation itself, calling it over and over again the way computers can, not just talking with it, but building things atop it? So we’re a hosting company. We host apps.
And these are just some of the things that we see. I’m going to show you demos of this with code and try to explain some of the thought process. But I wanted to give you a high-level overview first; you’ve probably seen these on Twitter, but this is kind of how it all sorts out at the top.
These are some of the things that we’re seeing built and deployed with language models today. Companionship: that’s everything from "I need a friend" to "I need a friend with a purpose." I want a coach; I want somebody to tell me to go to the gym and do these exercises; I want somebody to help me study a foreign language. Question answering.
This is a big one. This is everything from your newsroom having a Slack bot that assists you (does this article conform to the style guidelines of our newsroom?) all the way through to "I need help on my homework," or "hey, I have some questions that I want you to ask Wikipedia about, combine with something else, synthesize the answer, and give it to me." Utility functions.
I would describe this as: there’s a large set of things that human beings can do, and computers could do them too, if only they had access to language computation, language knowledge. An example of this would be: read every tweet on Twitter and tell me the ones I should read.
That way I only have to read the ones that actually make sense for me, and I don’t have to skim through the rest. Creativity: image generation, text generation, storytelling, proposing other ways to do things. And then these wild experiments in kind of baby AGI, as people are calling them, in which the AI itself decides what to do and is self-directed.
So I’ll show you examples of many of these and what the code looks like. And if I were you, I would think about these as categories within which to both think about what you might build and also seek out starter projects for how you might go about building them online. All right, so I’m just going to dive straight into demos and code for some of these, because I know that’s what’s interesting to see as fellow builders, along with a high-level diagram for how each works.
So approximately, you can think of a companionship bot as a friend that has a purpose for you. And there are many ways to build all of these things, but one of the ways is simply to wrap GPT or a language model in an endpoint that additionally injects into the prompt some particular perspective or some particular goal that you want to use.
It really is that easy in a way, but it’s also very hard because you need to iterate and engineer the prompt so that it consistently performs the way you want it to perform. So a good example of this is something somebody built in the hackathon yesterday. And I just wanted to show you the project that they built.
It was a Mandarin idiom coach. I’ll show you the demo first; I think I already pulled it up.
Here we go. So the buddy that this person wanted to create was a friend where, if you gave it a particular problem you were having, it would pick a Chinese idiom, a four-character chengyu, that poetically described it, like: here’s a particular way you could say this, and it would tell it to her. The person who built this was studying Chinese, and she wanted to learn more about it.
So I might say something like, "I’m feeling very sad," and it will think a little bit, and if everything’s up and running, it will generate one of these four-character phrases and respond with an example. Now, I don’t know if this is correct or not, so if this is actually incorrect, please call me out. And it will then finish up with something encouraging: hey, you can do it.
I know this is hard, keep going. So let me show you how they built this. And I pulled up the code right here.
So this was the particular starter Replit that folks were using in the hackathon yesterday. And it boils down to this: you have a wrapper around GPT, and there are many things you could do, but we’re going to make it easy for you to do two things. One of them is to inject some personality into the prompt, and I’ll explain what that prompt is in a second.
And then the second is add tools that might go out and do a particular thing, search the web or generate an image or add something to a database or fetch something from a database. So having done that, now you have something more than GPT. Now you have GPT, which we all know what it is and how we can interact with it.
But you’ve also added a particular lens through which it’s talking to you and potentially some tools. So this particular Chinese tutor, all it took to build that was four lines. So here’s a question that I think is frying the minds of everybody in the industry right now.
Is this something that we’ll all just do casually? Nobody really knows. Will we just all say to the LLM in the future: hey, for the next five minutes, please talk like a teacher? Maybe. But definitely in the meantime, and maybe in the future too, it makes sense to wrap up these personalized endpoints, so that when I’m talking to GPT, I’m not just talking to GPT: I have a whole army of different buddies, of different companions, that I can talk to. They’re kind of human and kind of talk to me interactively, but I’ve preloaded each of them with a particular perspective: hey, I want you to be a kind, helpful Chinese teacher that responds to every situation by explaining the chengyu that fits it; speak in English and explain the chengyu and its meaning; then provide a note of encouragement about learning language. And just by adding something like that, even if you’re a non-programmer, you can type deploy and it’ll pop it up onto the web, it’ll take it over to a Telegram bot, and you can interact with it: hey, I’m feeling too busy.
And you can interact with it over Telegram, over the web. And this is the kind of thing that’s now within reach for everybody, from a CS 101 grad (I’m using the general-purpose framing) all the way through to professionals in the industry, just with a little bit of manipulation on top of this raw unit of conversation and intelligence.
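The actual Steamship starter project wraps this up with deployment and Telegram hooks, but the core persona-injection idea can be sketched with the openai package's chat API as it looked around the time of this talk; the persona text below is paraphrased from the hackathon project, and the model choice is just an illustrative assumption.

```python
import openai  # pip install openai; assumes OPENAI_API_KEY is set in the environment

# The persona, paraphrased from the Mandarin idiom coach's prompt.
PERSONALITY = (
    "You are a kind, helpful Chinese teacher. Respond to every situation by "
    "giving the four-character chengyu that fits it, explain the chengyu and "
    "its meaning in English, then add a note of encouragement about learning."
)

def companion_reply(user_message: str) -> str:
    """Wrap GPT in an endpoint that always speaks through the persona."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": PERSONALITY},  # the injected lens
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(companion_reply("I'm feeling very sad."))
```

Putting that function behind a web endpoint or a chat integration is the part a hosting framework handles for you.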
So companionship is one of the first common types of apps that we’re seeing. A second kind of app that we’re seeing, and for those of you who follow Twitter, this blew up over the last few months, is question answering.
And I want to unpack a couple of different ways this can work because I know many of you have probably already tried to build some of these kinds of apps. There’s a couple of different ways that it works. The general framework is a user queries GPT and maybe it has general purpose knowledge, maybe it doesn’t have general purpose knowledge.
But what you want it to say back to you is something specific about an article you wrote or something specific about your course syllabus, or something specific about a particular set of documents from the United Nations on a particular topic. So what you’re really seeking is what we all hoped the customer service bot would be. We’ve all interacted with these customer service bots and we’re kind of smashing our heads on the keyboard as we do it.
But pretty soon we’re going to start to see very high fidelity bots that interact with us comfortably. And this is approximately how to do it as an engineer. So here’s your game plan as an engineer.
Step one, take the documents that you want it to respond to. Step two, cut them up. Now, if you’re an engineer, this is going to madden you.
You don’t cut them up in the way that you would hope. You could cut them up into clean sentences, or clean paragraphs, or semantically coherent sections, and that would be really nice, honestly. But the way that most folks do it, and this is a simplification that tends to be just fine, is you window: you have a sliding window that goes over the document, and you just pull out fragments of text. Having pulled out those fragments of text, you turn them into something called an embedding vector.
An embedding vector is a list of numbers that approximates some point of meaning. And you’ve already dealt with embedding vectors yourself in regular life. The reason you have, and I know you have, is because everybody’s ordered food from Yelp before.
So when you order food from Yelp, you look at what genre of restaurant it is: is it a pizza restaurant, an Italian restaurant, a Korean barbecue place? You look at how many stars it has: one, two, three, four, five. You look at where it is. All of these you can think of as points in space, dimensions in space: Korean barbecue restaurant, four stars, near my house. That’s a three-number vector. That’s all this is.
Here it’s a thousand-number vector or a 10,000-number vector; different models produce different-size vectors. All it is, is chunking pieces of text and turning each one into a vector that approximates its meaning.
And then you put it in something called a vector database. A vector database is just a database that stores those vectors. But having that database, now, when I ask a question, I can search the database and say: hey, the question was "What does CS50 teach?"; what pieces of text in the database have vectors similar to that question? And there are all sorts of tricks, and empires being made, on refinements of this general approach.
But at the end, you, the developer, can model it simply as this. When you have your query, you embed it, you find the similar document fragments, and then you put them into a prompt.
And now we’re just back to the personality, the companionship bot. Now it’s just a prompt. And the prompt is, you’re an expert in answering questions.
Please answer user-provided questions using the source documents, the results from the database. That’s it. So after all of these decades of engineering of these customer service bots, it turns out that with a couple of lines of code, you can build this.
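Here is a compressed sketch of that whole game plan, with an in-memory list standing in for a real vector database; the embedding model, chunk sizes, and prompt wording are illustrative assumptions, not the exact choices from the demo.

```python
import numpy as np
import openai  # assumes OPENAI_API_KEY is set

def embed(text: str) -> np.ndarray:
    # One embedding vector (a list of ~1,500 numbers) per piece of text.
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def sliding_window(doc: str, size: int = 500, stride: int = 250) -> list[str]:
    # Step two: cut the document into overlapping fragments.
    return [doc[i:i + size] for i in range(0, len(doc), stride)]

def build_index(doc: str) -> list[tuple[str, np.ndarray]]:
    # Embed each fragment and store it (a real app would use a vector database).
    return [(chunk, embed(chunk)) for chunk in sliding_window(doc)]

def answer(question: str, index: list, k: int = 3) -> str:
    q = embed(question)
    # Rank fragments by cosine similarity to the question vector.
    sim = lambda v: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    best = sorted(index, key=lambda item: sim(item[1]), reverse=True)[:k]
    context = "\n---\n".join(chunk for chunk, _ in best)
    prompt = (
        "You are an expert at answering questions. Using ONLY these source "
        f"documents:\n{context}\n\nAnswer this question: {question}"
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content
```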
So let me show you. I made one just before the class with the CS50 syllabus. So we can pull that up, and I can say: I added the PDF right here.

So I apologize, I don’t know if it’s an accurate or recent syllabus; I just searched the web for a CS50 syllabus PDF.

I put the URL in here, and it loaded it in. This is just like a 100-line piece of code, deployed, that will now let me talk to it, and I can say: what will CS50 teach me? So under the hood now, what’s happening is exactly what that slide just showed you.
It takes that question, "What will CS50 teach me?", and turns it into a vector. That vector approximates, without exactly representing, the meaning of that question. It looks into a vector database that Steamship hosts of fragments from that PDF.

And then it pulls out documents and passes them to a prompt that says: hey, you’re an expert at answering questions. Someone has asked you: what does CS50 teach? Please answer it using only the source documents and source materials I’ve provided.
Now, those source materials are dynamically loaded into the prompt. It’s just basic prompt engineering, and I want to keep coming back to that. What’s amazing about right now, as builders, is that so many things just boil down to very creative, tactical rearrangement of prompts, and then using those over and over again in an algorithm and putting that into software.
So the result, and again, it could be lying, it could be making things up, it could be hallucinating, is: CS50 will teach students how to think algorithmically and solve problems efficiently, focusing on topics such as abstraction.
And then it returns the source document from which the answer was found. So this is another big category, with tons of potential applications, because you can repeat it for each context; you can create arbitrarily many of these once it’s software, because once it’s software, you can just repeat it over and over again. So for your dorm, for your club, for your Slack, for your Telegram, you can start putting pieces of information in and then responding to them. And it doesn’t have to be documents.
You can also load it straight into the prompt. I think I have it pulled up here and if I don’t, I’ll just skip it. Oh, here we go.
One other way you can do question answering, because I think it’s healthy to always encourage the simplest possible approach to something. You don’t need to engineer this giant system. It’s great to have a database, it’s great to use embeddings, it’s great to use this big approach.
It’s fancy, it scales, you can do a lot of things, but you can also get away with a lot by just pushing it all into a prompt. And as one of our engineers who’s here always says: engineers should aspire to be lazy.
And I couldn’t agree more. You as an engineer should want to set yourself up so that you can pursue the lazy path to something. So here’s how you might do the equivalent of a question answering system with a prompt alone.
Let’s say you have 30 friends, and each friend is good at a particular thing (this is isomorphic to many other problems). You can simply say: hey, I know certain things.
Here are the things I know. A user is going to ask me something; how should we respond? And then you load that into an agent. That agent has access to GPT, and you can deploy it.
And now you’ve got a bot that you can connect to Telegram, that you can connect to Slack. Now, it won’t always give you the right answer, because at a certain level we can’t control the variance of the model underneath, but it will tend to answer with respect to this list. And the degree to which it tends to is partly something that industry is working on, to give everybody that as a capability, and partly you doing prompt engineering to tighten up the error bars.
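A minimal sketch of that lazy, prompt-only approach; the facts and model choice here are invented for illustration.

```python
import openai  # assumes OPENAI_API_KEY is set

FACTS = """- The dining hall closes at 8 p.m. on weekdays.
- Laundry machines are in the basement of each dorm.
- The shuttle runs every 30 minutes until midnight."""

def lazy_answer(question: str) -> str:
    # No chunking, no embeddings, no database: everything goes into the prompt.
    prompt = (
        f"I know the following things:\n{FACTS}\n\n"
        f"A user asked: {question}\n"
        "Answer using only the facts above; if they don't cover it, say you don't know."
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

print(lazy_answer("When does the dining hall close on Tuesday?"))
```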
So I’ll show you just a few more examples, and then in about eight minutes I’ll turn it over to questions, because I’m sure you’ve got a lot about how to build things. So just to give you a sense of where we are, here’s one I don’t have a demo for. But if you were to come to me and say: Ted, I want a weekend hustle, man, what should I build? Holy moly. There’s a set of applications that I would describe as utility functions. I don’t like that name, because it doesn’t sound exciting, and this is really exciting: it’s low-hanging fruit that automates tasks requiring basic language understanding.
So examples for this are generate a unit test. I don’t know how many of you have ever been writing tests and you’re just like, oh, come on, I can get through this, I can get through this. If you’re a person who likes writing tests, you’re a lucky individual.
Looking up the documentation for a function, rewriting a function, making something conform to your company guidelines, doing a brand check. All of these are relatively context-free operations, or scoped-context operations, on a piece of information, that require linguistic understanding. And really, you can think of them as something that is now available to you as a software builder, as a weekend-project builder, as a startup builder.
And you just have to build the interface around it and present it to other people in a context in which it’s meaningful for them to consume. And so the space of this is extraordinary. I mean, it’s the space of all human endeavor.
The space of all human endeavor, now with this new tool, I think, is the way to think about it. People often joke about how, when you’re building a company or a project, you don’t want to start with the hammer, because you want to start with the problem instead. And it’s generally true, but my God, we’ve just got a really cool new hammer.
And to a certain extent, I would encourage you to, at least casually on the weekends, run around and hit stuff with it and see what can happen, from a builder’s, from a tinkerer’s, from an experimentalist’s point of view. And then the final one is creativity. This is another huge category of apps.
Now, I primarily live in the text world, and so I’m going to talk about text based things. I think so far this has mostly been growing in the imagery world because we’re such visual creatures and the images you can generate are just staggering with AI. Certainly brings up a lot of questions too around IP and artistic style.
But the template for this that we’re seeing in the wild, if you’re a builder, is approximately the following. And the thing I want to point out here is domain knowledge. That’s really the purpose of this slide: to touch on the importance of domain knowledge.
Many people describe the creative process approximately as follows: come up with a big idea, overgenerate possibilities, edit down what you overgenerated, repeat. Right? Anybody who’s been a writer knows: when you write, you write way too much, and then you have to delete lots of it. And then you revise, and you write way too much, and you have to delete lots of it.
This particular task is fantastic for AI. One of the reasons is that it allows the AI to be wrong: you’ve pre-agreed that you’re going to delete lots of it. So if you pre-agree, hey, I’m just going to generate five possibilities for the story I might tell, five possibilities for the advertising headline, five possibilities for what I might write my thesis on, it’s okay if it’s a little wrong, because you are going to be the editor that steps in.
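Here is a sketch of that overgenerate step, using the chat API's n parameter to request several independent completions of the same brief; the brief and model are illustrative assumptions.

```python
import openai  # assumes OPENAI_API_KEY is set

def overgenerate(brief: str, n: int = 5) -> list[str]:
    # Ask for n completions at once; you, the human editor, will cut most of them.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        n=n,               # five shots at the same brief
        temperature=1.0,   # extra randomness is fine: you pre-agreed to delete
        messages=[{"role": "user",
                   "content": f"Propose one advertising headline for: {brief}"}],
    )
    return [choice.message.content for choice in resp.choices]

for i, headline in enumerate(overgenerate("an intro CS course for everyone"), 1):
    print(i, headline)
```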
And here’s the thing that you really should bring to the table: don’t think about this as a purely technical activity. Think about this as your opportunity not to put GPT in charge, but instead to grasp the steering wheel tighter, in Python or whatever language you’re using to program, because you have the domain knowledge to wield GPT in generating those possibilities.
So let me show you an example of what I mean by that. This is a cool app that someone created for the Writing Atlas project. Writing Atlas is a set of short stories, and you can think of it as Goodreads for short stories.
So you can go in here and browse different stories, and this was something somebody created where you can type in a description of a story that you like. And this is going to take about a minute to generate.
So I’m going to talk while it’s generating. While it’s working, what it’s doing (and I’ll show you the code in a second) is searching through the collection of stories for similar stories. And here’s where the domain knowledge part comes in: it then uses GPT to look at what it was that you wanted, and uses knowledge of how an editor, how a bookseller, thinks to generate a set of suggestions specifically through the lens of that perspective, with the goal of writing that beautiful handwritten note that we sometimes see in a local bookstore tacked underneath a book.
And so it doesn’t just say, hey, you might like this. Here’s a general purpose reason why you might like this. But specifically, here’s why you might like this with respect to what you gave it.
It’s either stalling out or it’s taking a long time. Oh, there we go. So here’s its suggestions.
And in particular, these are things that only a human could know, at least for now. Two humans, specifically: the human who said what they wanted to read, that’s the text that came in, and the human who added domain knowledge to script a sequence of interactions with the language model, so that you could provide very targeted reasoning informed by that domain knowledge. So for these utility apps, bring your domain knowledge. Let me actually show you how this looks in code, because I think it’s useful to see how simple and accessible this is.
This is really a set of prompts. So why might they like a particular location? Well, here’s the prompt that did that.
This is an open source project, and it has a bunch of examples. And then it says, well, here’s the one that we’re interested in, here’s the audience. Here’s a couple of examples of why might people like a particular thing in terms of audience.
It’s just another prompt. Same for topic, same for explanation. And if you go down here and look at how it was done, suggesting the story is, what is this, line 174 to line 203? It really is.

And again, over and over, I want to impress upon you: this really is within reach. It’s really just, what, 20-odd lines? Step one, search the database for similar stories. Step two, given that I have similar stories, pull out the data.

Step three, with my domain knowledge, in Python, run these prompts. Step four, prepare that into an output.
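In spirit, that chain looks something like the following sketch. The prompts here are hypothetical stand-ins for the Writing Atlas ones, and the vector-search step is omitted, but the shape, one small prompt per editorial question, sequenced in Python, is the point.

```python
import openai  # assumes OPENAI_API_KEY is set

def run(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def bookseller_note(request: str, title: str, summary: str) -> str:
    # The domain knowledge lives in how these prompts are phrased and sequenced,
    # not in GPT itself.
    audience = run(
        f"A reader asked for: {request}\nStory summary: {summary}\n"
        "In one sentence, why would this particular reader enjoy it?"
    )
    theme = run(
        f"Story summary: {summary}\n"
        "In one sentence, what theme would a bookseller's handwritten note highlight?"
    )
    return f"{title}: {audience} {theme}"

print(bookseller_note(
    "something unsettling about small towns",
    "The Lottery",
    "A village's annual ritual takes a dark turn.",
))
```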
So the thing we’re scripting is itself some approximation of human cognition, if you’re willing to go there metaphorically. We’re not sure; I’m not going to weigh in on where we are on the is-OpenAI-a-life-form argument.
All right? One kind of really far out there thing. And then I’ll tie it up for questions because I know there’s probably a lot. And I also want to make sure you get great pizza in your bellies.
And that is BabyAGI and AutoGPT, as you might have heard them called on Twitter. I think of them as multistep planning bots. So everything I showed you so far was approximately one-shot interactions with GPT.
So this is: the user says they want something, and then either Python mediates interactions with GPT, or GPT itself does some things with the inflection of a personality that you’ve added through some prompt engineering. Really useful, pretty easy to control. If you want to go to production, build a weekend project, or build a company, that’s a great way to do it right now.
This is wild. And if you haven’t seen this stuff on Twitter, I would definitely recommend going to search for it. This is what happens.
The simple way to put it is: you put GPT in a for loop; you let GPT talk to itself and then tell itself what to do. So it’s an emergent behavior, and like all emergent behaviors, it starts with a few simple steps.
Think of Conway’s Game of Life: many elements of reality turn out to be math equations that fit on a T-shirt, but when you play them forward in time, they generate DNA, they generate human life. So this is approximately: step one, take a human objective.
Step two, your first task is to write yourself a list of steps. And here’s the critical part: repeat. Now do the list of steps.
Now you have to embody your agent with the ability to do things, so it’s really only limited to what you give it the tools to do and what it has the skills to do. Obviously, this is still very much a set of experiments that are running right now, but it’s something that we’ll see unfold over the coming years.
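A heavily simplified, hypothetical sketch of that loop, with no tools at all, just GPT writing its own task list and feeding results back to itself:

```python
import openai  # assumes OPENAI_API_KEY is set

def llm(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

objective = "Summarize three ways language models could help students study."

# Step one: the model writes itself a list of steps.
plan = llm(f"Objective: {objective}\nWrite a short numbered task list, one per line.")
tasks = [line.strip() for line in plan.splitlines() if line.strip()]

# Step two: repeat, doing the steps and feeding results back into the model.
results: list[str] = []
for task in tasks:
    results.append(llm(
        f"Objective: {objective}\nResults so far: {results}\n"
        f"Carry out this task and report the result: {task}"
    ))

print(llm(f"Objective: {objective}\nResults: {results}\nWrite the final answer."))
```

Real AutoGPT and BabyAGI add tools, memory, and re-planning on top of this skeleton, but the kickstarted loop is the core.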
And this is the scenario in which Python stops being so important, because we’ve given the model the ability to actually self-direct what it’s doing, and then it finally gives you a result. And I want to give you an example, again impressing upon you how much of this is prompt engineering, which is wild, and how little code this is. Let me show you what BabyAGI looks like.
So here is a BabyAGI that you can connect to Telegram. And this is an agent that has two tools. So I haven’t explained to you what an agent is.
I haven’t explained to you what tools are. I’ll give you a quick one sentence description. An agent is just a word to mean GPT plus some bigger body in which it’s living.
Maybe that body has a personality. Maybe it has tools. Maybe it has Python mediating its experience with other things.
Tools are simply ways in which the agent can choose to do things. Imagine if GPT could say "order a pizza," and instead of you seeing the text "order a pizza," it caused a pizza to be ordered.
That’s a tool. So these are the two tools it has. One tool generates a to-do list.

The other tool does a search on the web. And then down here it has a prompt saying: hey, your goal is to build a task list and then do that task list. And this is just placed into a harness that runs it over and over again: dequeue the next task, record the results of that task, and keep it going.
And in doing that, you get this loop where essentially you kickstart it, and then the agent is talking to itself. Unless I’m wrong, I don’t think this has yet reached production in terms of what we’re seeing in the field of how people are deploying software.
But if you want to dive into sort of the wildest part of experimentation, this is definitely one of the places you can start, and it’s really within reach. All you have to do is download one of the starter projects for it, and you can see right in the prompting how you kickstart that process of iteration. All right, so I know that was super high-level.
I hope it was useful. I think, from the field, from the bottom up, this is what we’re seeing and what people are building, kind of the high-level categories of apps that people are making. All of these are apps that are within reach for everybody, which is really, really exciting.
And I suggest Twitter is a great place to hang out and build things. There’s a lot of AI builders on Twitter publishing, and I think we’ve got a couple of minutes before pizza is arriving. Maybe ten minutes.
Keep on going. So if there’s any questions, why don’t we kick it to that? Because I’m sure there’s some questions that you all have. I guess I ended a little early.
Yes, I have a question around hallucination. If I’m giving it, like, a physics problem from a pset, some of the time, say 40% of the time, it’s just wrong.
Do you have any practical recommendations for what developers should be doing to make it hallucinate less, or maybe even things that OpenAI on the back end should be doing, like RLHF? So the question was approximately: how do you manage the hallucination problem? Like, if you give it a physics lecture and you ask it a question, on the one hand it appears to be answering you correctly; on the other hand, it appears to be wrong to an expert’s eye.
40% of the time, 70% of the time, 10% of the time, it’s a huge problem. And then what are some ways, as developers, practically, you can use to mitigate that? I’ll give an answer, and Sil, you may have some specific things too. So one high-level answer is: the same thing that makes these models capable of synthesizing information is part of the reason why they hallucinate.
So it’s hard to have your cake and eat it too, to a certain extent. So this is part of the game. In fact, humans do it too.
Like, people talk about folks who are too aggressive in their assumptions about knowledge; I can’t remember the name for that phenomenon where you’ll just say stuff, right? So we do it too. Some things you can do span a range of activities, depending on how much money you’re willing to spend and how much technical expertise you have. That can range from fine-tuning a model to, well, practically, I’m in the applied world, so I’m very much in the world of duct tape and how developers get stuff done.
So some of the answers I’ll give you are sort of very duct tapey answers. Giving it examples tends to work for acute things. If it’s behaving in wild ways, the more examples you give it, the better.
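As an illustrative example of that trick, a few-shot prompt shows the model worked examples of the behavior you want before the real input arrives; the task and examples here are hypothetical.

```python
# A few-shot prompt: show the model worked examples of the desired behavior,
# then leave the last answer blank for it to complete. (Illustrative only.)
FEW_SHOT = """Classify the sentiment of each review as positive or negative.

Review: The pizza arrived cold and late.
Sentiment: negative

Review: Best tech talk I've been to all year!
Sentiment: positive

Review: The hall was packed but the demos made it worth it.
Sentiment:"""
```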
That’s not going to solve the domain of all of physics, though. So for the domain of all of physics, I’m going to bail and hand it to Sil, because I think you’re far more equipped than me to speak on that. Sure.
So the model doesn’t have a ground truth. It doesn’t know anything. Any sense of meaning that it’s derived from the training process is purely out of differentiation.
One word is not another word. Words are not used in the same context. It understands everything only through examples given through language.
It’s like someone who learned English or how to speak, but they grew up in a featureless gray room. They’ve never seen the outside world. They have nothing to rest on that tells them that something is true and something is not true.
So from the model’s perspective, everything that it says is true. It’s trying its best to give you the best answer possible, and if lying a little bit, or conflating two different topics, is the best way to achieve that, then it will decide to do so.
It’s part of the architecture; we can’t get around it. But there are a number of cheap tricks that surprisingly get it to confabulate, or hallucinate, less. One of them, from a recent paper, is a little funny: if you get it to prepend "My best guess is" to its answer, that will actually reduce hallucinations by about 80%. So clearly it has some sense that some things are true and other things are not, but we’re not quite sure what that is.
To add on to what Ted was saying, a few cheap things you can do include letting it Google or Bing, as Bing Chat is doing; it cites its information. Or asking it to make sure its own response is good. If you’ve ever had ChatGPT generate a program with some kind of problem in it, and you ask ChatGPT, "I think there’s a mistake,"
often it’ll locate the mistake itself. Why it didn’t produce the right answer at the very beginning, we’re still not sure, but we’re moving in the direction of reducing hallucinations. Now, with respect to physics, you’re going to have to give it an external database to rest on, because internally, for really domain-specific knowledge, it’s not going to be as deterministic as one would like.
These things work in continuous spaces; they don’t know what is false and what is true, and as a result, we have to give them tools. So everything that Ted demoed today is really striving at reducing hallucinations and giving it more abilities.
I hope that answers your question. One other approach, too. I’m a simple guy.
I tend to think that all of the world tends to be just a few things repeated over and over again. And we have human systems for this: teams, the way companies work, or a team playing a sport. And we’re not right all the time, even when we aspire to be.
And so we have systems that we’ve developed as humans to deal with things that may be wrong. So human number one proposes an answer. Human number two checks their work.
Human number three provides the final sign off. This is really common. Anybody who’s worked in a company has seen this in practice.
The interesting thing about the state of software right now is that we tend to be in this mode in which we’re just talking to GPT as one entity. But once we start thinking in terms of teams, so to speak, where each team member is its own agent with its own set of objectives and skills, I suspect we’re going to start seeing a programming model in which the way to solve this might not necessarily be to make a single brain smarter, but instead to draw upon the collective intelligence of multiple software agents, each playing a role.
And I think that that would certainly follow the human pattern of how we deal with this. To give an analogy, space shuttles, things that go into space, spacecraft, they have to be good. If they’re not good, people die.
They have no margin for error at all. And as a result, we over engineer in those systems. Most spacecraft have three computers and they all have to agree in unison on a particular step to go forward.
If one does not agree, then they recalculate. They recalculate. They recalculate until they arrive at something.
The good thing is that hallucinations are generally not a systemic problem in terms of the model’s knowledge. It’s often a one-off: something tripped the model up, and it produced a hallucination in that one instance.
So if there are three models working in unison, instead of just one, that will, generally speaking, improve your success.

A number of the examples you showed have assertions like: you are an engineer, you are an AI, you are a teacher. What’s the mechanism by which that influences the computational probabilities? Sure. I’m going to give you what might be an unsatisfying answer, which is: it tends to work.
But I think we know why it tends to work. And again, it’s because these language models approximate how we talk to each other. So if I were to say to you, hey, help me out.
I need you to mock interview me, that’s a direct statement I can make that kicks you into a certain mode of interaction. Or if I say to you, help me out, I’m trying to apologize to my wife, she’s really mad at me. Can you role play with me? That kicks you into another mode of interaction.
And so it’s really just a shorthand that people have found to kick the LLM into a certain mode of interaction that tends to work in the way that I, as a software developer, am hoping it would work. And to really quickly add on to that: being in the digital humanities as I am, I like to think of it as a narrative. A narrative will have a few different characters talking to each other.
Their roles are clearly defined. Two people are not the same. This interaction with GPT, it assumes a personality.
It can simulate personalities. It itself is not conscious in any way, but it can certainly predict how a conscious being would react in a particular situation. So when we say "you are X," it is drawing up that personality and talking as though it is that person, because it is like completing a transcript or completing a story in which that character is present and interacting and active. I think we’ve got about five minutes until the pizza outside.
Yes, sir. Yes. So I’m not a CS person, but it’s been fun playing with this, and I understand the sort of word-by-word generation and the sort of vibe, the feeling of it, the narrative.

Some of my friends and I have tried giving it logic problems, like things from the LSAT, for example, and it doesn’t work. It will generate answers that sound very plausible rhetorically, like "given this condition, this follows," but it’ll often even contradict itself in its answers, and it’s almost never correct.

So I was wondering why that would be. It just can’t reason, it can’t think. And would we get to a place where it could? You know what I mean; I don’t mean think like it’s conscious, I mean have thought.
Do you want to talk about ReAct? So GPT-4, when it released back in March, I think it was passing the LSAT. It was, yeah. Yes, it just passed, as I understand it.
Maybe it’s because we’re not using GPT-4; that’s one of the weird things: if you pay for ChatGPT, they give you access to the better model. And one of the interesting things with it is that prompting is so finicky.
It’s very sensitive to the way that you prompt. Earlier on, when GPT-3 came out, some people were going: look, it can pass literacy tests; or no, it can’t pass literacy tests. And then people who were pro- or anti-GPT would say: I modified the prompt a little bit, and suddenly it can, or suddenly it can’t.
These things are not conscious. Their ability to reason is like an alien’s. They’re not us; they don’t think like people; they’re not human.
But they certainly are capable of passing some things empirically, which demonstrates some sort of rationale or logic within the model. We’re still figuring out, like prompt whisperers, what exactly the right approach is.

Obviously, having GPT running and prompting it continuously costs money. Have you seen instances where it directly creates some sort of business value? Yeah, I mean, we host companies on top of us whose primary product is this. The value that it adds is like any company’s.
What is the Y Combinator motto? Make something people want. I wouldn’t think of it as GPT inherently providing value for you as a builder; that’s OpenAI’s product, and you pay ChatGPT for prioritized access. Where your product might be is how you take that and combine it with your data, somebody else’s data, some domain knowledge, some interface that then helps apply it to something.
Two things are both true. There are a lot of experiments going on right now, both for fun and people trying to figure out where the economic value is. But folks are also spinning up companies that are 100% supported by applying this to data.
What about a company that wouldn’t be AI-focused, that’s just using or developing in-house tools that use GPT for productivity? I think it is likely that today we call this GPT, and today we call these LLMs, and tomorrow it will just slide into the ether. I mean, imagine what the progression is going to be.

Today, there’s one of these that people are primarily playing with; many of them exist, but there’s one that people are primarily building on top of. Tomorrow we can expect that there will be many of them, and the day after that we can expect they’re going to be on our phones, not even connected to the Internet.
And for that reason, I think that, just as today we don’t call our software microprocessor tools or microprocessor apps (the processor just exists), one useful model, five years out, ten years out, even if it’s only metaphorically true and not literally true, is to think of this as a second processor.
We had this before with floating-point co-processors and graphics co-processors as recently as the 90s. It’s useful to think of the trajectory of this as just another thing that computers can do, and it will be incorporated into absolutely everything. Hence the term foundation model, which also crops up. Pizza’s ready.
One more question. Maybe one more, and then we’ll break for some food. In the glasses right there. If we need to get structured data back out of it, how do we get it to do that reliably?
It’s hard to get it to do that reliably, and it’s incredibly useful when you can. So some tricks you can use: you can give it examples, or you can just ask it directly.
Those are two common tricks. And look at the prompts that others have used that work. I mean, there’s a lot of art to finding the right prompt right now.
A lot of it is magic incantation. Another thing you can do is post-process it, so that you can do some checking, and you can have a happy path, in which it’s a one-shot and you get your answer, and then a sad path, in which maybe you fall back on other prompts.

So then you’re going for diversity of approach: it’s fast by default, slower if it fails, but ultimately converging on a higher likelihood of success. And then something that I’m sure we’ll see people do later on is fine-tune instruction-tuned models that are more likely to respond with computer-parsable output.
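Here is a sketch of that happy-path/sad-path pattern, retrying until the model returns parsable JSON; the schema, model, and retry count are illustrative assumptions.

```python
import json
import openai  # assumes OPENAI_API_KEY is set

def ask_for_json(question: str, retries: int = 2) -> dict:
    prompt = (
        f"{question}\n"
        'Respond with ONLY a JSON object shaped like {"answer": "...", "confidence": 0.5}'
    )
    for _ in range(retries + 1):
        text = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        try:
            return json.loads(text)  # happy path: parsed on the first shot
        except json.JSONDecodeError:
            # sad path: fall back on another prompt that shows the bad output
            prompt += f"\nYour previous reply was not valid JSON:\n{text}\nTry again."
    raise ValueError("model never produced parsable JSON")

print(ask_for_json("What does CS50 teach?"))
```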
One last question. Sure. A couple of things. One is, you talked about domain expertise, and you’re encoding a bunch of domain expertise in the prompts that you’re writing.
Where do those prompts end up? Do the prompts end up back in the ChatGPT model, and is there a privacy issue associated with that? That’s a great question. So the question was, and I apologize,

I just realized we haven’t been repeating all the questions for the YouTube listeners, so I’m sorry for the folks on YouTube if you weren’t able to hear some of the questions. The question was: what are the privacy implications of some of these prompts? If one of the messages is that so much depends upon your prompt and the fine-tuning of that prompt, what does that mean with respect to my IP? Maybe the prompt is my business.
I can’t offer you the exact answer, but I can paint for you what the landscape approximately looks like. In all of software, and so too with AI, what we see is: there are the SaaS companies, where you’re using somebody else’s API and trusting that their terms of service will be upheld. Then there’s the set of companies that provide a model for hosting on one of the big cloud providers.
And this is a version of the same thing, but I think with slightly different mechanics. This tends to be thought of as the enterprise version of software. And by and large the industry has moved over the past 20 years from running my own servers to trusting that Microsoft or Amazon or Google can run servers for me.
And they say it’s my private server, even though I know they’re running it, and I’m okay with that. You’ve already started to see that: Amazon with Hugging Face, Microsoft with OpenAI, Google too with their own version of Bard, are going to do this, where you’ll have the SaaS version and then you’ll also have the private VPC version.
And then there’s a third version that I think we haven’t yet seen practically emerge. This would be the maximalist, "I want to make sure my IP is maximally safe" version of events, in which you are running your own machines, you are running your own models. And then the question is: is the open-source and/or privately available version of the model as good as the publicly hosted one? And does that matter to me? The answer is: right now, realistically, it probably matters a lot.
In the fullness of time, you can think of any one particular task you need to achieve as requiring some fixed point of intelligence to achieve. And so over time, what we’ll see is the privately obtainable versions of these models will cross that threshold. And with respect to that one task, yeah, sure, use the open source version, run it on your own machine, but we’ll also see the SaaS intelligence get smarter.
It’ll probably stay ahead. And then your question is: which one do I care more about? Do I want the better aggregate intelligence, or is my task somewhat fixed-point, such that I can just use the openly available one, knowing it’ll perform well enough because it’s crossed the threshold? So to answer your question specifically: you might be glad to know that ChatGPT recently updated their privacy policy to not use prompts for the training process. But up until now, everything went back into the bin to be trained on again, and that’s just a fact.
So I think it is now pizza time. Let’s go get pizza. I hope this was useful.
We’ll be around, ask some questions. Go ahead.