What is GitHub Copilot?
- [Instructor] Let's talk about what GitHub Copilot is and what makes it special. At its core, GitHub Copilot is a code assistant that helps you write code faster. It's based on GPT, the model from OpenAI that powers ChatGPT, one of the most popular web applications in history. This is a special, customized version of GPT, which is a large language model. It has been trained on gigabytes of code. Because it's trained on public code, the more popular the language, the more code is available and the better the recommendations. So it's best for popular languages like Python, JavaScript, and Ruby.

Models are formulas used to predict events. One example you might be familiar with is the hurricane models that attempt to chart the paths of storms every year. LLMs try to predict what should come next in a sequence of words, technically tokens or numbers. It's like the autocomplete that happens when you use a search engine: it will give you a suggestion that may or may not be what you're looking for. A programming language is a pretty simple language. It has grammar and rules, and unlike a lot of human languages, it doesn't have things like slang and words with changing meanings.

Let's dig a bit deeper into how LLMs work. In their training phase, LLMs absorb all the available data and convert words, characters, and other symbols to numbers known as tokens, because numbers are easier for computers to work with. The tokens become the vocabulary of the language, so more complex languages have more complex vocabularies. When you make requests to the LLM, the size of the vocabulary, as well as the amount of information you're providing, can affect the cost of using the model, both in terms of compute power and processing usage. After tokenization, the models go through training phases where, given different inputs, they try to determine the probability of what the next piece of code, or token, should be.
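To make the tokenization idea concrete, here is a minimal sketch of mapping text to token IDs. It's a toy that splits on whitespace and assigns each new word the next integer; real LLM tokenizers (like GPT's byte-pair encoding) work over subword pieces, but the principle of converting symbols to numbers is the same. The function names here are illustrative, not part of any real tokenizer API.

```python
# Toy tokenizer: assigns each distinct word an integer ID, mimicking
# how an LLM converts text to tokens before training on it.
def build_vocab(corpus):
    vocab = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)  # next unused ID
    return vocab

def tokenize(text, vocab):
    # Look up each word's ID; the model only ever sees these numbers.
    return [vocab[word] for word in text.split()]

corpus = "def add ( a , b ) : return a + b"
vocab = build_vocab(corpus)
print(tokenize("return a + b", vocab))  # e.g. [8, 3, 9, 5]
```

Notice that punctuation like `(` and `:` gets its own token, which is part of why code, with its small and regular vocabulary, suits this approach well.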
They do this continuously until they finish the message they write back to you. This mimics the way people communicate, in that we often put together sentences by thinking of what the next word should be. You probably notice this more when you can't find the right word and perhaps someone else suggests it. Most of the training is a statistical analysis of the tokens that determines the most likely next token, but that's followed by human-led reinforcement. The model's predictions get better over time as it is trained on what humans prefer. By looking at gigabytes of code, it's able to determine the answer that is most likely to be correct for the question you've asked.
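The statistical idea behind "predict the most likely next token" can be sketched with simple pair counts, assuming a toy model that only looks at the previous token (real LLMs use a neural network over a long context, not counts, and these function names are made up for illustration):

```python
from collections import Counter, defaultdict

# Count, for every token, which token most often follows it in the
# training text -- a crude stand-in for an LLM's learned probabilities.
def train_bigrams(tokens):
    follows = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        follows[cur][nxt] += 1
    return follows

def predict_next(follows, token):
    # Pick the token seen most often after `token` during training.
    return follows[token].most_common(1)[0][0]

tokens = "x = x + 1 ; y = x + 2".split()
model = train_bigrams(tokens)
print(predict_next(model, "x"))  # "+" -- it followed "x" twice, "=" once
```

Repeating this prediction, feeding each chosen token back in as the new context, is how a model "finishes the message": one most-likely token at a time.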