This is the sixth installment in a series about LLMs. You can find the fifth article here: Making Sense of LLMs - A goal without alignment is just a wish.
Today, OpenAI released GPT-4. You will hear and read much about it in the upcoming days and weeks as we learn more about its capabilities and limitations.
Here are a few noteworthy tidbits to get you up to speed, along with some comments from my side:
- Bing is powered by GPT-4 already. No surprise there, but kudos to Microsoft's execution speed. Satya Nadella recently stated that Google is the 800-pound gorilla in the search market but that he intends to make them "dance," and what an invitation that has been.
- Citing competition and safety concerns, OpenAI will not provide details on architecture, model size, hardware, or training data for their latest AI model. This sounds like the death rattle of open research, which has been the status quo for the last decade. The field will switch from a hugely collaborative to a fully proprietary model in the blink of an eye as each company tries to create a strategic moat around its business.
- The model is multi-modal, i.e., it accepts images as input and is adept at interpreting infographics and charts. Users can refer to content within pictures and even ask for high-level explanations like, "Why is this comic funny?" (a sketch of what such a request could look like follows after this list). Adding more modalities such as video and audio looks like a mere engineering challenge at this point. More interestingly, multi-modality also appears to improve baseline performance.
- The previous model, ChatGPT, scored in the bottom 10% on a standardized bar exam; GPT-4 ranks in the top 10%. Certainly a cherry-picked stat, but impressive nonetheless, and one of the things the media will undoubtedly focus on.
- New capabilities emerged for the first time, for example on the hindsight-neglect task, which had been elusive until now. This shows that some abilities do not improve gradually but instead jump into existence at a certain threshold, which is fascinating and suggests that LLMs might still have some surprises up their digital sleeves.
- Hallucination is still a problem, i.e., models creating non-factual but otherwise compelling content. However, major progress seems achievable with a better training regimen, including techniques like supervised fine-tuning and RLHF (reinforcement learning from human feedback); a toy illustration of the reward-modelling step behind RLHF follows after this list. Unfortunately, this also means that one needs an army of human labelers, which could be prohibitive for smaller companies.
- The cut-off date for the training data is 2021, which means that GPT-4 still needs to find out who won the last football World Cup. OpenAI completed the training of the base model last summer. This is especially interesting because it tells us that cleaning and refining the data and answers is more important than adding more data, i.e., human post-processing outweighs other factors like compute.
- Multiple companies like Duolingo released GPT-4-powered product features on the same day as the tech release. One might argue that they already worked with a chat-like interface, but this is still no easy feat to pull off: having everything ready, from PR material to app updates, including pricing. I wonder if the ability to integrate LLM tech will be the most significant competitive differentiator in the next couple of years.
- They red-teamed GPT-4 with a third party, meaning they hired researchers to act as bad-faith actors and find exploits and security issues. The results range from worrying to dystopian, depending on your level of tech optimism. We can expect a tidal wave of sophisticated and personalized spam, but more concerning is the fact that the model excels at autocratic disinformation and power-seeking behavior.
- I am especially inspired by the fact that LLMs are getting better at explanations out of the box. At the end of the Developer Livestream, starting at 19:05, Greg Brockman gives an example of GPT-4 breaking down a tax code issue for him. He goes on to say: "Only by asking the model to spell out its reasoning and me following along, I was like 'oh, I get it now, I know why this works'… it doesn't care if it is code, if it is language, all this can be applied toward the problems you care about."
- There is a waitlist for API access, but the cost has already dropped by an order of magnitude. It must be nice to be bankrolled like this, crushing the competition before the race even starts.
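To make the multi-modal bullet above more concrete, here is a minimal sketch of what an image-plus-text request could look like through the OpenAI Python SDK. Treat it as illustration only: the model name, the message shape, and the example URL are assumptions on my side, since image input is not generally available through the API yet.

```python
# Minimal sketch of a multi-modal chat request (illustrative only).
# The model name and the image URL are placeholders, not real endpoints.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4",  # assumed image-capable variant; not yet public at the time of writing
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Why is this comic funny?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/comic.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```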
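And to illustrate why RLHF needs so much human labor: labelers rank pairs of model answers, a separate reward model is trained to score the preferred answer higher, and the language model is then optimized against that reward model. The toy sketch below covers only the pairwise reward-modelling step, with made-up embeddings standing in for real model outputs.

```python
# Toy illustration of the reward-modelling step behind RLHF.
# All data here is synthetic; real systems train the reward model on top of the
# LLM itself and then optimize the LLM against it (e.g., with PPO).
import torch
import torch.nn as nn

torch.manual_seed(0)

EMBED_DIM = 16
NUM_PAIRS = 256

# Pretend each answer is already embedded as a small feature vector.
chosen = torch.randn(NUM_PAIRS, EMBED_DIM) + 0.5    # answers human labelers preferred
rejected = torch.randn(NUM_PAIRS, EMBED_DIM) - 0.5  # answers human labelers rejected

# A tiny reward model: maps an answer embedding to a scalar score.
reward_model = nn.Sequential(nn.Linear(EMBED_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

for step in range(200):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Pairwise preference loss: push the preferred answer's score above the rejected one.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final preference loss: {loss.item():.4f}")
```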
For more details on the performance, check out the accompanying research material and scientific paper.
In related news, Google cannot catch a break. After botching their AI demo last month, which cost them a whopping $100B in market cap, they announced a massive roll-out of AI features for their workspaces on the same day GPT-4 came out. Meta, on the other hand, used today's air cover to announce further lay-offs months in advance.
The relentless acceleration of change reminds me of a passage from Lewis Carroll’s book “Through the Looking Glass” in which Alice, the protagonist, runs alongside the Red Queen without seeming to get anywhere:
‘Well, in our country,’ said Alice, still panting a little, ‘you’d generally get to somewhere else — if you ran very fast for a long time, as we’ve been doing.’
‘A slow sort of country!’ said the Queen. ‘Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!’
Looking at what has happened over the last few months, I'd argue we still don't know where we are running to, but we certainly keep picking up speed.