ChatGPT goes (not yet) to Hollywood
Since its official launch at the end of 2022, ChatGPT has demonstrated how drastically AI systems have improved. There is much excitement about the technology, which we definitely share, but it remains important to be clear about what it does and does not do. Here is a list of important considerations for an executive to keep in mind when thinking about implementing such a tool.
1. How strong is the performance curve, really?
While the combination of big data and AI led to major advances in deep machine learning, it took only about a decade for AI to reach roughly human-level performance in image, writing, and speech recognition. What ChatGPT further demonstrates is that the next step, reading and language understanding, could match human capabilities within only a few years. In fact, beyond the anecdotes, recent academic studies, such as the one led by Choi and colleagues in late January 2023, blindly graded ChatGPT's answers to real exam questions from the University of Minnesota Law School; ChatGPT earned a low but passing C grade in all four courses. And this was GPT-3.5, not the new version, GPT-4.
That level of conversational quality for LLMs (Large Language Models) such as ChatGPT does not come for free. ChatGPT had to be trained on billions of data points, implying very large training costs. But here as well, things are changing quickly: the cost of training a GPT-3 equivalent fell by more than 80% in two and a half years.
Furthermore, shortcuts are being tested very successfully to democratize the cost of training a more limited LLM. As an example, a colleague of mine pointed out that Stanford researchers have built a conversational model with far fewer parameters, which was then refined using a series of prompts submitted in parallel to OpenAI's GPT, with surprisingly good results and for a cost of less than one thousand dollars. While this remains to be verified, it implies a cost roughly 1,000 times lower than that of a typical enterprise model using ChatGPT directly.
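To make the idea concrete, here is a minimal sketch of the data-generation step behind that kind of shortcut: prompting a large "teacher" model through the OpenAI Python client and saving the prompt/response pairs as instruction-tuning data for a much smaller model. This is not the Stanford team's actual pipeline; the model name, seed prompts, and file name are illustrative assumptions.

```python
# Sketch: collect prompt/response pairs from a large model to fine-tune a smaller one.
# Assumes the OpenAI Python client (pip install openai) and OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed_prompts = [
    "Explain net present value to a new analyst in three sentences.",
    "Draft a polite follow-up email after an unanswered sales call.",
    # ... in practice, thousands of task prompts would be curated or generated
]

records = []
for prompt in seed_prompts:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative teacher model
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    records.append({
        "instruction": prompt,
        "output": response.choices[0].message.content,
    })

# The resulting file becomes training data for a much smaller open model,
# fine-tuned with a separate framework (outside the scope of this sketch).
with open("instruction_data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```

The point of the sketch is that the expensive large model is only queried to produce examples; the recurring inference cost is then borne by the much cheaper small model.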
2. Are all use cases and domains possible with ChatGPT?
One of the first applications of ChatGPT has been as a rival to search queries, and the battle is on between Microsoft and Google. This is not to say that Google is not ready with LLMs. The danger for Google is disruption: its dominance in search obliges it to have a near-perfect LLM to blend with search queries, yet to date chat queries cost much more to serve than search queries and could eat into Google's comfortable margins. Microsoft, on the other hand, can afford to integrate an inferior (but already fascinating) product like ChatGPT into its search engine, Bing, hoping that ChatGPT is a clear way to rebalance the flow of queries to its advantage.
Beyond this evident case affecting the tech superstars, other use cases for ChatGPT and other types of LLMs may abound in enterprises. One case is evidently education and information intelligence: content aggregated from digital sources such as the web is typically not yet structured for direct, valuable insights, which ChatGPT can then deliver (a minimal sketch follows below). Another case is a virtual assistant for managerial and organizational tasks, or even creative tasks, such as developing a marketing tagline or writing IT code.
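As an illustration of the information-intelligence case, the sketch below asks a model to turn raw, unstructured text (for example, scraped web pages) into a structured summary. It again assumes the OpenAI Python client; the model name, prompt wording, and sample text are hypothetical.

```python
# Sketch: turning unstructured text into structured insights an analyst can review.
# Assumes the OpenAI Python client and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

raw_text = """
Press release: Acme Robotics raised a Series B of $40M led by Example Ventures.
Blog post: Acme's new robotic arm targets warehouse automation; pilots with two retailers.
"""  # hypothetical scraped content

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[
        {"role": "system",
         "content": "Extract the facts as JSON with keys: company, funding, product, customers."},
        {"role": "user", "content": raw_text},
    ],
    temperature=0,  # keep the extraction as deterministic as possible
)

print(response.choices[0].message.content)  # structured summary for human review
```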
Still, one thing must remain clear: ChatGPT is a predictive model. Its accuracy is not perfect and can fall quickly on topics for which it did not get enough training data. As a statistical model, it may also not deliver the same answer to the same prompt. The model is only as good as the data it has collected, so it would need to be constantly retrained to stay accurate in real time. Finally, even though it is trained on billions of data points, a large share of data remains strictly private, so ChatGPT is blind behind enterprise closed doors.
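To illustrate the non-determinism mentioned above, the sketch below sends the same prompt several times with a non-zero sampling temperature; the wording, and sometimes the substance, of the answers will typically differ across runs. The model name is an assumption.

```python
# Sketch: the same prompt can yield different answers because the model samples its output.
# Assumes the OpenAI Python client and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
prompt = "In one sentence, what is the biggest risk of using chatbots in customer service?"

for i in range(3):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.9,  # higher temperature = more variation between runs
    )
    print(f"Run {i + 1}: {response.choices[0].message.content}")
```

Setting the temperature to 0 reduces this variation but does not, in practice, remove it entirely.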
Those are rather critical limitations that should clearly be taken into account when using GPT. For example, in a sector like private equity, where I advise (Antler and Fortino Capital), ChatGPT may have a hard time producing a proper deal flow of young companies if it is not trained on real-time data. Private sources may also limit its capacity to find interesting bootstrapped companies, for instance. Likewise, the answers provided may not be fully accurate (so-called hallucinations).
3. Is artificial intelligence really human intelligence?
Finally, artificial intelligence does not mean that AI, in its current form of language models, matches all tasks of human intelligence, especially reasoning. The shortcut made by some is the false logic that ChatGPT may have acquired simple reasoning by learning from massive amounts of real-world data. OpenAI itself is aware of many limitations of ChatGPT, as posted on its website and as acknowledged in public by OpenAI's CEO.
In fact, in line with OpenAI's cautions, and despite those drumbeat claims, most of the recent work testing ChatGPT's reasoning performance demonstrates that it remains rather dumb. A recent study by Bang and colleagues shows that ChatGPT is 63.41% accurate on average across 10 different reasoning categories spanning logical, non-textual, and commonsense reasoning. While reinforcement learning techniques may make LLMs better at reasoning, they are not there yet for a large range of reasoning tasks.
Finally, and not least, the question is not only whether AI has yet to prove strong reasoning capabilities; also at stake are the prevalence of data bias, unethical use, and more. The genius is there, but this is not yet Artificial General Intelligence. While LLMs are potentially powerful, we also need to understand the conditions, such as jailbreaking, under which they can be harmful.
References
Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., ... & Fung, P. (2023). A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023.
Choi, J. H., Hickman, K. E., Monahan, A., & Schwarcz, D. B. (2023). ChatGPT Goes to Law School (January 23, 2023). Minnesota Legal Studies Research Paper No. 23-03. https://dx.doi.org/10.2139/ssrn.4335905
Kiela, D., et al. (2021). Dynabench: Rethinking Benchmarking in NLP. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Li, B. Z., et al. (2021). Implicit Representations of Meaning in Neural Language Models. arXiv preprint arXiv:2106.00737. https://doi.org/10.48550/arXiv.2106.00737
Smith, C. (2023). Hallucinations Could Blunt ChatGPT's Success. IEEE Spectrum, March 13.
Wang, J. (2020). Improving at 50x the Speed of Moore's Law: Why It's Still Early Days for AI. Ark Investments.