My customized personal AI
Sebastien Laye
Economist, entrepreneur (AI and Financial Services), author, teacher
As early as June 2023, the technological conversation, particularly in the USA, revolved around the possibility of creating "one's own ChatGPT", to use a popular phrase. Technically, the challenge is to connect the API of one of the major Large Language Models on the market (Bard, Claude, ChatGPT, etc.) to your own database and train it on a smaller, qualified sample. A number of companies and start-ups, such as ADN AI in France, have launched tools to create these mini AIs rapidly. The initial cost announced by developers in June (several tens of thousands of euros) has been brought down considerably by these advances and new commercial offerings, even if a large part of the value (which these companies will no doubt also capture through related services and consultancy) actually lies in the selection, curation and training of the data.

This is what I discovered in August. There is no love at first sight in artificial intelligence; once you get past the astonishment of discovering the product (with the usual chorus of reactions: "it sounds human", "it's fast", "it thinks like me"), you really have to invest your summer nights to arrive at a commercial product. Building such a bot is less a stroke of luck than an act of love. And even then, in my case, my ambition with this first attempt was not to sell a service, but simply to offer the general public another way of accessing my economic reflections. By "act of love", I mean the care that must go into selecting the data. You only have to look at the history of ChatGPT to understand that OpenAI, beyond processing web pages, had to scan, digitize, retrieve or purchase vast databases and publications. And so, although my AI theoretically and technically existed on August 5, it took three weeks of sweaty summer nights (in the USA) to perfect its analytical skills.
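For readers curious about the mechanics, the sketch below illustrates the principle of connecting a Large Language Model to a personal database: the most relevant passages of the corpus are retrieved and passed to the model as context alongside the question. The helper names and the crude keyword scoring are purely illustrative assumptions, not the actual implementation behind the bot; a real system would typically rely on embeddings and a vector database.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    source: str  # e.g. a book chapter, op-ed or interview
    text: str

def relevance(question: str, passage: Passage) -> int:
    """Crude keyword-overlap score; a real system would use embeddings."""
    return len(set(question.lower().split()) & set(passage.text.lower().split()))

def build_prompt(question: str, corpus: list[Passage], k: int = 3) -> str:
    """Select the k most relevant passages and wrap them around the question."""
    top = sorted(corpus, key=lambda p: relevance(question, p), reverse=True)[:k]
    context = "\n\n".join(f"[{p.source}] {p.text}" for p in top)
    return (
        "Answer in the author's voice, using only the excerpts below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )

# The resulting prompt is then sent to whichever LLM API has been chosen
# (OpenAI, Anthropic, etc.); the API call itself is omitted here.
```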
The first tricky issue in the choice of data and its use by the AI is the question of errors and hallucinations. These are generally due to conflicts between figures or points of view. This problem, common in generalist AIs, should in principle have been resolved by the choice of corpus in my summer experiment: a priori, barring schizophrenia, I am consistent with myself, and despite variations in language or in my takes on current events, my approaches to and interpretations of the major economic subjects (growth, pensions, poverty, technological progress, public finances) do not vary. In the first tests of www.sebastienlayeecobot.com, it was the time perspective that sometimes broke the logic of the reasoning. At the end of 2016, I gave interviews offering my views on how the economy would evolve in 2017. The AI sometimes seemed to answer as if it were still the end of 2016, reinjecting obsolete arguments and judgments into a contemporary discourse. The old, forward-looking interviews therefore had to be removed from the corpus. This drawback was offset by the AI's ability to take an old analytical prism (for example, my interpretation of the 2008 financial crisis) and apply it to a contemporary situation.
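To make this curation step concrete, here is a minimal sketch under purely illustrative assumptions (the document fields and the cut-off date are invented, not the rule actually applied): dated forward-looking pieces are filtered out of the corpus, while older analytical pieces are kept.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Document:
    title: str
    published: date
    forward_looking: bool  # e.g. an "outlook for 2017" interview

# Hypothetical cut-off: forward-looking pieces older than this are dropped.
CUTOFF = date(2020, 1, 1)

def keep(doc: Document) -> bool:
    # Old forward-looking interviews are removed; analytical pieces are kept
    # whatever their age, since their interpretive framework still applies.
    return not (doc.forward_looking and doc.published < CUTOFF)

corpus = [
    Document("Outlook for the economy in 2017", date(2016, 12, 15), True),
    Document("Reading the 2008 financial crisis", date(2010, 3, 1), False),
]
cleaned = [d for d in corpus if keep(d)]  # only the 2008-crisis analysis remains
```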
The second concern was the limited nature of the data corpus. With an initial 3.3 million characters, or tokens, in the database (the equivalent of a few hundred standard books), I hadn't covered all economic topics. When I tested the bot with some friends, one question out of three had no answer, the bot simply indicating that Sébastien Laye had never dealt with the subject. Admittedly, we could have corrected this by "replugging" the AI into the web, as ChatGPT does, but this would have degraded the quality of the answers and betrayed the bot's promise. Since then, my way of dealing with this has been to write short notes and memos reflecting my ideas and analyses on subjects not covered in my books, op-eds or reports, to fill the AI's black holes. As a result, the database is closer to 3.7 million tokens these days, and I will continue to update it over the long term.
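The sketch below illustrates this design choice with invented names and an invented threshold: when nothing in the personal corpus is relevant enough, the bot declines rather than falling back on general web knowledge.

```python
def relevance(question: str, passage: str) -> int:
    """Crude keyword-overlap score between the question and a corpus passage."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def answer_or_decline(question: str, corpus: list[str], threshold: int = 3):
    """Return relevant excerpts for the LLM, or an explicit refusal."""
    ranked = sorted(corpus, key=lambda p: relevance(question, p), reverse=True)
    if not ranked or relevance(question, ranked[0]) < threshold:
        # The honest answer when the corpus has a black hole on this topic.
        return "Sébastien Laye has never dealt with this subject."
    return ranked[:3]  # excerpts to pass to the LLM as context
```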
Finally, the third - and most important - issue was the definition of style: the way of reasoning and of carrying an analytical judgment through. Simply using a database of my own writings was no guarantee of consistency of analysis or of resemblance to my style. We had to work on what is known in AI as the persona, i.e. the bot's characteristics. Like prompt engineering, this phase is becoming a profession in its own right in the USA. After a few weeks of trial and error, the bot's persona is now defined by around 40 lines and some 50 criteria.
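As an illustration only, a persona of this kind often takes the form of a system prompt listing style and reasoning criteria, prepended to every request. The criteria below are invented examples, not the bot's actual persona, which runs to roughly 40 lines and some 50 criteria as described above.

```python
# Invented example criteria, for illustration only.
PERSONA = """You are the economics bot of Sébastien Laye.
Style criteria (examples):
- Answer in a clear, didactic tone, as in an op-ed.
- Ground every judgment in the supplied excerpts; never improvise figures.
- If the corpus does not cover a topic, say so explicitly.
Reasoning criteria (examples):
- Start from the structural diagnosis, then move to policy implications.
- Older analytical frameworks (e.g. the 2008 crisis) may be applied to current
  events, but all dates must be anchored in the present.
"""

def with_persona(user_question: str) -> list[dict]:
    """Build a message list in the common system/user chat format."""
    return [
        {"role": "system", "content": PERSONA},
        {"role": "user", "content": user_question},
    ]
```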