The second renaissance
Episode 24: 20/04/2023
?
It is raining. The rain is of that very fine type, common in England, which is almost imperceptible but still manages to get everything thoroughly soaked.
?
Soaked like me, right now. Water drips off the end of my nose. I nestle into the porch of the Fitzroy Tavern, hoping that the people idly watching me from inside realise it is water. I could have waited in there with them, but this is an important meeting and it’s somehow calming to stand.
?
It is April 2023. Amazingly, this is the first in-person meeting we’ve had for the El Toco search engine since January 2020. The gap is mainly for budgetary reasons, but you can’t say we aren’t doing our bit for carbon emissions.
?
The lack of meetings is not why I am slightly on edge. Today I am meeting part of Piba Studio, our web designers, to talk about artificial intelligence.
?
?
We’ve been working on El Toco for almost seven years now. All this time, we’ve been waiting for the search engines to come up with the next big idea.
?
Initially, this was with some trepidation. There was a period, during the lonely early years, where I didn’t dare look at the tech news online, for fear that some giant leap forward would force a big rethink of El Toco's platform.
?
But, as time passed, they seemed to content themselves with pootling around with the front end. The Google we have in 2023 looks and behaves much as it did in 2016, or even 2006. This has been, frankly, something of a relief. But has it lulled us into a false sense of security?
?
2022 was a rough year for El Toco. We survived all those close shaves but, just as we were finishing our search pages before launch, ChatGPT went live.
?
ChatGPT isn’t in itself a surprise. Most people in tech, and particularly in search, have been watching natural language models get steadily better for roughly a decade now. The release may, however, have been a surprise to the man on the street, who thought we’d cracked natural language search with Ask Jeeves back in about 2001. It was also, bizarrely, definitely a surprise for Google, who thought themselves the market leader but seem to have been completely wrongfooted by OpenAI, springing out of nowhere to poop their party.
What gave everyone, including our team at El Toco, pause for thought was how much better ChatGPT was than the laughable attempts that had been published before it. This might have been why OpenAI was able to deliver the marketing coup of putting it on a platform you could actually use. Nobody bothered doing that before, because its predecessors sucked.
?
With the release, AI was suddenly a topic everyone had an opinion about, and the resulting hype has made it difficult to separate these opinions from fact. By spring 2023, ChatGPT has been around for about six months, meaning the dust has settled somewhat. So while back in London to visit friends and family, I’m taking some time to see how early adopters are actually using it.
?
The meetings have taken place in old man pubs, which have been a nice counterpoint. We sit in dingy, darkly panelled interiors that probably haven’t changed for several centuries, to talk about the cutting edge of computer science.
?
Yesterday, in one pub, a strapping great Bavarian studying for a master’s at London Business School explained how students on his course use ChatGPT. Aside from just getting it to write their essays for them, the main legit use seems to be brainstorming: for example, "tell me five mid-cap companies which manufacture steel, and the year they were founded". Unfortunately, the service was down, so he couldn’t actually show me.
?
Today, in the Fitzroy Tavern, we're in luck because ChatGPT is working. The designer from Piba and I settle down inside with some independently brewed cider to see how she uses it on her laptop. Her team has been encouraged to use it at least once a week on their non-El Toco projects.
?
We run through two examples. In one, it lists items of clothing, grouped together by type. It does put socks in the lingerie department, but even M&S hasn’t come to a firm conclusion on that yet, and ChatGPT has only been considering this weighty matter for 0.7 seconds. In the other example, it plans an itinerary for a week’s holiday in Prague.
These elementary case studies are informative because they’re how a real person has really used it, rather than a toy example that somebody has created for marketing purposes. What impresses me is that it can take one of its own answers and update it. When we tell it that socks are not lingerie, it corrects the list.
There’s more going on there than meets the eye. During a chat session, it must be holding its own answers in memory, so that it can refer back to them if the user asks it to. What’s interesting is that this implies another level of natural language processing: ChatGPT has to decide whether the user’s next question relates to an answer it gave previously. Given that it’s not really thinking, this must be a fudge. When I eventually have time to look it up, out of the whole model, this is the detail I’ll be most interested in.
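My guess at the simplest version of that fudge is that the chat interface just replays the whole conversation back to the model with every new message, and lets the model work out for itself whether the latest question refers to something earlier. The following is a minimal sketch of that guess in Python; call_model() is a hypothetical stand-in for whatever API the real service exposes, not OpenAI's actual interface.

```python
# Minimal sketch of the "memory" fudge. The model itself is stateless; the
# chat client keeps the transcript and resends all of it with every new
# message. call_model() is a hypothetical stand-in for the real API.

def call_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for the real language model")

class ChatSession:
    def __init__(self) -> None:
        self.transcript: list[tuple[str, str]] = []  # alternating user / assistant turns

    def ask(self, user_message: str) -> str:
        self.transcript.append(("User", user_message))
        # The entire history goes back in as one big prompt, which is how a
        # follow-up like "socks are not lingerie" can land on the right list.
        prompt = "\n".join(f"{speaker}: {text}" for speaker, text in self.transcript)
        prompt += "\nAssistant:"
        reply = call_model(prompt)
        self.transcript.append(("Assistant", reply))
        return reply

# session = ChatSession()
# session.ask("List items of clothing, grouped by type.")
# session.ask("Socks are not lingerie. Please correct the list.")
```

Whether the production system does anything cleverer than this, I don't yet know; that is precisely the detail I want to look up.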
?
The finding from this comprehensive survey of, ahem, two people is that, as a sort of digital Watson for bouncing ideas off, ChatGPT clearly has legs. Efforts to refine it and scale it up will doubtless continue for years to come.
It is difficult to predict where that line of research will lead. Given that it doesn't actually think, it may well turn out to be just an expensive technological cul de sac. But, perhaps, if you put layer upon layer of this kind of model on top of each other like the pastry in a croissant, you end up with a genuine artificial intelligence, of the sort that people are already getting excited about.
Seeing it in action, you can understand why people are extrapolating out into the future and wondering about the downsides. This leads to worrying questions, like how we will combat the call-centre-related terror that will be unleashed on the world.
If you put all those questions together, it seems like there are a great many downsides, while the upsides are rather unclear. The eight billion extremely sophisticated biological brains that we've already got on Earth aren't going to have much constructive stuff to do, if we've developed some mechanical brains that do every single task better.
In this episode, we will talk about how this impacts El Toco and the search market in general. But, before exploring any applications of the technology in more depth, it's worth briefly considering the much more fundamental question. Given the potential downsides, should we be developing artificial intelligence at all?
?
Should we be developing artificial intelligence at all?
?
?
The most memorable conversation I’ve ever had about AI was with a professor of computer science. I was still at university then, so it was many moons ago, but I can remember it like it was yesterday. Before relating that short conversation, let's review the context in which it took place.
?
It started, as these things do, with The Matrix.
?
I once heard a vicar say that everybody must find their own way to God. This superficially profound phrase applies to many things, as was revealed to me many years later during the pandemic, when the trainer in an exercise video said the same thing about yoga. Like these other religions, everybody finds their own way to science fiction. If you have not found your way yet, indulge me for a moment.
?
Science fiction is really just a vehicle to explore who we are today. What sets it apart from other genres, like vanilla drama, is that science fiction also explores the effects of technology. This sideline has proven to be remarkably useful for merchandising purposes, involving, as it does, things like cool laser swords and genetically-enhanced space marines. One of the technologies which science fiction has explored, in quite some depth, is artificial intelligence.
?
There are essentially two treatments of AI in science fiction. On the one hand, there is the benign treatment, where it is helpful, friendly, and has some funny escapades. Examples of this are the Doctor and Data in Star Trek, and the Minds in the books by Iain M Banks.
?
On the other hand, there is the not benign treatment. These stories rely a lot on dramatic foreshadowing. Because, sooner or later, they always get to the part where the AI does the maths, and decides to kill everyone.
?
Despite this tiresomely predictable outcome, the not benign treatment enjoys a rich heritage, going right back to Karel Capek’s 1920 play, Rossum’s Universal Robots, which is where we get the word robot from.
Yup, right from when we first imagined robots, the plot revolved around them killing everyone.
?
If you are a teenage boy, The Matrix is one of the coolest films you will ever watch. The reason is its message about who we are today. It tells you, irrespective of who you seem to be, that you too can be a badass computer expert with an edgy dress sense who can beat anybody up using kung fu. It is actually the same message as in Cinderella. But whereas she drops glass slippers and crockery, the only things they drop in The Matrix are spent clips and funky rap metal tracks.
?
The origins of the Matrix are only briefly alluded to in the film. If you dig around the subject a little, you eventually come across the Animatrix, a miniseries which explains that backstory. It describes how man creates sentient machines, they work together, but eventually that relationship breaks down, culminating in the machines turning everybody they haven’t killed into batteries.
?
If you squint a bit, the plot of the Animatrix dovetails nicely with other works of fiction, especially Ex Machina, I, Robot and the Terminator films, such that you can pretend they’re all part of the same narrative. Sometimes the writers get a bit carried away with details that are scientifically implausible, like human batteries and time travel. But, if you ignore all that, the overarching theme is very disturbing. It is disturbing because there’s nothing in that narrative which doesn’t also dovetail with what’s been happening in the real world over the last few decades.
?
Years after watching The Matrix, I studied economics at university. There I learned that economists think about killer robots too. They have to pretend they’re doing serious work, so they just refer to them vaguely as technology. Upon discovering this subterfuge, you realise you can substitute “killer robots” whenever an economist says “technology”, to make the subject more fun. Papers like The Effects of Technology on the Belgian Microbreweries Industry suddenly spring to life.
?
The summary of the literature is that economists are positive on the effects of technology overall. If you’re thinking "well, duh", bear in mind that proving this isn't really the aim of the studies, which are mainly focused on the nuts and bolts of how technology affects daily life at the economy-wide level. However, economists also note that the short-run impact of technology, for certain specific groups of people, can be wholly negative.
?
The thing about AI is that the creation of a genuinely intelligent machine is not like other tech shocks. It goes a bit beyond the sudden ability to stream The Lion King in ultra high definition.
A genuinely intelligent machine, that actually thinks for itself, is a whole other ballgame. If comparatively minor tech shocks have entirely negative outcomes for some people, you can see how this massive one might have entirely negative outcomes for a lot more people.
?
So, from both science fiction and economics, there are clear warnings that it might not end well.
?
Thus we arrive at the serious question of whether we should be creating AI at all. After thinking idly about the subject for a number of years, that is the stage I’d got to when I plonked down in the office hours of the head of computer science at my university to discuss it with him.
?
By this point, I was actually studying AI, as a component of my masters in computational finance. When you study something it can really sap the joy out of it, and the AI modules were no exception. Machine learning algorithms are fiddly things, and their applications in the financial sector are rather underwhelming. So to keep the fire of personal interest burning, I decided one day to visit my lecturer to discuss my concerns about the killer robots.
?
Despite the fact that I got my foot in the door by pretending to have a query about the course material, he was happy to play ball.
?
So what did this senior computer scientist think? Should we be creating AI at all?
?
In his answer, my lecturer spent some time stressing the academic interest of this goal, and how far we were, at that stage, from reaching it. Neural networks were showing some promising emergent behaviour, but we were so far from understanding the mechanics of even simple animal brains that recreating the most sophisticated brain ever known, the human one, didn’t seem likely any time soon. The notion of creating something superior was laughable.
?
Having given all of this preamble, he then got to his actual reply. It was succinct, but sometimes the best replies are.
?
The morality doesn’t matter. People are going to do it anyway.
?
?
The first renaissance
?
The point of the previous section is to illustrate that a lot of the debate about artificial intelligence already took place during the previous decades, both in academia and in people's imaginations. The conclusion is that development of AI isn't going to stop, now that it has started. We're just going to have to deal with the consequences as and when they arise. This is just like any other technology, from farming to TikTok.
So in early 2023, the question for El Toco is what might be the consequences for web search.
?
The applications of services like ChatGPT to the daily grind of information retrieval were fairly obvious. Rather than spending ages meticulously organising information, you just feed it raw to a natural language model. The data is never organised, but the model doesn’t care, because it can spit out and summarise the data for you on the fly.
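To make "feed it raw" concrete, here is roughly what that looks like from the application side. It is only a sketch, under the same assumption as before: call_model() is a hypothetical stand-in, and a real system would also have to worry about how much raw text fits into a single prompt.

```python
# Sketch: no cataloguing step. Unorganised text is pasted straight into the
# prompt and the model summarises or answers on the fly.
# call_model() is a hypothetical stand-in for the real language model.

def call_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for the real language model")

def ask_about(raw_documents: list[str], question: str) -> str:
    context = "\n\n".join(raw_documents)  # never indexed, never filed anywhere
    prompt = (
        "Here is some unorganised text:\n"
        f"{context}\n\n"
        f"Using only that text, answer this question: {question}"
    )
    return call_model(prompt)
```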
?
With the advent of good natural language models, Google's vision of web search behaving like a magic box seems to have taken one step closer to reality. There is a paucity of original ideas in search, so if you read around this subject, the whole thing feels like an inevitability.
But is it? It turns out that there is another model for information retrieval. This model goes back to the first renaissance, predating the concept of artificial intelligence by many hundreds of years. However, I only fully realised this quite recently, due to one of those coincidences that life throws at you which are so spooky they seem almost providential.
?
I have a very good friend who always exchanges books with me at Christmas and birthdays. Due to the pandemic and living in far flung parts of the world, the book exchanging got paused for quite a long time. In the meantime, the books accumulated, such that our happy reunion in late 2021 was like several Christmases and birthdays all rolled into one.
Because the resulting books were so numerous, it has taken me literally years to read through them. And so it wasn’t until early 2023, following the launch of ChatGPT, just as El Toco was getting ready for its own launch, that I sat down to read The Catalogue of Shipwrecked Books.
?
It is a true story, and the more I read it, the more amazed I became. Because it turns out that the idea of El Toco, and the tale of creating it, has all happened before.
?
Fernando Columbus, son of the famous explorer, set out to create a new library in the early 1500s. Not just any old library. Fernando was bitten by the organisation bug. He wanted to create an index of organised books, catalogued by different features. You could use that index to find any printed book, ever.
Fernando had set out with the exact same goal in mind as we did with El Toco, five hundred years later. The parallels between the two projects are numerous. Like us, Fernando was trying to create order out of the chaos of a relatively new medium: for him printed books, for us the web. Like us, he had to create his organisational system on the fly. Like us, he soon discovered it was too big a job for one person. Like us, his solution was to employ a team of contractors, whose job it was to collect and file things in the right place. Like us, he had problems with the quality of data, and his solution was to give the same job to multiple people so they could sanity check each other's work. Like me, he lived partly in Latin America while doing this, and spent the rest of the time travelling around Europe.
The book is peppered throughout with comments from its author, speculating in the present day that if somebody were to apply themselves to doing this for the internet, it would be well useful, innit. Well, it turns out that's exactly what we have done. That's El Toco.
?
History repeating itself is a concept we're all familiar with. Less common is realising that it is going on while history repeats itself around you. I can tell you from first-hand experience that, in that situation, you discover a new feeling for which we don't really have a word in English. The best adjective is spooky.
?
The point of El Toco was to create a version of the internet which works more like a catalogue. It does use AI, and we have spent many years writing it. But the AI is behind the scenes, quietly organising that catalogue like a librarian. All the information is there, tidy and organised, but the onus is on the user to go in and pick out what they want.
?
This is a different vision of search from the chat-based one. And the two technologies have different applications.
?
If you want a concrete answer to a specific question, you would use a natural language query. My usual example when discussing this is "how old is Britney Spears?", although to hide my own age it should probably be updated to something to do with Taylor Swift. The natural language tools are also useful if you want to summarise a topic, or rephrase an existing block of text.
El Toco’s filters-based search is very different. Rather than providing you with discrete answers, it directs you to the original sources.
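In code, the catalogue model is almost boringly simple: structured records plus filters, with the clever work done up front when things are filed rather than at query time. Here is a minimal sketch of that shape; the fields and records are invented for illustration and are not El Toco's actual schema.

```python
# Sketch of filters-based search: records are organised in advance, the user
# narrows them down with filters, and the results are links to the original
# sources rather than answers. Fields and data are illustrative only.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CatalogueEntry:
    name: str
    category: str
    city: str
    url: str

CATALOGUE = [
    CatalogueEntry("Example Fashion House", "womens_fashion", "London", "https://example.com"),
    CatalogueEntry("Example Steelworks", "steel", "Sheffield", "https://example.org"),
]

def search(category: Optional[str] = None, city: Optional[str] = None) -> list[str]:
    """Apply the chosen filters and return original sources, not answers."""
    results = CATALOGUE
    if category is not None:
        results = [e for e in results if e.category == category]
    if city is not None:
        results = [e for e in results if e.city == city]
    return [e.url for e in results]

# search(category="womens_fashion", city="London")
```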
?
It’s analogous to a library, and this reveals something else about this model of information retrieval which is distinct from the chat-based one. Quite often, you'll go into a library or a shop to get something specific, but have a cheeky nose around anyway. This browsing activity is fun, and it's one of the ways we discover new things. It particularly lets us discover things we didn't even know existed. The sort of things that a chatbot will never recommend.
People do their browsing nowadays on social media. But it’s a very limited form of browsing, because an algorithm somewhere is deciding what to show you, piecemeal. A case in point is this blog, where LinkedIn decides which articles to show to which people, leading to numerous offline conversations, with people it had decided not to show the latest episode to, about why I had stopped writing it. LinkedIn is a hand that I'm happy to bite, because it has yet to feed me.
You can't go onto social media and say "show me all the women's fashion brands in London", then pretend you're on a sort of digital Oxford Street and have a nose around each one. That sort of browsing activity has become impossible nowadays. Neither the social media platforms nor the search engines want to talk about it, because it's not a problem they're trying to solve.
The alternative to natural language search is structured information: the idea behind Fernando Columbus's library. It is still a viable idea, and it lets you do things that chat-based tools cannot.
?
Implications of artificial intelligence for the search market
?
So, in fact, there are two competing models for web search: natural language versus structured information. It is not a foregone conclusion that one will dominate the other. What might the search market look like, now that both of these cards are on the table?
?
Economics is a new field so there’s a lot of dismissive waffle about it not being a real science. Such comments generally say more about the person producing the waffle than the field itself. Having said that, economists are great at analysing the past, reasonable at analysing the present, and pretty godawful at anything to do with the future.
?
With that caveat in mind, let’s indulge in a little science fiction of our own.
?
Imagine we’re on the bridge of the USS Enterprise, or whatever futuristic setting tickles your fancy, and want to look up something on whatever the web eventually turns into. Will our query be carried out by flicking through a user interface, or will it be a sentence that we issue in plain English?
?
If you spend any length of time indulging in this fantasy, thinking about the svelte figure of Seven of Nine as she - oops. Where was I? Ah yes. If you spend any length of time indulging in this fantasy, you’ll come to the conclusion that futuristic information retrieval will probably involve both.
?
Given this end point, we can work backwards to Earth, 2023. People are already developing the technology that combines structured information with natural language queries. In recent years, the open source database community has realised that a hybrid approach would be handy. When this work settles down a bit, there will be a new generation of databases which store structured data but can be queried in plain English.
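I haven't used these hybrid databases myself yet, but the shape of the idea is easy to sketch: a language model translates the plain-English request into a structured query, and a perfectly ordinary database executes it. Everything below is hypothetical, including the call_model() stand-in and the invented companies table.

```python
# Sketch of the hybrid idea: plain English in, structured query underneath.
# call_model() is a hypothetical stand-in; the schema is invented.

import json
import sqlite3

def call_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for the real language model")

def hybrid_query(question: str, connection: sqlite3.Connection) -> list[tuple]:
    # Ask the model to turn the sentence into filters for a known schema,
    # e.g. {"sector": "steel", "founded_before": 1990}.
    prompt = (
        "Translate this request into JSON filters with keys 'sector' and "
        "'founded_before', for a table companies(name, sector, founded):\n"
        + question
    )
    filters = json.loads(call_model(prompt))
    sql = "SELECT name, founded FROM companies WHERE sector = ? AND founded < ?"
    return connection.execute(sql, (filters["sector"], filters["founded_before"])).fetchall()

# conn = sqlite3.connect(":memory:")
# conn.execute("CREATE TABLE companies (name TEXT, sector TEXT, founded INTEGER)")
# hybrid_query("mid-cap steel manufacturers founded before 1990", conn)
```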
If we wanted to make a niche Star Trek reference, we could observe that this hybrid approach gives us The Best of Both Worlds. But we are above such cheap shots in this blog so, having observed that the technology is already coming along, let's consider the economics.
?
The people who believe in a free internet have enjoyed a good few decades but, loath as they are to admit it, the web is getting increasingly commercial. This is because if content is in some way useful, somebody’s eventually got to get paid for creating it. This is why Wikipedia often puts up those banners asking you to make a donation. Apart from such rare instances of charity, information provided online has to be funded by either advertising or subscriptions. This applies just as much to running a search engine as it does to providing news, or financial data.
?
And therein lies a challenge for the natural language search model. Nobody has figured out how to inject ads into an AI’s output without undermining it. How will you know if your AI is telling you the truth, if it's been paid to tell you things? This is a key point, because it means that pure AI-based search will need to be funded by subscriptions.
?
A subscription-based search engine has already been tried, and it failed. This was Neeva, which folded in 2023. However, the relevance of Neeva as a case study is questionable, because it didn’t really have a product that was differentiated from the existing, free, search engines.
?
So if you’ve spent all those billions developing natural language models, you’re going to ignore Neeva and at least give the subscriptions-based model a try. That experiment is in progress right now, in 2023.
?
The economic model they’re currently testing has users, who subscribe to the search tool, and the search tool, which subscribes to the underlying sources of information. Everybody gets paid, but it depends on money from the end users, so unless you subscribe you can't get access.
Whether or not this results in a scalable service remains to be seen. I can see my future self eating an entire factory's worth of hats here. But right now there seem to be three potential problems with this business model.
Firstly, how many people really want to pay for answers to their current doubts about Taylor Swift? Probably not enough to keep the lights on.
Secondly, a website-based, advertiser-funded search tool will always undercut the price of any subscription-based search tool, because it’s free for the users.
?
Thirdly, and this is the point that we tried to emphasise in the part about Fernando Columbus, people do more things online than just answer specific questions.
?
We can therefore conclude that, for now, we can proceed with our advertiser-funded business model. There will be time to adjust if the situation changes.
This does leave us with a question about how to market El Toco, which we're still in two minds about. Despite the many years that went into creating it, El Toco's AI isn't mentioned anywhere in our marketing copy. The logic for this is that "using AI" isn't actually a goal that makes users want to buy your product, so banging on about it in marketing materials doesn't seem to be a good way to bring in the punters. Having said that, everybody else is doing exactly that right now. There's an argument that a company bio along the lines of "El Toco is a search tool that uses AI to organise the web" would attract big investors. Or make it sound cool. The jury's still out on this one.
?
As far as the technology goes, the hybrid approach is on El Toco's roadmap for after we go live. This was always the plan even before ChatGPT. In this hybrid approach, you’ll still get your list of websites, but you can express tricky queries in plain English, so that you don’t have to click through all the filters manually. It’s not clear you’d want this all the time, but we can make it a feature you can toggle on and off.
?
Phew. Now that we’ve put the world to rights on AI and search, the time has finally arrived for El Toco’s launch.
?
Let’s blow this thing and go home.
?
?
This episode is dedicated to Professor Edward Tsang and Doctor Edward Wilson-Lee, for five minutes of their time.