RAGgregators: AI for next-gen publishing
Most of the time, I'm focused on next-gen AI instead of gen AI – dissecting and writing about the weaknesses and shortcomings of our conceptions of current tech to prod people into architecting more solid foundations for future AI work. This might lead you to conclude that I'm a naysayer or anti-AI, but in fact I'm thrilled by the advances that we see with the advent of LLMs and their applications.
Today, I want to change gears and focus on one example of what we can do – now – with today's tech, in spite of all its limitations.
Jay (JieBing) Yu, PhD, a far more savvy and adept maker than I – with far better tools – crafted a custom RAG system with OpenAI on the back end. He collected about a dozen posts that I authored on LinkedIn to make a sort of "MikeGPT" that will answer on my behalf – and quite accurately – any questions you have about me and how I think about knowledge graphs and LLMs, by weaving together content from my posts. [Thanks, Jay!] This simple chatbot is slaving away while I'm sleeping, out hiking, watching a movie with my youngest, seeing friends over coffee, talking to entrepreneurs, or thinking up topics for my next posts. It does a far, far better job than I do of responding to highly technical and highly non-technical audiences and to people who speak other languages – and it doesn't have to take naps, go to the doctor, or stop to think of a hard-to-remember translation for some rare technical term. And it can easily find where and when I mentioned this or that key idea – something I have serious trouble with as the number of posts I publish grows.
You can "talk" to "me" – or to my more highly skilled avatar – at the chatbot here .
How did Jay do this? He used the RAG-building tools from epsilla.com. They offer a cool, no-code/low-code workbench for creating special-purpose, high-accuracy chatbots with well-known techniques like vector search and retrieval-augmented generation.
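To make the mechanics concrete, here is a minimal sketch of the retrieval-augmented generation pattern that a chatbot like this rests on: embed a corpus of posts, find the passages most similar to the question, and generate an answer grounded in them. The corpus snippets, model names, and prompt wording are my illustrative assumptions, not Epsilla's actual implementation.

```python
# A minimal RAG sketch: vector search over a post corpus, then grounded generation.
# Hypothetical throughout: posts, models, and prompts are placeholders, not the
# real MikeGPT pipeline.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

posts = [
    "Knowledge graphs give LLMs the explicit structure they otherwise lack...",
    "RAG grounds model answers in a curated set of source documents...",
]

def embed(texts):
    """Turn a list of texts into embedding vectors for similarity search."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

post_vectors = embed(posts)

def answer(question, top_k=2):
    """Retrieve the most relevant posts, then generate an answer grounded in them."""
    q = embed([question])[0]
    # Cosine similarity between the question and every post.
    sims = post_vectors @ q / (
        np.linalg.norm(post_vectors, axis=1) * np.linalg.norm(q)
    )
    context = "\n\n".join(posts[i] for i in np.argsort(sims)[::-1][:top_k])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the excerpts provided."},
            {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("How does Mike think about knowledge graphs and LLMs?"))
```

A production workbench adds a real vector database, chunking, and evaluation on top, but the retrieve-then-generate loop above is the core of the technique.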
Cute. But so what?
People talk a lot about LLMs as content generators, which is technically correct. But seen from a different angle, when we load a gazillion words of content into an LLM during training (or inject them at prompt time) and then selectively wrangle the most relevant of them into answers for specific questions, we're doing large-scale content aggregation. And as we adapt the output for different languages, reading levels, and levels of expertise, we're personalizing these aggregations and multiplying the audiences that have access to the same content. So we're talking here not just about answering questions but about building platforms for personalized aggregation of targeted content. For fun, we can call these systems RAGgregators (RAG-based content aggregators), as in the stylish subject of the image above.
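Here is what that personalization step might look like in code, building on the retrieval sketch above (it reuses embed(), posts, post_vectors, and client from there). The audience profiles and prompt wording are illustrative assumptions, but they show the key point: the aggregation is shared, while the rendering is tailored per reader.

```python
# A sketch of the "RAGgregator" idea: one retrieved set of passages,
# re-rendered for different audiences and languages. Profiles and prompts
# are hypothetical placeholders.
import numpy as np

AUDIENCES = {
    "engineer":  "a technical reader fluent in ML jargon",
    "student":   "a newcomer who needs plain language and short sentences",
    "executive": "a business reader who wants implications, not mechanics",
}

def raggregate(topic, audience="student", language="English", top_k=3):
    """Aggregate the most relevant passages, then adapt tone and language."""
    q = embed([topic])[0]
    sims = post_vectors @ q / (
        np.linalg.norm(post_vectors, axis=1) * np.linalg.norm(q)
    )
    context = "\n\n".join(posts[i] for i in np.argsort(sims)[::-1][:top_k])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Summarize only the excerpts provided, in {language}, "
                        f"for {AUDIENCES[audience]}."},
            {"role": "user", "content": f"Excerpts:\n{context}\n\nTopic: {topic}"},
        ],
    )
    return resp.choices[0].message.content

# One corpus, many audiences: the retrieval is identical, the output is not.
for who in AUDIENCES:
    print(who, "->", raggregate("What do knowledge graphs add to LLMs?", who))
```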
The result is something light years ahead of our usual experience with what-you-want-might-be-somewhere-in-one-of-these-6M-documents search, which returns the original documents and leaves users on their own to make sense of them. I call this the FedEx-style search experience: the search engine (like FedEx) dumps a pile of content on your doorstep, wishes you luck, and you, the consumer, have to slog through it all to figure out what's useful, what's understandable, and what to do with it. A tall order, even if you're in a familiar domain. If you're starting something new (as students are), it's overwhelming.
One key difference, then, is that RAGgregators not only locate information but also synthesize it according to your needs.
It's also light years ahead of our usual experience with publications more generally, whether gossip magazines, technical journals, news websites, or textbooks. They aggregate content for us, the subscribers. But we, the readers, have to adapt our behavior to the language, phrasing, and organization that the publishers and authors have chosen, instead of vice versa. The end result is that, more often than not, the knowledge we need or want is barricaded behind distributed storage, multiple publication channels, paywalls, expert jargon, turgid writing styles, and lousy search engines. Right now, the best kludge we have for finding new knowledge is to check the date stamp; for finding related knowledge, overlapping keywords (if and when we know them).
Caution: Storm front looming, turbulence ahead
The experience with chatbots like the one Jay built is so much better than current experiences with content that RAGgregators seem poised to disrupt the whole US$33B newspaper industry, the US$9B textbook industry, and several other kinds of content aggregators and distributors. We could build personalized RAGgregators for music, for video, for art, for events, for recipes, for restaurants – even for perfumes(!). The growing availability of multimodal LLMs (the underlying technology) means that even these sci-fi-style systems are already real possibilities.
So a perfect storm seems to lie straight in the path of traditional publishers, who are not known for being very tech-savvy. And a very real opportunity for disrupting yet another industry seems within reach for their competitors.
What's missing?
Even at the nano scale of my tiny chatbot, some key pieces of the puzzle seem to be missing.
The opportunities seem endless. Content creators can benefit from wider audiences. Content consumers will benefit from more content that is more relevant and more accessible. Platform and tooling providers will enjoy a growing market. Existing content can be licensed in new ways. What's not to love?