RAGgregators: AI for next-gen publishing
Image by you.com


Most of the time, I'm focused on next-gen AI instead of gen AI – dissecting and writing about the weaknesses and shortcomings of our conceptions of current tech to prod people into architecting more solid foundations for future AI work. This might lead you to conclude that I'm a naysayer or anti-AI, but in fact I'm thrilled by the advances that we see with the advent of LLMs and their applications.

Today, I want to change gears and focus on one example of what we can do – now – with today's tech, in spite of all its limitations.

Jay (JieBing) Yu, PhD, a far more savvy and adept maker than I am – and with far better tools – crafted a custom RAG system with OpenAI on the back end. He collected about a dozen posts that I authored on LinkedIn to make a sort of "MikeGPT" that will answer on my behalf – and quite accurately – any questions you have about me and how I think about knowledge graphs and LLMs, by weaving together content from my posts. [Thanks, Jay!] This simple chatbot is slaving away while I'm sleeping, out hiking, watching a movie with my youngest, seeing friends over coffee, talking to entrepreneurs, or thinking up topics for my next posts. It does a far, far better job than I do of responding to highly technical and highly non-technical audiences and to people who speak other languages – and it doesn't have to take naps, go to the doctor, or stop to think of a hard-to-remember translation for some rare technical term. And it can easily find where and when I mentioned this or that key idea – something I'm having serious trouble doing as the number of posts I publish grows.

You can "talk" to "me" – or to my more highly skilled avatar – at the chatbot here .

How did Jay do this? He used the RAG-building tools from epsilla.com. They offer a cool, no-code/low-code workbench for creating special-purpose, high-accuracy chatbots – using well-known techniques like vector search and retrieval-augmented generation (RAG).
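For readers who want a concrete picture of what "vector search plus retrieval-augmented generation" means in practice, here is a minimal sketch in Python. It is not Jay's actual Epsilla workflow – the toy `embed` helper, the sample posts, and the prompt format are all illustrative assumptions – but it shows the basic RAG loop: embed the posts once, retrieve the ones most similar to a question, and hand only that retrieved content to an LLM to compose the answer.

```python
# Minimal RAG loop (illustrative sketch, not Jay's actual Epsilla pipeline).
# embed(), POSTS, and the final prompt are stand-ins for a real embedding
# model, the LinkedIn posts, and an LLM API call.

from math import sqrt

POSTS = [
    "Knowledge graphs give LLMs the structure they lack ...",
    "RAG grounds model answers in documents you control ...",
    "Taxonomies are the backbone of reliable retrieval ...",
]

def embed(text: str) -> list[float]:
    """Toy bag-of-letters embedding; a real system would call an
    embedding model instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# 1. Index: embed every post once (a vector database would store these).
index = [(post, embed(post)) for post in POSTS]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Vector search: return the k posts most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [post for post, _ in ranked[:k]]

def answer(question: str) -> str:
    """Generation step: only the retrieved posts go to the LLM as context."""
    context = "\n---\n".join(retrieve(question))
    prompt = (
        "Answer using only the excerpts below.\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # In a real RAGgregator this prompt would be sent to an LLM;
    # returning it keeps the sketch self-contained.
    return prompt

print(answer("How does Mike think about knowledge graphs and LLMs?"))
```

The point of the division of labor is that the chatbot's accuracy comes from the retrieval step – the LLM only rephrases content that was actually written by the author.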

Cute. But so what?

People talk a lot about LLMs as content generators, which is technically correct. But seen from a different angle, when we load a gazillion words of content into an LLM during training (or inject them at prompt time) and then selectively wrangle the most relevant of them into answers to specific questions, we're doing large-scale content aggregation. And as we adapt the output for different languages, reading levels, and levels of expertise, we're personalizing these aggregations and multiplying the audiences that have access to the same content. So we're talking here not just about answering questions but about building platforms for personalized aggregation of targeted content. For fun, we can call these systems RAGgregators (RAG-based content aggregators), as in the stylish subject of the image above.
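To make the "personalized aggregation" idea concrete, here is a small, assumed sketch of how the same retrieved content could be re-rendered for different audiences simply by varying the generation instructions. The audience profiles and prompt wording are illustrative assumptions, not a prescription.

```python
# Illustrative only: one set of retrieved excerpts, re-targeted per reader.
# The profiles and the prompt wording are assumptions, not a fixed recipe.

AUDIENCES = {
    "expert":   {"language": "English", "reading_level": "graduate",
                 "style": "precise technical terminology"},
    "newcomer": {"language": "English", "reading_level": "8th grade",
                 "style": "plain words, short sentences, one example"},
    "pt_br":    {"language": "Brazilian Portuguese", "reading_level": "general",
                 "style": "idiomatic, non-technical phrasing"},
}

def personalization_prompt(excerpts: str, question: str, audience: str) -> str:
    """Wrap one set of retrieved excerpts in audience-specific instructions.
    A RAGgregator would send this prompt to its LLM once per reader profile."""
    profile = AUDIENCES[audience]
    return (
        f"Answer in {profile['language']} at a {profile['reading_level']} "
        f"reading level, using {profile['style']}.\n"
        f"Base the answer only on these excerpts:\n{excerpts}\n\n"
        f"Question: {question}"
    )

excerpts = "RAG grounds model answers in documents you control ..."
for audience in AUDIENCES:
    print(personalization_prompt(excerpts, "What is a RAGgregator?", audience))
```

The aggregation happens once, at retrieval time; the personalization is just a cheap change of instructions, which is why one body of content can serve many audiences.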

The result is something light years ahead of our usual experience with what-you-want-might-be-somewhere-in-one-of-these-6M-documents-style search, which returns the original documents and leaves users on their own to slog through and (maybe) understand them. I call this the FEDEX-style search experience: the search engine (like FEDEX) dumps a pile of content on your doorstep, wishes you luck, and you, the consumer, have to slog through it all to figure out what's useful, what's understandable, and what to do with it. A tall order, even if you're in a familiar domain. If you're starting something new (like students are), it's overwhelming.

One key difference, then, is that RAGgregators not only locate information but synthesize it according to your needs.

It's also light years ahead of our usual experience with publications more generally, whether gossip magazines, technical journals, news websites, or textbooks. They aggregate content for us, the subscribers. But we, the readers, have to adapt our behavior to the language, phrasing, and organization that the publishers and authors have chosen, instead of vice versa. The end result is that, more often than not, the knowledge we need or want is barricaded behind distributed storage, multiple publication channels, paywalls, expert jargon, turgid writing styles, and lousy search engines. Right now, the best kludge we have for finding new knowledge is to check the date stamp; for finding related knowledge, overlapping keywords (if and when we know them).

Caution: Storm front looming, turbulence ahead

The experience with chatbots like the one Jay built is so much better than current experiences with content that RAGgregators seem poised to disrupt the whole US$ 33 B newspaper industry, the US$ 9 B textbook industry, and several other kinds of content aggregators and distributors. We could build personalized RAGgregators for music, for video, for art, for events, for recipes, for restaurants – even for perfumes(!). The growing availability of multimodal LLMs (the underlying technology) means that even these sci-fi-style systems are already real possibilities.

So a perfect storm seems to lie straight in the path of traditional publishers, who are not known for being very tech savvy. And a very real opportunity for disrupting yet another industry seems within reach for their competitors.

What's missing?

Even at the nano scale of my tiny chatbot, some key pieces of the puzzle seem to be missing.

  • Content ownership is very problematic. Whatever you dump into the maw of OpenAI gets drowned in an ocean of other content, so attribution and ownership – and potential revenue – get murky and unmanageable. This is a deal breaker for most commercial uses of RAGgregators. Businesses will ask: if I "publish" my proprietary content through a RAGgregator, how can I protect it from the LLM provider? Right now, I can't see a way to build a firewall between RAG content and the LLM that processes it.
  • Navigation is problematic. For naïve end users, a blank chatbot window is intimidating – it's the cold-start problem all over again. They have no idea what to even ask about – especially if they're newbies to a particular field. For my content, I'll have to develop some kind of map (adapted to the user's needs and interests?) so they know what to look for without wasting their time. And they'll need more info about the capabilities of the RAGgregator – which languages it handles well, its ability to paraphrase at different levels of readability, its limitations, etc. – so they get better at asking questions and have a more rewarding experience.
  • Input modalities are evolving. Text-only entry through a blank window will cover many but probably not enough use cases. Multimodal LLMs will allow us to upload a QR code or photo as a "question", or an audio snippet à la Shazam. For RAGgregators-as-textbook-replacement scenarios, we'll need to be able to upload written or scribbled student work, compare it to some canonical content, and make suggestions for improvements.
  • Output is only answers. For many applications, we'll need mixed-initiative systems that can follow a (self-generated?) agenda to gather the information they need to personalize outputs, or to guide next steps with their own questions.
  • Tooling is not yet widely available. But lots of companies like epsilla.com are developing low-code/no-code tools to bring RAGgregators within reach of people without computer science degrees and big budgets.
  • Feedback for content creators. I haven't seen any tools yet that aggregate consumer behavior on RAGgregated content delivered with LLMs. This will be crucial for marketing teams and fundamental for teachers and other content creators. (A minimal sketch of what such aggregation might look like follows this list.)
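As a thought experiment only – no such tool is named in this post – here is a rough sketch of what feedback aggregation for RAGgregated content might involve: log which source posts each answer drew on and how the reader rated it, then roll that up per source so a creator can see which pieces of content are earning their keep. Every name in it (the `Interaction` record, the ratings, the post ids) is an assumption for illustration.

```python
# Hypothetical sketch: aggregating reader feedback on RAG answers back to
# the source posts. All names and fields are illustrative assumptions.

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Interaction:
    question: str                         # what the reader asked
    source_post_ids: list[str]            # which posts the RAG step retrieved
    rating: int                           # e.g. +1 thumbs-up, -1 thumbs-down

def aggregate_feedback(log: list[Interaction]) -> dict:
    """Roll interactions up per source post: how often it was used and how
    readers rated the answers it contributed to."""
    stats = defaultdict(lambda: {"uses": 0, "score": 0})
    for item in log:
        for post_id in item.source_post_ids:
            stats[post_id]["uses"] += 1
            stats[post_id]["score"] += item.rating
    return dict(stats)

log = [
    Interaction("What is a knowledge graph?", ["post-03", "post-07"], +1),
    Interaction("How do KGs help LLMs?", ["post-07"], +1),
    Interaction("What is RAG?", ["post-01"], -1),
]
print(aggregate_feedback(log))
# e.g. {'post-03': {'uses': 1, 'score': 1}, 'post-07': {'uses': 2, 'score': 2}, ...}
```

Even something this crude would tell a teacher or a marketing team which pieces of content readers actually rely on – information that traditional publishing channels rarely surface.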

The opportunities seem endless. Content creators can benefit from wider audiences. Content consumers will benefit from more content that is more relevant and more accessible. Platform and tooling providers will enjoy a growing market. Existing content can be licensed in new ways. What's not to love?

Jan Wilczyński

CVO @ Algopoetica | Exploring AI with conscious cognitive architectures | Creating AI that thinks, decides, and acts with purpose

8 months

Krzysztof Deneka interesting! Worth checking for our internal assistant.

David Zlotolow

AIMLUX : Equitus. Ai Consulting

8 months

I am convinced the future of all computing is going to interface with LLMs, a workflow automation layer, and deep learning engines...
Chris Brown

Business Leader Offering a Track Record of Achievement in Project Management, Marketing, And Financial.

8 months

Exciting times ahead for content aggregation with RAGgregators!

Mike Dillinger, PhD Thank you for another thought-provoking piece! Glad that you recognized the value of the chatbot (https://tinyurl.com/MikeDillenger) dedicated to your writings. I just tried the following: Q: What does Mike write the most about? A: ... Mike Dillinger writes extensively about the intersection of human expertise and machine intelligence, with a particular focus on knowledge architecture and knowledge graphs ... I hope the chatbot captured your intention well.

Ingenious concept! How will RAGgregators impact traditional journalism and knowledge dissemination?
