RAGgregators: AI for next-gen publishing
Image by you.com


Most of the time, I'm focused on next-gen AI instead of gen AI – dissecting and writing about the weaknesses and shortcomings of our conceptions of current tech to prod people into architecting more solid foundations for future AI work. This might lead you to conclude that I'm a naysayer or anti-AI, but in fact I'm thrilled by the advances that we see with the advent of LLMs and their applications.

Today, I want to change gears and focus on one example of what we can do – now – with today's tech, in spite of all its limitations.

Jay (JieBing) Yu, PhD, a far more savvy and adept maker than I am – and with far better tools – crafted a custom RAG system with OpenAI on the back end. He collected about a dozen posts that I authored on LinkedIn to make a sort of "MikeGPT" that will answer on my behalf – and quite accurately – any questions you have about me and how I think about knowledge graphs and LLMs, by weaving together content from my posts. [Thanks, Jay!] This simple chatbot is slaving away while I'm sleeping, out hiking, watching a movie with my youngest, seeing friends over coffee, talking to entrepreneurs, or thinking up topics for my next posts. It does a far, far better job than I do of responding to highly technical and highly non-technical audiences and to people who speak other languages – and it doesn't have to take naps, go to the doctor, or stop to think of a hard-to-remember translation for some rare technical term. And it can easily find where and when I mentioned this or that key idea – something I'm having serious trouble doing as the number of posts I publish grows.

You can "talk" to "me" – or to my more highly skilled avatar – at the chatbot here .

How did Jay do this? He used the RAG-building tools from epsilla.com. They offer a cool, no-code/low-code workbench for creating special-purpose, high-accuracy chatbots – using well-known techniques like vector search and retrieval-augmented generation (RAG).
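For readers who want a concrete picture of what "vector search plus retrieval-augmented generation" means in practice, here is a minimal sketch in Python. It is not Jay's actual Epsilla workflow – the toy `embed` helper, the sample posts, and the prompt format are all illustrative assumptions – but it shows the basic RAG loop: embed the posts once, retrieve the ones most similar to a question, and hand only that retrieved content to an LLM to compose the answer.

```python
# Minimal RAG loop (illustrative sketch, not Jay's actual Epsilla pipeline).
# embed(), POSTS, and the final prompt are stand-ins for a real embedding
# model, the LinkedIn posts, and an LLM API call.

from math import sqrt

POSTS = [
    "Knowledge graphs give LLMs the structure they lack ...",
    "RAG grounds model answers in documents you control ...",
    "Taxonomies are the backbone of reliable retrieval ...",
]

def embed(text: str) -> list[float]:
    """Toy bag-of-letters embedding; a real system would call an
    embedding model instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# 1. Index: embed every post once (a vector database would store these).
index = [(post, embed(post)) for post in POSTS]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Vector search: return the k posts most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [post for post, _ in ranked[:k]]

def answer(question: str) -> str:
    """Generation step: only the retrieved posts go to the LLM as context."""
    context = "\n---\n".join(retrieve(question))
    prompt = (
        "Answer using only the excerpts below.\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # In a real RAGgregator this prompt would be sent to an LLM;
    # returning it keeps the sketch self-contained.
    return prompt

print(answer("How does Mike think about knowledge graphs and LLMs?"))
```

The point of the division of labor is that the chatbot's accuracy comes from the retrieval step – the LLM only rephrases content that was actually written by the author.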

Cute. But so what?

People talk a lot about LLMs as content generators, which is technically correct. But seen from a different angle, when we load a gazillion words of content into an LLM during training (or inject them at prompt time) and then selectively wrangle the most relevant of them into answers to specific questions, we're doing large-scale content aggregation. And as we adapt the output for different languages, reading levels, and levels of expertise, we're personalizing these aggregations and multiplying the audiences that have access to the same content. So we're talking here not just about answering questions but about building platforms for personalized aggregation of targeted content. For fun, we can call these systems RAGgregators (RAG-based content aggregators), as in the stylish subject of the image above.
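To make the "personalized aggregation" idea concrete, here is a small, assumed sketch of how the same retrieved content could be re-rendered for different audiences simply by varying the generation instructions. The audience profiles and prompt wording are illustrative assumptions, not a prescription.

```python
# Illustrative only: one set of retrieved excerpts, re-targeted per reader.
# The profiles and the prompt wording are assumptions, not a fixed recipe.

AUDIENCES = {
    "expert":   {"language": "English", "reading_level": "graduate",
                 "style": "precise technical terminology"},
    "newcomer": {"language": "English", "reading_level": "8th grade",
                 "style": "plain words, short sentences, one example"},
    "pt_br":    {"language": "Brazilian Portuguese", "reading_level": "general",
                 "style": "idiomatic, non-technical phrasing"},
}

def personalization_prompt(excerpts: str, question: str, audience: str) -> str:
    """Wrap one set of retrieved excerpts in audience-specific instructions.
    A RAGgregator would send this prompt to its LLM once per reader profile."""
    profile = AUDIENCES[audience]
    return (
        f"Answer in {profile['language']} at a {profile['reading_level']} "
        f"reading level, using {profile['style']}.\n"
        f"Base the answer only on these excerpts:\n{excerpts}\n\n"
        f"Question: {question}"
    )

excerpts = "RAG grounds model answers in documents you control ..."
for audience in AUDIENCES:
    print(personalization_prompt(excerpts, "What is a RAGgregator?", audience))
```

The aggregation happens once, at retrieval time; the personalization is just a cheap change of instructions, which is why one body of content can serve many audiences.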

The result is something light years ahead of our usual experience with what-you-want-might-be-somewhere-in-one-of-these-6M-documents-style search, which returns the original documents and leaves users on their own to slog through and (maybe) understand them. I call this the FEDEX-style search experience: the search engine (like FEDEX) dumps a pile of content on your doorstep, wishes you luck, and you, the consumer, have to slog through it all to figure out what's useful, what's understandable, and what to do with it. A tall order, even if you're in a familiar domain. If you're starting something new (like students are), it's overwhelming.

One key difference, then, is that RAGgregators not only locate information but synthesize it according to your needs.

It's also light years ahead of our usual experience with publications more generally, whether gossip magazines, technical journals, news websites, or textbooks. They aggregate content for us, the subscribers. But we, the readers, have to adapt our behavior to the language, phrasing, and organization that the publishers and authors have chosen, instead of vice versa. The end result is that, more often than not, the knowledge we need or want is barricaded behind distributed storage, multiple publication channels, paywalls, expert jargon, turgid writing styles, and lousy search engines. Right now, the best kludge we have for finding new knowledge is to check the date stamp; for finding related knowledge, overlapping keywords (if and when we know them).

Caution: Storm front looming, turbulence ahead

The experience with chatbots like the one Jay built is so much better than current experiences with content that RAGgregators seem poised to disrupt the whole US$ 33 B newspaper industry, the US$ 9 B textbook industry, and several other kinds of content aggregators and distributors. We could build personalized RAGgregators for music, for video, for art, for events, for recipes, for restaurants – even for perfumes(!). The growing availability of multimodal LLMs (the underlying technology) means that even these sci-fi-style systems are already real possibilities.

So a perfect storm seems to lie straight in the path of traditional publishers, who are not known for being very tech savvy. And a very real opportunity for disrupting yet another industry seems within reach for their competitors.

What's missing?

Even at the nano scale of my tiny chatbot, some key pieces of the puzzle seem to be missing.

  • Content ownership is very problematic. Whatever you dump into the maw of OpenAI gets drowned in an ocean of other content, so attribution and ownership – and potential revenue – get murky and unmanageable. This is a deal breaker for most commercial uses of RAGgregators. Businesses will ask: if I "publish" my proprietary content through a RAGgregator, how can I protect it from the LLM provider? Right now, I can't see a way to build a firewall between RAG content and the LLM that processes it.
  • Navigation is problematic. For naïve end users, a blank chatbot window is intimidating – it's the cold-start problem all over again. They have no idea what to even ask about – especially if they're newbies to a particular field. For my content, I'll have to develop some kind of map (adapted to the user's needs and interests?) so they know what to look for without wasting their time. And they'll need more info about the capabilities of the RAGgregator – which languages it handles well, its ability to paraphrase at different levels of readability, its limitations, etc. – so they get better at asking questions and have a more rewarding experience.
  • Input modalities are evolving. Text-only entry through a blank window will cover many but probably not enough use cases. Multimodal LLMs will allow us to upload a QR code or photo as a "question", or an audio snippet à la Shazam. For RAGgregators-as-textbook-replacement scenarios, we'll need to be able to upload written or scribbled student work, compare it to some canonical content, and make suggestions for improvements.
  • Output is only answers. For many applications, we'll need mixed-initiative systems that can follow a (self-generated?) agenda to gather the information they need to personalize outputs, or to guide next steps with their own questions.
  • Tooling is not yet widely available. But lots of companies like epsilla.com are developing low-code/no-code tools to bring RAGgregators within reach of people without computer science degrees and big budgets.
  • Feedback for content creators. I haven't seen any tools yet that aggregate consumer behavior on RAGgregated content delivered with LLMs. This will be crucial for marketing teams and fundamental for teachers and other content creators. (A minimal sketch of what such aggregation might look like follows this list.)
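As a thought experiment only – no such tool is named in this post – here is a rough sketch of what feedback aggregation for RAGgregated content might involve: log which source posts each answer drew on and how the reader rated it, then roll that up per source so a creator can see which pieces of content are earning their keep. Every name in it (the `Interaction` record, the ratings, the post ids) is an assumption for illustration.

```python
# Hypothetical sketch: aggregating reader feedback on RAG answers back to
# the source posts. All names and fields are illustrative assumptions.

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Interaction:
    question: str                         # what the reader asked
    source_post_ids: list[str]            # which posts the RAG step retrieved
    rating: int                           # e.g. +1 thumbs-up, -1 thumbs-down

def aggregate_feedback(log: list[Interaction]) -> dict:
    """Roll interactions up per source post: how often it was used and how
    readers rated the answers it contributed to."""
    stats = defaultdict(lambda: {"uses": 0, "score": 0})
    for item in log:
        for post_id in item.source_post_ids:
            stats[post_id]["uses"] += 1
            stats[post_id]["score"] += item.rating
    return dict(stats)

log = [
    Interaction("What is a knowledge graph?", ["post-03", "post-07"], +1),
    Interaction("How do KGs help LLMs?", ["post-07"], +1),
    Interaction("What is RAG?", ["post-01"], -1),
]
print(aggregate_feedback(log))
# e.g. {'post-03': {'uses': 1, 'score': 1}, 'post-07': {'uses': 2, 'score': 2}, ...}
```

Even something this crude would tell a teacher or a marketing team which pieces of content readers actually rely on – information that traditional publishing channels rarely surface.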

The opportunities seem endless. Content creators can benefit from wider audiences. Content consumers will benefit from more content that is more relevant and more accessible. Platform and tooling providers will enjoy a growing market. Existing content can be licensed in new ways. What's not to love?

Jan Wilczyński

CVO @ Algopoetica | Exploring AI with conscious cognitive architectures | Creating AI that thinks, decides, and acts with purpose

8 months

Krzysztof Deneka interesting! Worth checking for our internal assistant.

David Zlotolow

AIMLUX : Equitus. Ai Consulting

8 months

I am convinced the future of all computing is going to interface with LLMs, a workflow automation layer, and deep learning engines...
Chris Brown

Business Leader Offering a Track Record of Achievement in Project Management, Marketing, And Financial.

8 months

Exciting times ahead for content aggregation with RAGgregators!

Mike Dillinger, PhD Thank you for another thought-provoking piece! Glad that you recognized the value of the chatbot (https://tinyurl.com/MikeDillenger) dedicated to your writings. I just tried the following: Q: What does Mike write the most about? A: ... Mike Dillinger writes extensively about the intersection of human expertise and machine intelligence, with a particular focus on knowledge architecture and knowledge graphs ... I hope the chatbot captured your intention well.

Ingenious concept! How will RAGgregators impact traditional journalism and knowledge dissemination?
