Garbage In, Garbage Out – The case for proper knowledge management
Gen AI isn’t a magic cure for every business problem.
Now, some of you might be wondering:
“Why does he repeat this point over and over again?”
Well… Last week, I talked to a non-tech business development (BD) professional about the topic. And he told me about a new vendor they were considering.
Specifically, they were buying into the following line:
“We use Gen AI plus RAG (Retrieval Augmented Generation) over your proprietary data and create a full sales proposal, a full contract, a full marketing slick, a full [insert your content of choice].”
First off, this vendor was not us.
But in all seriousness, I see so many product vendors still spinning this kind of magical thinking, especially some of the newer GenAI-first startups. The message goes: sprinkle a little RAG over an LLM and wonderful things will happen.
When I hear these types of stories, I want to scream, “STOP IT!”
It’s not helping any of us, and it's muddying the waters massively. Gen AI has incredible value. But not if it’s used for the wrong task or against crappy data.
I’ve talked about choosing the right task for Gen AI before, but now I want to focus on the data itself. Specifically, the right IA (Information Architecture).
IA is a fancy phrase for how you handle & structure important proprietary data. This is not a new argument, but Gen AI is shining a big light on the issue of good quality data. And personally, I think it’s well overdue.
TL;DR: It’s garbage in, garbage out with AI. You must get your data house in order to take full advantage of it.
Let’s go a little deeper.
Is your data ready for Gen AI?
The Harvard Business Review (HBR) considered this very topic just a few weeks ago. Check it out here.
This quote from the article says it all:
“These large language models do not solve the problem of disparate data sources. Companies need to address data integration and mastering before attempting to access data with generative AI.”
To reinforce the point, I had an excellent chat with a customer just this week on this exact topic. They are publicly traded (more than US$1B in revenue), operate in the defense sector, and have been using VT Docs for the last 7 years.
Rob (keeping other details private) bemoaned the fact that he’s having a tough time trying to cleanse their proprietary data structures.
These are some of the challenges he mentioned:
- folders with multiple revisions (e.g., v1, v2, vFinal, etc.),
- out-of-date content (old product collateral, etc.),
- content spread across multiple locations,
- incorrectly tagged documents.
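The first of those challenges is the easiest to spot mechanically. Here is a minimal sketch, assuming revision markers show up as suffixes like "_v1", " vFinal" or "_final" at the end of the file name. The pattern, the function names and the sample file names are all made up for illustration, so adjust them to whatever your folders really look like.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumed (hypothetical) revision markers: a separator followed by
# v<number>, vFinal, final or draft at the very end of the file name.
REVISION_MARKER = re.compile(r"[\s_-]+(v\d+|vfinal|final|draft)$", re.IGNORECASE)


def base_name(filename: str) -> str:
    """Strip the extension and any trailing revision markers from a file name."""
    stem = PurePath(filename).stem
    previous = None
    while previous != stem:  # repeat to handle stacked markers like "_v2_final"
        previous = stem
        stem = REVISION_MARKER.sub("", stem)
    return stem.lower()


def group_revisions(filenames):
    """Return {base name: [files]} for every base name that has more than one file."""
    groups = defaultdict(list)
    for name in filenames:
        groups[base_name(name)].append(name)
    return {base: files for base, files in groups.items() if len(files) > 1}


if __name__ == "__main__":
    # Illustrative file names only.
    sample = ["Proposal_v1.docx", "Proposal_v2.docx", "Proposal vFinal.docx", "Pricing.xlsx"]
    for base, files in group_revisions(sample).items():
        print(f"{base}: {len(files)} revisions -> {files}")
```

A report like this only tells you where the clutter is. Deciding which revision is the real FINAL is still a job for a human.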
This other quote from that HBR article sums up the real challenge as a human behavior issue:
“We need data hygiene first, and anyone who’s had any experience in any kind of large or small organization knows this. I characterize it as a human behaviour problem, and like all attempts at behaviour change, it’s not a trivial matter.”
Jeff McMillan, the chief data, analytics, and innovation officer at Morgan Stanley Wealth Management.
So, how do we fix it?
The good thing is, it’s far from rocket science. Here are a few tactical process steps to get quick wins:
1) Start with the Job To Be Done (JTBD)
Slice off strands of automation rather than taking the “boil the ocean” or “big bang” approach.
2) Choose a specific task (or tasks) to automate with AI.
One example from the Business Development world could be: “Find me relevant past performance to help make a bid/no-bid decision.”
3) Create a new library
Let’s say your org uses Microsoft SharePoint as a repository (as is the case with most enterprise organizations we work with).
Create a new library to hold the relevant data. Let’s call it “Past Performance”. Restrict permissions on this library to one or two key admins.
If you’re not familiar with SharePoint, a library is akin to a folder.
4) Prepare the new library
Locate your prior wins and copy the FINALs from their current location to the new library. Then, create a clear naming convention for the files.
Note: I always include customer name, solicitation name and some form of date notation. It helps to play with various naming approaches using REAL examples. Just make sure the files are findable by eye.
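To make the naming idea concrete, here is a rough sketch of one possible convention: Customer_Solicitation_YYYY-MM-DD.ext. The exact format, the helper names and the "Acme Corp" example are assumptions for illustration, not a prescribed standard. Pick whatever reads well with your own REAL file names.

```python
import re
from datetime import date

# Assumed convention (hypothetical): Customer_Solicitation_YYYY-MM-DD.ext
NAME_PATTERN = re.compile(
    r"^(?P<customer>[A-Za-z0-9-]+)_(?P<solicitation>[A-Za-z0-9-]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>[A-Za-z0-9]+)$"
)


def slug(text: str) -> str:
    """Collapse anything that is not a letter or digit into single hyphens."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")


def build_name(customer: str, solicitation: str, submitted: date, ext: str) -> str:
    """Build a file name from customer, solicitation and submission date."""
    return f"{slug(customer)}_{slug(solicitation)}_{submitted.isoformat()}.{ext.lstrip('.')}"


def follows_convention(filename: str) -> bool:
    """Check whether a file name matches the assumed convention."""
    return NAME_PATTERN.match(filename) is not None


if __name__ == "__main__":
    # Made-up customer and solicitation, purely for illustration.
    name = build_name("Acme Corp", "RFP 2024/17", date(2024, 3, 5), ".docx")
    print(name, follows_convention(name))
    print("proposal vFinal.docx", follows_convention("proposal vFinal.docx"))
```

The same check can be run over the whole library from time to time to list any files that don't match, which is a cheap way to spot documents that were dropped in without following the convention.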
5) Update the workflow
Now that we have our baseline, we need to update our workflow in two ways:
First, we need to add the final submitted documents to the new library, using the naming convention, every time we win a new contract.
Second, we need to lock down the permissions on the new library to prevent contaminating it with the wrong documents.
But the final (and most important) step is to communicate the new structure to all team members clearly. And explain the rationale. If you don’t, you’ll find that “weeds will grow in the garden.”
In other words, your best laid plans will fall apart. And it’ll become the wild west of document storage systems, with people uploading all sorts of random stuff.
Now, I’m almost embarrassed to suggest the above, as it’s so brain-dead obvious. But at its core, this is a people and behavior issue, not a tech issue.
Honestly, if you succeed with the above, you’ll see a massive difference in the quality of the results you get from Gen AI.
It’s just about having clear folder structures and feeding that to your LLM + RAG solution of choice. In our case, that’d be VT Docs & VT Writer.
Now the technical approach a vendor takes re: chunking and RAG is a whole other topic. And for another day.
In the end, good data hygiene is a human challenge, not a technical one. Prepping data structures now will pay dividends as you roll out any Gen AI solution.
Worth putting in the time and energy now.
---
Best,
Fergal
Founder & CEO of VisibleThread
PS: I hope you found this info useful. Follow me on LinkedIn, where I share more insights like this.