What companies need to know about AI and content science
Peter Hinssen
International Keynote Speaker | LinkedIn Top Voice | Best-selling Author | London Business School Lecturer | Serial Entrepreneur | nexxworks Co-Founder
A few months ago, I became obsessed with this new field that I’ve started calling “content science”. I believe it will become even more crucial for companies than the already very popular data science.
We have ChatGPT to thank for that: it burst onto the scene at the end of November 2022 and basically changed everything. The entire world became hugely excited about generative AI, which is pretty much everywhere now. Even Khan Academy, of which I’ve been a huge fan for a long time, has integrated some sort of AI-driven private tutor.
Knowledge dream or privacy nightmare?
The domain of content science is just as fascinating as it is fast-moving. AWS recently launched the generative AI tool HealthScribe, which develops clinical notes on the basis of patient-clinician conversations. Doctors normally spend an enormous amount of time documenting these, which diverts their focus from their core task: patient care. This is without any doubt a really useful tool. But just imagine the amount of information that AWS and Amazon are gathering by analyzing all these doctor-patient conversations. That could also be a privacy nightmare.
The wealth of knowledge being generated is truly fascinating. However, this rapid evolution can also be really scary for some. Just think about how the Writers Guild of America is currently on strike out of fear that studios will start using AI to create scripts for movies or TV shows. While I really don't believe this will become the norm, it's also evident that we're moving toward a new paradigm, giving rise to what I call "content science".
In fact, I’ve already seen many fascinating cases of companies capturing their internal knowledge.
Faster, ChatGPT! Bill! Bill!
McKinsey, for instance, recently introduced Lilli, their very own generative AI tool, which packages all their in-house knowledge. So now, every time they receive a customer question, the first thing they do is run it past Lilli. I’m really curious to see what this will mean for their business model and their fees. They're used to selling warm bodies, for which they charge a lot of money. But now that it's a lot faster, more efficient and easier for them to generate answers, how will they charge for their services? Will they offer Lilli as a licensed “consulting-as-a-service”, for instance? There are plenty of opportunities for massive disruption here.
Another fantastic and very similar example is Harvey, which offers generative AI services to law firms. Harvey AI is trained on general internet data from OpenAI’s GPT-4 as well as general legal data. On top of that, it is trained on its customers' internal documents and data, which allows it to assist them with contract analysis, due diligence, litigation, regulatory compliance and much more. In short, it helps lawyers become faster and more efficient.
And then we have OpenAI, which launched its enterprise customer license. This version is more robust, but the most interesting part here is that you can train it on your very own company documents, files and content.
What will that do to IP and copyright, I wonder?
Proving a negative
This summer, I received an e-mail from a big tech firm, one I've often worked with as a keynote speaker, asking me to prove that I have never uploaded any of their information - PowerPoints, Word documents, e-mails, etc. - into an LLM (a large language model, like the ones behind ChatGPT and Bard). How can you possibly prove something like that?
But that’s probably just the beginning. I think we may need to brace ourselves for a very messy compliance situation, where we might need to show that we are not using somebody else's data to train our large language model. We will need to be very mindful of what goes in and what comes out.
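One practical way to stay "mindful of what goes in" is to keep an audit ledger of every document that gets fed into an LLM. The sketch below is purely illustrative - the file names and the `ledger` structure are my own assumptions, not any vendor's API - but it shows the idea: you store a fingerprint of each document, not the document itself, so you can later demonstrate what was (and wasn't) used.

```python
import hashlib
import json
import time

def ledger_entry(doc_name: str, content: bytes, source: str) -> dict:
    """Record a content fingerprint (not the content itself) for every
    document fed into an LLM, so usage can be audited later."""
    return {
        "doc": doc_name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "source": source,
        "ingested_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

# Hypothetical example: log a slide deck before it goes into a prompt.
ledger = []
ledger.append(ledger_entry("q3_strategy.pptx", b"...slide bytes...", "sharepoint"))
print(json.dumps(ledger, indent=2))
```

A ledger like this doesn't prove a negative, of course, but it at least gives you a defensible record of what your own pipelines ingested.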
So this is where I think the world of content science will be playing a crucial role.
Starving for knowledge
One of my favorite quotes comes from the late John Naisbitt, who used to be a popular author and public speaker in the area of futures studies: “We are drowning in information, but we are starved for knowledge”. He said that in the 1980s, way before the digital revolution. But his observation still resonates today. Individuals already have a ton of information on all of their devices, but think of how much companies have on their SharePoints, OneDrives, Google Drives, e-mail servers, Slack channels, etc.
Today, every company of some size and significance has a data science department, or at least one or more data scientists. The bank where I'm a board member, for instance, has a department of 200 data scientists who clean up and catalogue all of its structured information. Over the years, data science has become widely established, but I think we're now seeing the birth of a new field, one that will focus less on that neatly structured data and more on the messy, unstructured data that is so abundant in companies: Word documents, PowerPoints, PDFs, emails.
That’s huge, if you realize that merely 20% of company data is neatly structured in a database while all the rest is unstructured.
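To make that 80/20 split concrete, here is a minimal sketch of how you might inventory a shared drive by file type. The extension lists are my own illustrative categories, not an official taxonomy:

```python
from collections import Counter
from pathlib import Path

# Illustrative buckets: "unstructured" content vs. structured data files.
UNSTRUCTURED = {".docx", ".pptx", ".pdf", ".msg", ".eml", ".txt"}
STRUCTURED = {".csv", ".parquet", ".xlsx", ".db"}

def inventory(root: str) -> Counter:
    """Walk a folder tree and count files per category."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()
            if ext in UNSTRUCTURED:
                counts["unstructured"] += 1
            elif ext in STRUCTURED:
                counts["structured"] += 1
            else:
                counts["other"] += 1
    return counts
```

Run something like this against a typical corporate file share and you quickly see where the bulk of the "dark" content lives.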
You’re grounded!
Data science triggered the rise of a new type of technology player, offering data governance tools that help orchestrate all the databases in a company: from quality control to GDPR proofing. Now we will need content governance players. Companies will need to figure out how to orchestrate all their content sources: if and how they're going to use them to train these large language models, and how they're going to combine that with the ChatGPTs of this world.
The technical term for that is “grounding”: you take an LLM like ChatGPT and connect it - ground it - to your own data, your own intellectual body of content. Being able to manage and control that is going to be crucial in the world of content science.
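The grounding pattern can be sketched in a few lines. This is a deliberately minimal illustration: real systems use embeddings and a vector database for retrieval, and the `documents` dictionary and prompt wording here are my own assumptions. The core idea is simply "retrieve your own content first, then tell the model to answer from it":

```python
def retrieve(query: str, documents: dict, k: int = 2) -> list:
    """Naive keyword-overlap retrieval; production systems use
    embeddings and a vector store, but the grounding idea is the same."""
    terms = set(query.lower().split())
    scored = sorted(
        documents.items(),
        key=lambda kv: len(terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def grounded_prompt(query: str, documents: dict) -> str:
    """Build a prompt that grounds the LLM in your own content:
    the model is instructed to answer only from the retrieved passages."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical internal knowledge base.
docs = {
    "hr": "Our parental leave policy grants 20 weeks of paid leave.",
    "it": "VPN access requires two-factor authentication.",
}
print(grounded_prompt("How many weeks of parental leave do we get?", docs))
```

The resulting prompt would then be sent to whichever LLM you use; the governance question is deciding which documents are ever allowed into `documents` in the first place.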
A tale of two predictions
That’s why I predict two things.
Where companies now have data science departments and data scientists, in the future they will also need to hire content scientists and build content science departments. These are not database nerds, but a whole new breed: “Conan the Librarians” who love to work with unstructured data in strategic ways. So that’s one prediction.
And the second one is that we will need to implement content governance tools.
So be prepared to hear and think a lot about content science and content governance in the coming years, especially if you're already active in that area: generative AI for enterprises.
Want to inspire your employees or customers with a keynote about what's next in business and technology? Check out the topics on my keynote page.