A Teaching Dilemma and The Importance of Having The Right Data and Information
Stephen Cribbett
I'm a passionate leader, trusted advisor to small business founders and leaders, growth and marketing specialist and strategist leveraging insight, commercial know-how and creative flair.
I spent most of this week thinking over and worrying about something that most parents will understand - the quality of teaching at my daughter's next school. I've got to choose where she goes next, and it's no easy decision.
A child's experience of school is loosely defined by the friends they make, the curiosity they have, their ability to learn and absorb information, and how their teachers 'teach'.
Reflecting on the latter, there are teachers who are very intelligent but can't teach. Then there are those who have a wonderful, very natural way of engaging children and holding their attention, but who don't feed them the right information. And of course, there are plenty in the middle.
What kept me awake this week was the thought that the quality of what is put into my child's head might be incorrect, or worse still, be passed on with an unconscious bias.
My brain then joined this dot with another. The AI dot.
AI is processing a lot of biased data right now, and the negative impact it has on our learning and education is huge.
ChatGPT, as just one example of a publicly available AI, now has an astonishing 180 million users and around 1.6 billion visits per month. Like my children, these large language models (LLMs) need teaching.
There's an explosion of AI-generated content all over the web right now. Worryingly, this means there's a huge amount of misinformation, fabricated nonsense and negative social stereotypes, all coming from unregulated sources.
Let me explain.
For LLMs to work, they must be fed vast datasets. Once this is done, the AI model captures a given phenomenon, such as human language or the visual world, in a way that is as close as possible to the real thing.
The bigger the volume of data the model consumes, the more representative its output becomes. Right now, the race is on to quickly grow these datasets and the power of the models. As an example, LLMs are now well into the trillion-parameter range, which means they need training datasets running into the billions of words or more to work.
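To put a rough number on that appetite for data, here's a back-of-the-envelope sketch using the widely cited "Chinchilla" heuristic from DeepMind's 2022 scaling research, which suggests roughly 20 training tokens (word fragments) per model parameter. That ratio is a rule of thumb, not a law, and the figures below are illustrative only:

```python
def tokens_needed(n_params: int, tokens_per_param: int = 20) -> int:
    """Rough 'Chinchilla' heuristic: compute-optimal training uses
    on the order of ~20 tokens of text per model parameter.
    An approximation for illustration, not an exact requirement."""
    return n_params * tokens_per_param

# Under this heuristic, a 1-trillion-parameter model wants on the
# order of 20 trillion training tokens -- far more text than any
# curated archive holds, hence the race to scrape the open web.
print(tokens_needed(10**12))  # 20000000000000
```

Even if the true ratio is several times smaller, the conclusion is the same: only the web offers text at this scale.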
One of the few places to source datasets of this size is... you guessed it, the web.
This is where the issue (that's keeping me awake) lies.
The web is being used as a proxy for truth and reality. It is representing the social world.
Search scientific journals and research papers and you'll quickly see that online datasets are poor in quality. They are riddled with negative stereotypes and, as we've all heard in the news too often, contain hate speech and slurs directed at under-represented and marginalised groups.
What goes on the web and on social media isn't a true representation of society and of how we relate to other people. It's a country mile off!
The wrong social stereotypes are being extracted and amplified, leading to the models producing outputs that are racist, sexist and discriminatory towards marginalised groups and cultures.
In simple terms, rubbish in, rubbish out!
What's going 'in' is what's now referred to as synthetic data.
The synthetic data generation market has been valued at US$288.5 million in 2022, and is projected to grow to US$2,339.8 million by 2030.
That's staggering growth and something that quantifies the size of my concerns.
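Those two market figures imply a compound annual growth rate of roughly 30% a year between 2022 and 2030. A quick sanity check, using the standard CAGR formula and the figures quoted above:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end/start)^(1/years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# Synthetic-data market: US$288.5m (2022) -> US$2,339.8m (2030)
growth = cagr(288.5, 2339.8, 2030 - 2022)
print(f"{growth:.1%}")  # prints "29.9%"
```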
The sheer volume of data being fed into these models means it's hard to form an accurate picture of its provenance. It is likely, however, that today's synthetic data will be used as input to train the next generation of AI models.
Inequalities will be heightened. We will get stuck in a recursive loop that relies on contaminated data. It's worrying!
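This recursive loop has a name in the research literature: "model collapse". You can see the mechanism in a toy simulation. The sketch below is my own illustration, not anyone's production code: it repeatedly fits a simple Gaussian "model" to samples drawn from the previous generation's fit, so each generation learns only from the last one's output, and small estimation errors compound until the spread of the data collapses:

```python
import random
import statistics

def self_training_loop(n_generations: int = 1000, n_samples: int = 10,
                       seed: int = 42) -> list[float]:
    """Toy 'model collapse' demo: generation 0 is the real world
    (a standard Gaussian); each later generation is fitted only to
    samples drawn from the previous generation's fitted model."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # the 'real' distribution
    sigmas = [sigma]
    for _ in range(n_generations):
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu = statistics.fmean(data)   # refit on synthetic data only
        sigma = statistics.stdev(data)
        sigmas.append(sigma)
    return sigmas

sigmas = self_training_loop()
# The fitted spread shrinks towards zero: the diversity of the
# 'training data' is steadily lost, generation after generation.
print(sigmas[0], sigmas[-1])
```

Real LLM training is vastly more complex, but the underlying failure mode is the same: a model trained on its predecessor's output inherits and amplifies that predecessor's errors and blind spots.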
And where are all the AI tools being used right now to greatest effect?
Medicine, education, law!
The very fabric of society.
It's toxic.
What can you do about it?
Know the provenance of your data. Know your data. Slow things down, ask the question, ask for the evidence. Stop, think.
Long before ChatGPT was a thing (in the public domain), the research industry had been grappling with data quality. The dark art of sampling and panels has never truly been unpicked. The problems of authenticity, professional survey participation and highly-monetised panel markets pale in comparison to AI's data problem, but both need to be addressed, now.
We need education and we need to know that the information we are being fed is the ground truth.
Hopefully I'll make the right decision for my children.
https://www.research-live.com/article/news/a-question-of-bias/id/5122342?_zs=LRt8H1&_zl=EUOM7 Just seen this today.....
Authority on lived experiences and unmet needs among seldom-heard groups in the UK. Helping leaders identify and prioritise opportunities to reduce inequalities and deliver positive social impacts. MRS Main Board member.
It’s a scary thought how AI and algorithms reproduce existing biases of all types: gender and race are especially clear, partly because they are, by and large, visible. All this old data feeds into not just education but also mortgage approvals, insurance premiums, business loans, policing, cybersecurity, health provisions, and so on. That’s one of the reasons why it remains so essential to counter this data usage with excellent insight from speaking to ‘real’ diverse people, to understand what is really going on and what might make lives better - and business better, too.