Life as a Data Scientist: Data Literacy, Data Contextualization, and ChatGPT

When people hear the phrase data scientist, a picture emerges of someone who works with data in a very mechanical way. They gather data, clean the data, and, most importantly, create models that allow their clients to answer key questions. And that is all true. However, I would argue that skill sets centered on data literacy and data contextualization are as important as any of the other skills data scientists possess, even though most folks won’t immediately think of them as critical. Data literacy is the ability to understand and work with data accurately. People often fold the ability to put data in its proper context into that definition, but I like to pull it out separately as data contextualization. Data contextualization is providing the proper context for data so as to improve how well one understands what the data means and what its implications are.

Indulge me for a minute or several. I am going to take a walk down memory lane and then talk about two instances I encountered in the last day before getting to what I think most people will be interested in: the linkage to ChatGPT.

I moved to London in November 2003 to take a role as head of research methodologies at Harris Interactive Europe. On my first day in the office I was really tired, as I had flown overnight to get there. I already knew some folks in the office, but my first day was spent getting to know others. I was particularly interested in the projects people were working on. One group was very excited to tell me about their work: groundbreaking research on the blind for the main society for the blind in the UK. In particular, the research had discovered some counterintuitive things, which is always exciting in this field, as it gives you something new to say. I was intrigued and wanted to know more. As we talked, it came out that they had conducted the research online. In 2003, there were hardly any tools to help the blind navigate the Internet. Things are much better now. I seriously stood there repeating, “You did a survey of the blind online and are surprised you got counterintuitive results?” I wasn’t sure I was hearing it correctly in my haze from being up all night flying over. I finally said that the research was invalid and we needed to redo it. It turned out they were presenting it the next day to the board of this society. Here I was, a young American, effectively calling their baby, which they were very proud of, ugly. They disregarded my advice to cancel the presentation and figure out what to do. What did I know? They gave the presentation to the board, and the technical term is “they got their asses handed to them.” Furthermore, they made their client look really bad in front of the client’s own board. Now, I can say that the folks who did this are genuinely good researchers. I worked well with them during my tenure in Europe and even afterwards. But they got caught up in groupthink. They had numbers on a page, and those numbers told an interesting story. That drove them forward without thinking about whether the results made sense and whether the data was generated in a reasonable way.

Fast forward to yesterday. I had flown cross-country the previous night and got only three and a half hours of sleep. Most of my stories where I am less diplomatic than I should be start with me being tired. In one of our products, the distribution of employment status had changed dramatically over the last few months. Because we measure who is buying what, having a lot more unemployed people in the sample, whose buying patterns differ from those of employed people, significantly shifts the data. If the change is real, it is a great measure of what’s happening, but if it isn’t, it is hugely problematic. The first response from the supplier (not somewhere I have worked) was that it mirrors what is happening in the economy. This response irks the hell out of me. When I am tired and irked, I can be a bit of a jackass. The economy is complex, and what we are going through right now is really strange: parts of the economy are bad (inflation, for example) and parts had been great, at least for the last few months. One of the areas where it had been great was employment, despite some newsworthy layoffs. I expect this month may be different, but a 20 percentage point increase in the share of unemployed people in the data, when the employment numbers are going in the opposite direction, isn’t acceptable. It is more than double the percentage increase that happened in the Great Depression. So trying to play it off as natural is insane to me. After I pushed back hard, the supplier came back and said they had checked everything else and couldn’t find anything wrong. I, again inelegantly, put it forward that if they couldn’t find anything wrong, then their panel was basically worthless, as it isn’t producing accurate data. That quieted the room quickly. It was way harsher than I intended, but it is accurate. Needless to say, they are looking further into the issue. Some of my colleagues jumped to the supplier’s defense, arguing that they had looked really hard to find something wrong. I am sure they did. But my colleagues were judging the firm on effort, not results. The supplier evaluated the process and couldn’t find anything wrong, but there clearly is something wrong. They need to understand the data in context. They need to understand the generating mechanism. They need to figure it out. We were able to come up with a short-term solution for the back data and the data going forward, but it is only short term. Not being able to figure it out calls into question the validity of their data, as we don’t know what we don’t know.
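The sniff test I applied to the panel can be written down as a simple rule: a shift in the sample that moves against an external benchmark, or outruns it by a wide margin, deserves suspicion before it deserves a story. The sketch below is a hypothetical illustration, not the check we actually ran; the function name, the `tolerance_pts` threshold, and the example percentages are all assumptions chosen only to mirror the 20-point jump described above.

```python
def shift_is_suspect(
    sample_before: float,
    sample_after: float,
    bench_before: float,
    bench_after: float,
    tolerance_pts: float = 2.0,  # hypothetical threshold, in percentage points
) -> bool:
    """Flag a change in a sample's share (in percentage points) that either
    moves in the opposite direction from an external benchmark or outruns
    the benchmark's own change by more than tolerance_pts."""
    sample_delta = sample_after - sample_before
    bench_delta = bench_after - bench_before
    # Opposite signs mean the sample and the benchmark disagree on direction.
    opposite_direction = sample_delta * bench_delta < 0
    # Even when the directions agree, a much larger swing needs a second look.
    too_large = abs(sample_delta - bench_delta) > tolerance_pts
    return opposite_direction or too_large


if __name__ == "__main__":
    # Hypothetical inputs: the sample's unemployed share jumps 20 points
    # while the benchmark unemployment figure edges down slightly.
    print(shift_is_suspect(10.0, 30.0, 4.0, 3.8))  # True
```

A flag like this does not say what went wrong; it only says the generating mechanism needs to be understood before the number is used, and it assumes the sample and the benchmark define "unemployed" the same way.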

I swear I am getting to ChatGPT, but first one more story. Last night, I was looking through LinkedIn before heading to bed. I came across a press release on Veganuary put out by a firm I used to work at. When I was there, they would run press releases by me for the sniff test. I can tell you I would have told them to ditch this one. First of all, strangely enough, vegetarianism is a topic I know a lot about. My first media event was talking about some research on vegetarianism on CNN when the person who normally did it was unavailable. The big item in that poll was that close to 50% of people who call themselves vegetarians eat either fish or chicken. We checked that stat 80 different ways before going forward with it on the air. I found that interesting, so I have followed the space ever since. Now, Veganuary had an official global incidence rate of less than 0.01% in 2022. The claim in the press release is that 55% of adults in the UK were considering it in 2023, basically unchanged from 54% in 2022. For something with an incidence rate of 0.01%, this is an utterly ridiculous assertion. I pointed this out. The person who put out the release doubled down, asserting that the 0.01% figure counted only those who officially signed up, and that those who actually did it could be many more. That was a true statement, but it could not be off by a factor of 5,000, which is in effect what they were saying. I then went contextual. The share of people who are vegetarian and/or considering vegetarianism is rising over time; there is no doubt about that. But in order to consider a Vegan January, you have to be open to being at least vegetarian for the month. That figure in the UK is less than 25% even now, and the folks willing to make the jump to veganism are just a fraction of those who consider themselves vegetarian. So to claim that 1 out of every 2 adults in the UK is considering doing a Vegan January does not pass the sniff test. Having numbers on a page, generated through a typically reliable process, does not guarantee that they make sense. The big lesson is that I really need to get more sleep.
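The two arguments I made against the Veganuary release are plain arithmetic, so here is a minimal sketch of them using only the figures quoted above: how many times larger the claim is than the official rate, and whether the claimed group is larger than the group it must sit inside. The function names are hypothetical, and the comparison mixes a global sign-up rate with a UK survey figure exactly as the exchange did.

```python
# Figures as quoted in the article, in percent of adults.
OFFICIAL_SIGNUP_PCT = 0.01      # official Veganuary incidence ("less than 0.01%", 2022)
CLAIMED_CONSIDERING_PCT = 55.0  # press release: UK adults considering it in 2023
OPEN_TO_VEGETARIAN_PCT = 25.0   # UK adults open to vegetarianism ("less than 25%")


def implied_multiplier(claimed_pct: float, observed_pct: float) -> float:
    """How many times larger the claimed share is than the observed share."""
    return claimed_pct / observed_pct


def exceeds_parent_group(subgroup_pct: float, parent_pct: float) -> bool:
    """A subgroup cannot be larger than the group that contains it: anyone
    considering a vegan January must already be open to at least a
    vegetarian month."""
    return subgroup_pct > parent_pct


if __name__ == "__main__":
    ratio = implied_multiplier(CLAIMED_CONSIDERING_PCT, OFFICIAL_SIGNUP_PCT)
    print(f"Claim is about {ratio:,.0f}x the official sign-up rate")
    print("Larger than the group it belongs to:",
          exceeds_parent_group(CLAIMED_CONSIDERING_PCT, OPEN_TO_VEGETARIAN_PCT))
```

Because both quoted bounds are "less than" figures, the real gaps can only be larger than this sketch shows.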

Now I am linking this to ChatGPT. If you are in the research business, or any other industry for that matter, you have probably started to have conversations about the impact of ChatGPT and other forms of AI on your business. AI has been around for a while, and most of us are already using flavors of AI in our businesses. But the recent news about ChatGPT seems in many ways to be a significant step forward, although some will argue with that. Beyond using it to generate rap songs about your business school or a love story between a rock and a possum (both real), the big question is whether this replaces work currently being done by people in the research industry. The sad answer is that we don’t know yet. First of all, we simply haven’t had adequate time to evaluate it. We can generate fun things while playing with it, but the extent to which it can truly generate insights from data in an unbiased way is unknown. What I will say is that you are going to see a lot of people claiming you can’t replace the work people are doing in this area with AI. Take those sentiments with a grain of salt. For certain things, you probably can. Monthly reporting on the same data, for example, is a prime case that could probably be automated very soon. Furthermore, quantitative models are increasingly being designed to be solved via neural network methodologies. Guess what the backbone of the text capabilities of most AI solutions is these days? It is conceivable that AI could do the work typically done by data scientists, but we aren’t there yet. This isn’t to say the sky is falling by any means, but we do need to know more.

One interesting take on all of this comes from Google’s head of AI. He asserts that there is really nothing new here. His sentiment is that most of the major tech companies have the same level of AI capability as OpenAI and ChatGPT, and that OpenAI has just packaged it better. This take is meaningful if it is true. We know that in the field of AI, things often look great initially, but a system can be knocked off kilter by certain elements making their way into the training set. Google had to take down their latest effort when various inputs started to result in potentially harmful information being passed on to the public. There are reported cases where folks have purposefully gotten ChatGPT to produce bad information. Whether that weakness is inherent or not, the program in its current form can be manipulated. We have now come full circle back to the need for data literacy and data contextualization.

As AI programs are used more and more often, my sense is that something generated by an AI program, as opposed to a human, may be seen as more credible. Humans are fallible and biased by our very natures. A computer is not emotional, and if it is deemed intelligent, why not believe it? “I, for one, welcome our new robot overlords.” But here’s the thing: if we are not able to critically judge the output and correct course when it runs afoul, we are potentially entering a very dangerous period.

Data literacy and data contextualization have always been important skills. They enable us to create real insight and provide value. It is now more important that we take leadership roles in using these skills to establish what reality is and to call out when data is used inappropriately. Oh, and we have learned that I need more sleep.

Terrance Kent

Board level mentor/strategic advisor

2 yr

Good read and yes, you need more sleep :-)

Kathi Love

CEO || Independent Board Director || Executive Leadership Coach

2 yr

Great essay. Thank you.

Jonathan Siegel

Partner at Meliora Research

2 yr

Nice. I would hope that some of the experiences with facial recognition and other tools would make people pretty cautious about assuming any AI application is unbiased just because it isn't human. AI can probably be used now, or shortly, to do a lot of stuff researchers do but that doesn't mean it will be adopted quickly. So I can hope that at my age (67) I can hang in there before it becomes a huge force or, even if it retires me, that I won't care. :-)


More articles by John Bremer

  • 2023 Off Year Elections
  • My Grandfather, Jamie Dimon, and Risk
  • It is Time for a Revolution (Again)
  • What's Up With These Forecasts?
  • Advice from an Old Data Scientist
