The Worst Survey Question in the World


This is a question that I saw recently in a report on using AI in data governance tasks. This might seem hyperbolic, but I genuinely loathe it. The intent was good, but I refuse to believe anyone can glean any shred of useful information out of this mess.

First, it’s important to understand the point of asking this question. Based on the conclusions drawn, it seems like they want to know how to free up the data team’s time by automating the activities on which they spend the most time. I think that this is flawed on its face, since automation can’t be implemented for broad genres of activities, only for specific use cases within those activities. As an example (outside of the data sphere), take house cleaning. If I spend too much time cleaning, can I automate my tasks? I can easily get a robot that will automate my daily vacuuming. It might even have sensors that prevent it from running into shoes or toys or other objects left on the floor. To my everlasting disappointment, though, it is a lot more difficult to get a robot that will pick up those objects and put them away. The same may be said for data cleansing: some parts of it are easily automated, others are so difficult as to be practically impossible.

My robot, vacuuming.

But for the sake of argument, let’s say that the question was asked with the specification that the only activities considered were those that would be easy to automate. The data team is entirely college interns, and as I previously mentioned, ChatGPT is basically a decently skilled college intern. Even in this situation, making the question multiselect negates any meaning the results could have. The answers are nonadditive. Consider the following situation: We all want to spend less time on chores. I spend the majority of my time on vacuuming; you spend the majority of your time on dishes and laundry; and our roommate spends the majority of her time on laundry and mopping. According to this survey, that would mean that two out of three people (about 67%) spend the majority of their time on laundry. This is clearly wrong. If you need two activities combined to reach a majority of your time, then you can’t be spending the majority of your time on either activity individually.
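To make the arithmetic concrete, here is a minimal sketch in Python. The time shares are numbers I made up for illustration (the chore example above doesn't specify them), and the function names are hypothetical. It tallies the multiselect answers the way a survey report would, then checks who actually spends more than half of their time on any single chore.

```python
from collections import Counter

# Hypothetical fractions of chore time per person (each row sums to 1.0).
time_shares = {
    "me": {"vacuuming": 0.6, "dishes": 0.2, "laundry": 0.2},
    "you": {"dishes": 0.3, "laundry": 0.3, "vacuuming": 0.2, "mopping": 0.2},
    "roommate": {"laundry": 0.3, "mopping": 0.3, "dishes": 0.2, "vacuuming": 0.2},
}

# What each person would tick on the multiselect question.
multiselect_answers = {
    "me": ["vacuuming"],
    "you": ["dishes", "laundry"],
    "roommate": ["laundry", "mopping"],
}


def survey_tally(answers):
    """Share of respondents who ticked each activity, as a report would show it."""
    counts = Counter(pick for picks in answers.values() for pick in picks)
    return {activity: count / len(answers) for activity, count in counts.items()}


def single_activity_majorities(shares):
    """Activities on which each person really spends more than half their time."""
    return {
        person: [activity for activity, share in activities.items() if share > 0.5]
        for person, activities in shares.items()
    }


tally = survey_tally(multiselect_answers)
majorities = single_activity_majorities(time_shares)

for activity, share in sorted(tally.items()):
    print(f"{activity}: {share:.0%} of respondents")
print(f"Reported shares sum to {sum(tally.values()):.0%}")
print(majorities)
```

Under these assumed numbers, the tally credits laundry with two of three respondents even though nobody spends more than half of their time on it, and the reported shares add up to well over 100%.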

The one exception to the tautology above is when the activities are not distinct, that is, when one is a subset of another or the two overlap. If I spend a majority of my time vacuuming, I can also say that I spend a majority of my time cleaning. This overlap in terms essentially renders the question useless. What do you mean when you say that you spend the majority of your time doing data governance? Is it different from what you mean when you say data validation or data consolidation? Say I get a ticket from a business user that asks about a potential duplicate record. I investigate where each record came from and discover that the discrepancy was due to two separate source systems sending data in slightly inconsistent ways. I combine both records and run some checks on the database to see if there are any other discrepancies of a similar nature. I then write up my actions and suggest a policy that might be implemented to prevent this situation from occurring in the future. This one action might then count as data validation, data consolidation, data lineage, responding to a ticket from business users, and data governance. I can’t imagine that saying, in that situation, that I spend the majority of my time on those five activities would lead to any valid insights.

A photo of me providing data governance support

How could this question be made better? The purpose of the question is valid. With the recent democratization of AI and the subsequent push for AI-automated solutions, it is important to know what sort of activities might be good candidates for automation. But I don’t think that this question can be asked in this manner. As I pointed out above, there are significant issues with equating ‘majority of one’s time’ with ‘should be automated’. The simple solution would be to ask people, “What tasks does your data team do that you think would be beneficial to have automated?” The better solution would be to conduct real interviews with data leaders and to truly understand what the data teams’ tasks are, what sort of difficulties they face, and what sort of tedium they endure. These interviews can then be categorized by someone who understands the capabilities of AI automation and how they might fit into the business use cases.
