When Data Isn't Beautiful and Simple Anymore

What will you do if your dependent variable (say, mortality rate) has data points that are missing completely at random (MCAR)? What will you do if the same variable contains influential outliers?

What do you do if the same variable is not normally distributed? Do you normalize it? Are there machine learning algorithms, like linear regression, that handle this automatically? The answers might seem clear-cut, but diving into these complexities reveals otherwise.
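To make the normality question concrete, here is a minimal sketch using simulated data (the numbers and variable names are invented for illustration, not taken from any real dataset). It assumes the usual textbook framing in which the normality assumption of linear regression applies to the errors rather than to the dependent variable itself. The dependent variable here is clearly skewed, yet the residuals of a simple least-squares fit come out roughly symmetric.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


# Simulated (hypothetical) data: a skewed predictor and normal noise.
rng = random.Random(42)
x = [rng.expovariate(1.0) for _ in range(5000)]
y = [3.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares for a single predictor, computed by hand.
mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"skewness of y:         {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

In a case like this, transforming the dependent variable just because its histogram looks skewed may not be needed. Whether that holds for your data is exactly the kind of judgment call the rest of this article is about.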

Over the years I have worked with data, I have realized that data is more than meets the eye. From a dataset of sample size 10 to one of more than 100,000,000, data is neither that simple nor that hard. It really depends on the context.

Recently, while working on a CA at CCT College Dublin, I encountered a dataset of fewer than 2,000 observations. One might assume it would be a breeze to work on, but it posed unforeseen challenges. Unrelated thought: is it just me, or does everyone else think this too? People don't really appreciate those who handle data, in any capacity, the way they should be appreciated. Anyway.

Mere adherence to general data principles or proficiency in analytics, machine learning, statistics, or programming doesn't cut it. The game-changer? Critical thinking. Yes, it's the key differentiator.

Critical thinking is deciding whether your problem calls for a supervised approach or not. Critical thinking is deciding whether your data requires transformation or not. Critical thinking is deciding which of the more than 23 methods of handling outliers best suits your data. Critical thinking is deciding to use the keep method, meaning you keep the outliers despite knowing they affect your computations.

Critical thinking is deciding whether your data is MAR, MCAR, and so on. Critical thinking is deciding whether a dependent variable that is not normally distributed should be normalized or not.
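The outlier decision can be sketched with a small example. The mortality rates below are made up for illustration, and the 1.5 × IQR fence is only one of the many possible rules, chosen here because it is widely taught. The sketch compares the "keep" option with dropping flagged points so the effect on the summary statistics is visible before you commit to either.

```python
import statistics

# Hypothetical mortality rates (per 1,000); the last value is suspicious.
rates = [4.1, 4.5, 4.8, 5.0, 5.2, 5.3, 5.6, 5.9, 6.1, 19.7]

# Quartiles and the common 1.5 * IQR fences (one rule among many).
q1, _, q3 = statistics.quantiles(rates, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

flagged = [r for r in rates if r < lower_fence or r > upper_fence]
kept = [r for r in rates if lower_fence <= r <= upper_fence]

# Compare the "keep" method with dropping the flagged points.
print("flagged as outliers:", flagged)
print(f"mean   (keep all): {statistics.fmean(rates):.2f}")
print(f"mean   (dropped):  {statistics.fmean(kept):.2f}")
print(f"median (keep all): {statistics.median(rates):.2f}")
print(f"median (dropped):  {statistics.median(kept):.2f}")
```

The code can only flag the point. Whether 19.7 is a data-entry error or a real and important observation is something the context has to answer.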

In my view, critical thinking stands as the pinnacle skill in data handling. Even when you know all the data techniques, without critical thinking the data is just raw facts. We therefore need to build upon that skill, and we do so by reading, and then reading some more.

Now that we understand how important critical thinking is, here are some pointers to help you improve your critical thinking skills, beyond simply saying "read more".

1. Once you have your data, before doing anything else, understand its context. You can only understand the context by reading about what others have done. This will improve your critical thinking skills: you will see what problems to anticipate and what successes to expect. In simple terms, research. There is truly nothing new under the sun. Whatever thought you have, trust me, someone has had it before. Understanding your context makes your data come alive.

2. Understand your objectives and goals with the data. What do you want to do with it? This is critical thinking. Only by understanding and internalizing the questions you are asking of your data can you really start working with it.

3. Take your time. Data preparation is often one of the most expensive parts of working with data in any business, so take your time at this stage. Preparing your data involves EDA and data preprocessing. It's not enough to just follow the known steps; you have to apply critical thinking. Justify why this and not that.
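As one illustration of justifying a preprocessing choice during EDA, the sketch below looks at whether missing values in an outcome appear related to another observed variable. The records, the `age` covariate, and the function name are all hypothetical. A large gap between the two groups would suggest the missingness may depend on age, which casts doubt on MCAR. This is only an informal first look, not a formal test.

```python
import statistics

# Hypothetical records: (age, mortality_rate); None marks a missing outcome.
records = [
    (34, 5.1), (41, None), (29, 4.8), (67, None), (52, 5.9),
    (71, None), (38, 5.0), (45, 5.4), (74, None), (58, 6.0),
]


def missingness_by_covariate(rows):
    """Mean covariate value for rows with a missing vs. an observed outcome."""
    ages_missing = [age for age, outcome in rows if outcome is None]
    ages_observed = [age for age, outcome in rows if outcome is not None]
    return {
        "n_missing": len(ages_missing),
        "n_observed": len(ages_observed),
        "mean_age_missing": statistics.fmean(ages_missing),
        "mean_age_observed": statistics.fmean(ages_observed),
    }


summary = missingness_by_covariate(records)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If the two means were close, treating the gaps as MCAR would be easier to defend. Here they differ noticeably, so simply dropping the incomplete rows would need a stronger justification.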

As I conclude: it's not about memorizing techniques but about understanding the 'why' behind each step.

Therefore, sharpen your critical thinking skills. Dive into reading and immerse yourself in a sea of knowledge.

If you forget everything else, just remember that critical thinking is the skill that elevates data from mere facts to actionable intelligence. Without it, data remains unexplored potential.
