Debunking Data Myths

Data can be intimidating because of its complexity and the vast range of its applications, which makes it easy for myths about it to take hold. In celebration of Love Data Week 2023, we'd like to shed some light on common myths about data science and analytics.

You have to have a programming or mathematical background.

While data science does involve a fair share of statistics, you don't need to start there: you can learn to apply data science principles to your own field of expertise. Data science is most powerful when subject matter expertise is paired with a strong foundation in analytic methods.

Data scientists and data analysts perform the same tasks.

The boundary between these two roles can be difficult to discern. Both involve the detailed study of data to extract meaningful, actionable insights for the business. In general, data analysts focus on pulling together existing data and research findings and visualizing that data to tell a story or drive action. Data scientists generally work to derive more complex insights from past patterns that can inform future decisions for the business.

The more data you have, the more accurate your model will be.

This myth is only partially true. If you put garbage in, you get garbage out: a model trained on poorly processed data will be inaccurate no matter how much of that data you have. Only a greater quantity of quality data will improve your model's accuracy.

By adding more or new data to a model, you implicitly assume that the additional data is somehow related to the outcome you are trying to predict. New data only adds meaningful predictive power if that relationship actually exists.
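To make the point concrete, here is a minimal sketch (assuming scikit-learn and NumPy, using synthetic data invented for illustration) in which padding a model with columns unrelated to the outcome adds no predictive power.

```python
# Sketch: extra columns only help when they are actually related to the outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1_000

# Two informative features that genuinely drive the outcome
X_signal = rng.normal(size=(n, 2))
y = (X_signal[:, 0] + X_signal[:, 1] > 0).astype(int)

# Ten extra columns of pure noise -- "more data" with no relationship to y
X_noise = rng.normal(size=(n, 10))

model = LogisticRegression(max_iter=1000)

acc_signal = cross_val_score(model, X_signal, y, cv=5).mean()
acc_padded = cross_val_score(model, np.hstack([X_signal, X_noise]), y, cv=5).mean()

print(f"Informative features only:     {acc_signal:.3f}")
print(f"With unrelated columns added:  {acc_padded:.3f}")  # about the same, or slightly worse
```

More rows of quality, relevant data would move the first number; more unrelated columns will not.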

It's all machine learning.

Although building and deploying models is a key part of a data scientist's role, real-world data is rarely available in a clean, processed form. Much effort goes into data processing to ensure the data meets quality standards and can be used throughout the model-building process. Our team at Ascend uses the CRISP-DM process model (Cross Industry Standard Process for Data Mining). CRISP-DM is the most widely used analytics process model and consists of six steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Only about 20% of a data scientist's time is spent modeling. The rest is spent understanding the business needs, understanding the data, preparing the data, and deploying the modeling results. Many steps need to happen before any model is built, from collection to processing to visualization to analysis; only after all of these are complete can you build a successful and useful model.

[Figure: The CRISP-DM process model. Shearer, C. (2000), "The CRISP-DM Model: The New Blueprint for Data Mining", Journal of Data Warehousing, vol. 5, no. 4, Fall, p. 13.]

There is one code solution to a problem.

Problems can be attacked from many directions, and there is often no single clear answer to a question. That is why a team with diverse perspectives, experiences, and ideas is invaluable in data science. You can arrive at the same conclusion, the same numbers and statistics, by writing the code in 5 or 10 different ways.
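As a toy illustration (the data and column names are invented), here are two different code paths in Python that land on exactly the same numbers.

```python
# Two routes to the same answer: average order value per region,
# once with a pandas groupby and once with a hand-rolled loop.
import pandas as pd

orders = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "order_value": [100.0, 150.0, 80.0, 120.0],
})

# Path 1: pandas groupby
by_region_pandas = orders.groupby("region")["order_value"].mean().to_dict()

# Path 2: plain Python over the same rows
totals, counts = {}, {}
for region, value in zip(orders["region"], orders["order_value"]):
    totals[region] = totals.get(region, 0.0) + value
    counts[region] = counts.get(region, 0) + 1
by_region_loop = {r: totals[r] / counts[r] for r in totals}

print(by_region_pandas == by_region_loop)  # True -- same conclusion, different code
```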

We hope that by debunking these myths we can create a better understanding for those interested in entering the world of data science and promote knowledge within our community. Happy Love Data Week!

#lovedata23 #mythbusted #datachange #datascience #ascendinnovations
