Problems with data
Madhur Modi
Providing free career advice | 30k connections | Spearheading Data Science and Digital Initiatives | Business Analytics, IIM C, IIT Kgp, ISI K
Data is central to almost every kind of problem. Your model is only as good as the data it is built on! Collected data can require a great deal of work: data cleaning, munging, outlier handling, feature engineering and exploratory analysis. Even after all that, the same data can be viewed and analyzed from different perspectives. Among the various data processing problems, the two I have run into most often are missing values and mistaking correlation for causation.
Missing information/missing values - I once heard a story about a study of pedestrian traffic patterns in Amsterdam. The study found that passing through one particular spot was delaying tourists for no apparent reason. As it turned out, a dustbin stood there and people were walking up to it to throw away their garbage. The map used in the study apparently did not record the dustbin. Imagine how missing information like this can lead to wrong conclusions.
Correlation without causation - Murder rates are reportedly higher in summer than in winter. Summer heat also means people eat more ice cream. So someone could easily find a strong correlation between the murder rate and ice cream consumption. It would obviously be wrong to conclude that ice cream causes murders or, conversely, that murders cause people to eat ice cream. Both simply follow the same seasonal pattern, driven by a third factor: the summer heat. A correlation can suggest a causal link, but it does not establish one.
You may face other problems that are more common in the projects you have worked on. Feel free to share your views!
Data Science Consultant with expertise in AI/ML and Analytics
5 years ago
Thanks for sharing, Madhur. It was really informative.
CEO, WIPPA
7 years ago
Rightly said, Madhur. We must also understand the assumptions underlying the data and the purpose for which we are analysing it.
Training / Counselor / Industrial Engineering / Software Developer / Life Planner and General Insurance Proposer
8 years ago
Madhur Modi, very nicely titled "Problems with data". Both your examples are quite interesting. What is missing in data analytics is our ability to come up with observations that are totally new and could not have been reasoned out beforehand, like the "murders and ice cream" link revealed by data analytics. With present technology, this kind of analytics will reveal interesting inferences. The next step is our ability to infer and take appropriate corrective steps. Let analytics point us to possible corrective steps with a % confidence level. Otherwise we only research the problems with excessive data. Data is data as long as it is authentic and secure. The rest lies in our ability to draw inferences from it. Thanks for the post, and regards.
Principal @ Together Fund | AI Investor | x-AWS | DMs: superdm.me/177pc
8 years ago
Wonderful insights!
Senior Manager- Efficio Consulting | Ex Accenture Strategy, EY| IIM Kozhikode
8 years ago
Hey Madhur, those were pretty good examples.