What's cooking?
Ashish Mohan Jha
Co-Founder & COO at Impactsure Technologies Private Limited | Product Management & Marketing | Enterprise Software | Startup
A good meal certainly requires fine ingredients, but ingredients alone are not enough. A proficient chef knows how to clean, cut and mix them to prepare multiple delicacies. Analytics is similar: once data is available from multiple sources, data preparation is key to any meaningful insight.
I have found the simplest transformation, setting the right data type, to be the most important. A numeric field can be a quantity, amount, percentage, duration, postal code, longitude/latitude or ID. Dates are another tricky case, given the different formats used across data sources. Any data aggregation or advanced insight depends on data types being maintained correctly.
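As a minimal sketch of the idea, assuming each source row arrives as text, the snippet below assigns a deliberate type to each field. The field names, the sample row and the list of date formats are hypothetical, and slash-separated dates are assumed to be day-first.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical formats seen across sources; extend to match your own data.
# Assumption: slash-separated dates are day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date(text: str) -> date:
    """Try each known format in turn and return the first that matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {text!r}")


def type_row(raw: dict) -> dict:
    """Convert a row of strings into properly typed values."""
    return {
        # Postal codes look numeric but are identifiers: keep them as text
        # so leading zeros survive and nobody sums them by accident.
        "postal_code": raw["postal_code"].strip(),
        "quantity": int(raw["quantity"]),
        # Amounts: strip thousands separators and use Decimal for exactness.
        "amount": Decimal(raw["amount"].replace(",", "")),
        "order_date": parse_date(raw["order_date"]),
    }


row = type_row({
    "postal_code": "01234",
    "quantity": "3",
    "amount": "1,250.50",
    "order_date": "05/03/2021",
})
```

The point of the sketch is that the type is a decision about meaning, not just about what the characters look like: a postal code and a quantity are both digits, but only one of them should ever be aggregated.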
Data coming from multiple sources may have redundant or concatenated columns. To match other sources and produce the desired output, columns need to be split, combined or transposed. Sometimes a new column needs to be added using a custom formula on existing data.
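The operations above can be sketched roughly as follows. The sample rows, column names and the line_total formula are made up for illustration, and "transpose" is shown here in its simplest form, turning a list of rows into a set of columns.

```python
# Hypothetical source rows with a concatenated "customer" column.
rows = [
    {"customer": "Rao, Anita", "city": "Pune", "state": "MH", "qty": 3, "unit_price": 20.0},
    {"customer": "Shah, Vivek", "city": "Surat", "state": "GJ", "qty": 5, "unit_price": 12.0},
]


def reshape(row: dict) -> dict:
    # Split: "Last, First" becomes two separate columns.
    last, first = [part.strip() for part in row["customer"].split(",", 1)]
    return {
        "first_name": first,
        "last_name": last,
        # Combine: city and state merged into one location column.
        "location": f'{row["city"]}, {row["state"]}',
        "qty": row["qty"],
        "unit_price": row["unit_price"],
        # Custom formula: a new column derived from existing ones.
        "line_total": row["qty"] * row["unit_price"],
    }


prepared = [reshape(r) for r in rows]

# Transpose: from a list of rows to one list of values per column.
columns = {name: [r[name] for r in prepared] for name in prepared[0]}
```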
Do you really need all the data for your specific analytics? I am personally ruthless in filtering out redundant rows and removing unwanted columns. This keeps the data size manageable and your canvas cleaner.
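One way to picture this pruning, as a sketch under assumptions: the records, the "active" status rule and the list of columns to keep are all hypothetical stand-ins for whatever your own analysis needs.

```python
# Hypothetical extract containing a duplicate row, a cancelled order
# and a legacy column that the analysis does not need.
records = [
    {"order_id": 1, "region": "West", "status": "active", "amount": 120.0, "legacy_flag": "N"},
    {"order_id": 1, "region": "West", "status": "active", "amount": 120.0, "legacy_flag": "N"},
    {"order_id": 2, "region": "East", "status": "cancelled", "amount": 80.0, "legacy_flag": "Y"},
    {"order_id": 3, "region": "East", "status": "active", "amount": 45.5, "legacy_flag": "N"},
]

# Only the columns this particular analysis requires.
KEEP_COLUMNS = ("order_id", "region", "amount")


def trim(rows: list) -> list:
    """Drop irrelevant and duplicate rows, then keep only the needed columns."""
    seen = set()
    result = []
    for row in rows:
        if row["status"] != "active":
            continue  # filter out rows the analysis does not need
        slim = tuple(row[col] for col in KEEP_COLUMNS)
        if slim in seen:
            continue  # skip exact duplicates of a row already kept
        seen.add(slim)
        result.append(dict(zip(KEEP_COLUMNS, slim)))
    return result


trimmed = trim(records)
```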
My smart friends in the database team can make your life tough by using cryptic column and field names. To make analytics more user-friendly, renaming columns is essential so that novice users can also understand them. Adding alternate text can also help with AI-driven insights later.
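A small sketch of such a renaming layer is shown below. The cryptic names, friendly labels and alternate terms are invented for illustration; how an analytics tool actually consumes alternate text varies by product and is not shown here.

```python
# Hypothetical mapping: cryptic source name -> (friendly label, alternate terms).
COLUMN_LABELS = {
    "CUST_NM": ("Customer Name", ["client", "buyer"]),
    "ORD_DT": ("Order Date", ["purchase date"]),
    "NET_AMT": ("Net Amount", ["revenue", "sales value"]),
}


def rename_columns(row: dict) -> dict:
    """Replace cryptic keys with friendly labels; unknown keys pass through unchanged."""
    return {COLUMN_LABELS.get(key, (key, []))[0]: value for key, value in row.items()}


def alternate_text() -> dict:
    """Friendly label -> alternate terms a user might type instead."""
    return {label: synonyms for label, synonyms in COLUMN_LABELS.values()}


friendly = rename_columns({"CUST_NM": "Anita Rao", "NET_AMT": 60.0, "ROW_ID": 7})
```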
“You can use an eraser on the drafting board or a sledgehammer on the construction site.”
Does this saying also apply to analytics? I would say definitely yes. Data preparation is key and may take up to 60% of the analytics service effort. Yet fixing an issue on the drafting board, during data preparation, saves significant effort after go-live.
Please check my article on data sources here, and watch this space for more.
Let me know what you think.
Product Leader | SAAS | Enterprise Apps, Fintech, Supply Chain
4y: Great article, Ashish. ‘Infected’ data can lead to biased business insights and decision-making. Data hygiene, like personal hygiene, is essential.
Senior Vice President at Citi - Risk Management
4y: I can very well relate to this. Better data preparation definitely led to the development of better-quality statistical models.
Technology Consulting - Digital Supply Chain
4y: Very informative and well written!