3 Tips to Tame the Big Data Beast
It will come raw, naked and dirty. But before you clean, clothe and tame this beast, it would be a good idea to assess the value of domesticating it.
The Big Data Junk Yard strategy lets you play around with big data in its natural form. The approach allows enough runway for technology to ramp up infrastructure and for the business to find the right use cases through data discovery. It's an easy, economical and quick way to get on the big data bandwagon. (See Big Data: Junk Yard to Gold Mine.)
That said, there needs to be some method to the madness. If you want your Big Data Junk Yard to be productive, some discipline is required to nurture and grow it so that you can start mining valuable gold soon enough. Here are the three recommended basic steps:
An Invisible Fence
It is important to draw a line in the sand. In my last blog, I strongly advocated against investing too much time thinking about the use cases. However, that does not mean investing no time at all and hoarding every data set that's available.
Aligning strategic business goals with future data needs will help define the boundaries. This is also a great opportunity to engage the business in the big data initiative. Start with a laundry list of data sources with some perceived value to the business in the next 3-5 years. Keep the list alive and agile, but also use it as a guideline for your hoarding strategy. This will help you manage the mix and keep your junk yard from getting out of control.
Know your Junk
Keeping the junk yard organized and tagged will save a tremendous amount of time and effort for data scientists looking for those gold nuggets in it. Keep all Chevys in one corner and Hondas in another. It will make that car mechanic scavenging for 1999 Chevy parts really happy. Start with a comprehensive tag list to mark your data and schemas. There are multiple technology options in the Hadoop ecosystem that can help keep the yard in order.
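As one way to picture the tag list, here is a minimal Python sketch of an in-memory catalog that maps data sets to tags and finds everything carrying a given set of tags. The class, the data set names and the tags are all hypothetical, and a real junk yard would more likely rely on a metadata tool from the Hadoop ecosystem than on hand-rolled code like this.

```python
from collections import defaultdict


class TagCatalog:
    """Tiny in-memory index from tags to data set names (illustrative only)."""

    def __init__(self):
        self._datasets_by_tag = defaultdict(set)

    def register(self, dataset, tags):
        """Record a data set under each of its tags (case-insensitive)."""
        for tag in tags:
            self._datasets_by_tag[tag.strip().lower()].add(dataset)

    def find(self, *tags):
        """Return the sorted names of data sets that carry every given tag."""
        wanted = [t.strip().lower() for t in tags]
        if not wanted:
            return []
        # Start from the first tag's data sets, then narrow with each further tag.
        matches = set(self._datasets_by_tag.get(wanted[0], set()))
        for tag in wanted[1:]:
            matches &= self._datasets_by_tag.get(tag, set())
        return sorted(matches)


# Hypothetical data sets and tags, echoing the Chevy/Honda analogy.
catalog = TagCatalog()
catalog.register("warranty_claims_raw", ["Chevy", "1999", "warranty"])
catalog.register("dealer_service_logs", ["Chevy", "service"])
catalog.register("telematics_feed", ["Honda", "sensor"])

chevy_1999 = catalog.find("chevy", "1999")
print(chevy_1999)
```

Even a simple shared vocabulary like this lets the "mechanic" ask for 1999 Chevy parts directly instead of walking the whole yard.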
Ingest. Digest. Divest
Do not forget to spring clean. In order to enjoy the freedom of bringing home all the new toys, we need to let go of some old stuff that we know we will never play with again. Hadoop may be a cheaper alternative to our traditional darling databases, but it is still a finite resource.
It is recommended to form a joint governance group of business and technology stakeholders. The group can review the data discovery findings on a regular basis and help align the big data strategy on what to ingest, what to digest and what to divest. This will be the first step toward refining the junk into gold.
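To make the divest part of that review concrete, the sketch below flags data sets that have not been read for a chosen period so the governance group can discuss them. The record fields, the 365-day threshold, the dates and the sample inventory are assumptions for illustration, not a prescribed policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetRecord:
    """Hypothetical inventory entry for one data set in the junk yard."""
    name: str
    size_gb: float
    last_accessed: date


def divest_candidates(records, today, max_idle_days=365):
    """Return records idle longer than max_idle_days, largest first."""
    cutoff = today - timedelta(days=max_idle_days)
    stale = [r for r in records if r.last_accessed < cutoff]
    # Largest data sets first, since they free the most space if divested.
    return sorted(stale, key=lambda r: r.size_gb, reverse=True)


# Made-up inventory; names, sizes and dates are illustrative only.
inventory = [
    DatasetRecord("clickstream_2012", 800.0, date(2013, 1, 15)),
    DatasetRecord("warranty_claims_raw", 120.0, date(2015, 5, 2)),
    DatasetRecord("old_survey_dump", 5.0, date(2012, 6, 30)),
]

review_list = divest_candidates(inventory, today=date(2015, 6, 1))
for record in review_list:
    print(record.name, record.size_gb)
```

The function only produces a list for discussion; the decision to actually delete anything stays with the governance group.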
Coming up next: How to sell the Big Data Junk Yard strategy to the business.
SAP Finance & Analytics Excellence Leader, Global SAP Center of Excellence, Business Transformation & Digital Technology at Cummins, Inc
9 years ago
Thank you for this latest in the series, Ashu. I agree with the idea of keeping like things in like places. And the concept of having an active laundry list of data sources in line with strategic goals certainly can be a barometer for the data sources to simply leave out of the junk yard altogether. Instilling a joint governance group is also appealing. Along with this, I would emphasize a mindset of stewardship for all teammates within the organization (caring for data on behalf of all who benefit from it) as well as designated data stewards in specific business areas. Engagement at the executive level is imperative for any governance initiative to have true life and teeth. Foster an understanding of the integrated nature of our data and ensure that standardization drives toward supporting integration. Just one other thought that I had while reading this latest offering revolved around the selection of a single high-impact, low-effort nugget of gold to prove the worth of the junk yard. Looking forward to your thoughts on that as previewed by the title of your next post.
Director Enterprise Data & Analytics @ ATI specializing in data strategy & delivery
9 years ago
Interesting perspective, Ashu. I like the reserved approach early on while still focusing on lineage, classification, context and standardisation/harmonization. More often than not, the right amount of effort and patience is not given to the establishment of a business-driven data governance program, so I really appreciate you calling that out. Big Data and MDM are so similar when it comes to these foundational needs. Data is an asset, and most will argue it's the most important corporate asset. As we build enterprise integrated solutions like MDM or a Hadoop data lake, the success and ROI will come through the business, and we should always integrate new capabilities into our existing data governance policies, procedures and consistent stewardship practices. One consistent place to manage, maintain and enrich master data, reference data or our enterprise standard Xmap. A data steward can map Race/Ethnicity/Gender codes once and use them anywhere, including HANA & Hadoop. Thanks for sharing!