Structured Data - Unstructured Data - Voice - Image - Video

Over the past 10-15 years, if your work involved serious data-related tasks, you may have observed the advent of denser, multi-dimensional, higher-velocity and generally more complex data sources. (Cross tabs - centralized databases - distributed data - free-form text - handwritten notes - voice files - images - video...)

What is interesting to me, however, is that handling complex data has actually become easier at the same time as the complexity itself has shot up exponentially.

There were not many options for handling data files when programming in low-level languages, nor many libraries to use when working in SQL. Ten years ago, text files had to be taken through a step-by-step process of TF-IDF, dimensionality reduction and what-not. And if you had to interpret scanned documents containing handwritten notes at an industrial scale, you had better have a PhD in OCR.
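To give a feel for what that step-by-step process involved, here is a minimal sketch of the TF-IDF step written by hand in Python. The function name `tf_idf`, the whitespace tokenizer, the sample documents and the unsmoothed `tf * log(N / df)` weighting are all assumptions made for illustration; real pipelines typically used more careful tokenization and smoothing, and followed this with dimensionality reduction.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Illustrative weighting (an assumption, not a standard library's exact
    formula): term frequency (count / document length) times log(N / df).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents, made up for this sketch.
docs = [
    "the invoice is overdue",
    "the invoice was paid",
    "the meeting is tomorrow",
]
weights = tf_idf(docs)
```

A word that appears in every document (here "the") ends up with a weight of zero, while a word unique to one document (such as "overdue") scores highest within it. Every one of these choices used to be yours to make and maintain.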

Fast forward to now: a large number of free or paid APIs, open source libraries and tools are available to generate basic insights and metrics, and to convert your input data into a format that your analytics platform can handle. The key qualification is your ability to quickly learn how to use everything that is freely available. Of course, the usual exceptions apply (generating real business value needs custom preparation of the data, etc.)

I guess the "arms race" going on in the Big Data, Machine Learning, Analytics (or whatever you call it) space, between the likes of Google, Facebook, Amazon, IBM, etc., is responsible for a large part of this change. The gap between leaders and laggards in this space is huge and widening ever faster. What does that mean for the community of analytics professionals outside of these companies? Let me hold that thought for another post.



