My data is bigger than your data or My data is smarter than your data
Pratyusha Chakraborty
PhD Scholar - Data Science, Data Analyst, Business Analyst, ML, AI, Tech blogger, Clean Tech, Sustainability
Everybody wants to be a data scientist, and vast learning resources are available on the internet. But every good thing comes with a side effect. I am not surprised when I meet data practitioners who call themselves data scientists while struggling with basic research methodology or probability concepts. But that is a topic for another day. Today we will discuss what you should know, and which tools you might need, to start learning data science or analytics. At the end, you will also find some recommendations for tools to set up your lab.
Why does your data need to be smart?
Big Data is not magic. You cannot throw everything into a pot and expect whatever comes out to grow your business. When working with data, the mantra “the more the better” is a fallacy. Even with the fanciest algorithm, if you feed it trash, it will eventually give you trash. Choosing an inappropriate data strategy can cost your company a fortune, or even bankrupt the whole operation.
Every so often, Big Data is treated as nothing more than a technology buzzword. I’ve seen companies claim they are using Big Data to change the world while they can’t even change the root permissions of their MySQL database. I’m exaggerating about the database part, but you get my point.
Big Data is famous for its five Vs: volume, velocity, variety, veracity, and value. The final V is sometimes underrated and neglected, yet it is de facto the most critical element of any data system. In the long run, value doesn’t come after volume or velocity; it decides how you approach the rest. If your data doesn’t bring any value to your business, why bother collecting it?
You are proud of having hundreds of terabytes of data, but you let it rot in your data center’s racks. You insist on hiring the most skilled data team, but all you let them do is clean and rename JSON files. That counterintuitive and not-at-all-environment-friendly approach will hold you back from becoming a data-driven business. But...
Contentedly saying “My data is bigger than your data” might make you feel you are in the same league as other Big Data gurus, but here is another saying for you to hone: “My data is smarter than your data”.
Here are two different definitions of Smart Data I’ve encountered on the internet:
“Smart data is digital information that is formatted so it can be acted upon at the collection point before being sent to a downstream analytics platform for further data consolidation and analytics”
“Smart Data means information that actually makes sense. It is the difference between seeing a long list of numbers referring to weekly sales vs. identifying the peaks and troughs in sales volume over time.”
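To make the second definition’s example concrete, here is a minimal Python sketch of turning a plain list of weekly sales numbers into peaks and troughs. The sales figures are made up for illustration, and the rule used here (a week is a peak if it is strictly higher than both neighbouring weeks, a trough if strictly lower) is just one simple assumption; real analyses often smooth the series first.

```python
def find_peaks_and_troughs(values):
    """Return (peaks, troughs) as lists of indices into `values`.

    A point is a peak if it is strictly greater than both neighbours,
    and a trough if it is strictly smaller than both neighbours.
    The first and last points are skipped because they have only one neighbour.
    """
    peaks, troughs = [], []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if cur > prev and cur > nxt:
            peaks.append(i)
        elif cur < prev and cur < nxt:
            troughs.append(i)
    return peaks, troughs


if __name__ == "__main__":
    # Hypothetical weekly sales (units sold), week 1 to week 8.
    weekly_sales = [120, 135, 128, 150, 170, 160, 90, 110]
    peaks, troughs = find_peaks_and_troughs(weekly_sales)
    for i in peaks:
        print(f"Peak in week {i + 1}: {weekly_sales[i]} units")
    for i in troughs:
        print(f"Trough in week {i + 1}: {weekly_sales[i]} units")
```

The raw list is the “long list of numbers”; the handful of printed lines (peaks in weeks 2 and 5, troughs in weeks 3 and 7) is the part a decision-maker would actually look at.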
The first definition even implies that smart data comes from smart devices, huh? As a data engineer, I find that a far cry from the way I think about smart data. The second one suits me better, but it still doesn’t feel quite right. So I put together a new definition in my own words:
Smart Data is the part of data whose value can be used directly to answer specific business needs or to fulfill pre-defined outcomes. Smart Data fully meets the technical specifications and can be used as it is, without any redundant information in the data structure.
To put it simply, Smart Data is not separate from Big Data; it is the valuable information retrieved from the unprocessed data. Smart Data is what you put in your PowerPoint presentation and what you bring into the decision-making process.
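Here is a minimal Python sketch of what “retrieved from the unprocessed data” can look like in practice. Everything in it is hypothetical: the event fields (order_id, region, week, amount, browser, session_ms, referrer) and the business question, “What is our revenue per region per week?”. The point is that the Smart Data is a small, question-shaped table derived from the raw events, not a separate dataset.

```python
from collections import defaultdict

# Hypothetical raw events: many fields, most of which are irrelevant
# to the question "What is our revenue per region per week?"
RAW_EVENTS = [
    {"order_id": "A1", "region": "North", "week": 1, "amount": 40.0,
     "browser": "Firefox", "session_ms": 5321, "referrer": "newsletter"},
    {"order_id": "A2", "region": "South", "week": 1, "amount": 25.5,
     "browser": "Chrome", "session_ms": 812, "referrer": "search"},
    {"order_id": "A3", "region": "North", "week": 1, "amount": 10.0,
     "browser": "Safari", "session_ms": 2300, "referrer": "ads"},
    {"order_id": "A4", "region": "North", "week": 2, "amount": 55.0,
     "browser": "Chrome", "session_ms": 4100, "referrer": "search"},
]


def revenue_by_region_week(events):
    """Use only the fields the question needs and sum revenue per (region, week)."""
    totals = defaultdict(float)
    for event in events:
        totals[(event["region"], event["week"])] += event["amount"]
    # Sorted rows of (region, week, revenue), small enough to put on a slide.
    return [(region, week, total) for (region, week), total in sorted(totals.items())]


if __name__ == "__main__":
    for region, week, revenue in revenue_by_region_week(RAW_EVENTS):
        print(f"{region:<6} week {week}: {revenue:.2f}")
```

Browser, session length, and referrer are simply ignored here because this particular question does not need them; a different question would pick a different slice.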
As surprising as it sounds, Smart Data doesn’t come last; it belongs in the initial design of the process. It determines how you collect the data, how you construct your data architecture, and whom you hire to get the job done. Letting the desired outcome decide what comes in is a lean way to maximize results and minimize costs.
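One way to picture “outcome first” is to write the business questions down before collecting anything and derive the collected fields from them. The following Python sketch assumes a hypothetical `Outcome` type listing the fields each question needs; the outcomes and field names are invented for illustration, and a real system would also need validation, schema versioning, and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """A business question agreed on before any data is collected (hypothetical)."""
    name: str
    required_fields: frozenset


# Hypothetical outcomes defined up front.
OUTCOMES = [
    Outcome("weekly_revenue_by_region", frozenset({"region", "week", "amount"})),
    Outcome("repeat_customer_rate", frozenset({"customer_id", "week"})),
]


def fields_to_collect(outcomes):
    """Return the union of the fields needed by all outcomes, and nothing more."""
    needed = set()
    for outcome in outcomes:
        needed |= outcome.required_fields
    return needed


def collect(record, allowed_fields):
    """Keep only the allowed fields of an incoming record; drop the rest on arrival."""
    return {key: value for key, value in record.items() if key in allowed_fields}


if __name__ == "__main__":
    allowed = fields_to_collect(OUTCOMES)
    incoming = {"customer_id": "C7", "region": "North", "week": 3, "amount": 19.9,
                "browser": "Chrome", "screen_width": 1920}
    print(sorted(allowed))
    print(collect(incoming, allowed))
```

The design choice is that the list of outcomes, not the data source, decides what is stored: adding a new question means adding an `Outcome`, and fields nobody asked for never reach storage.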
Big Data helps organizations create new growth opportunities, and even entirely new categories of companies that combine and analyze industry data. These companies have ample information about products and services, buyers and suppliers, and consumer preferences that can be captured and analyzed. We all know this story, but we should start by asking why we need Big Data at all, or whether we should go for Big Smart Data instead...
Without valuable information, Big Data is nothing but fruitless digital pieces and merely a burden to your business. Redundant, unnecessary, and duplicate data is the root of all evil. In a chaotic data era, Smart Data can be our savior, and it comes not last but first of all.
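As a small illustration of cutting duplicate and redundant data, here is a minimal Python sketch with two hypothetical helpers. `drop_duplicates` keeps the first record for each key, and `drop_constant_fields` removes fields that hold the same value in every record. Treating constant fields as redundant is only one simple heuristic chosen for this example; what really counts as redundant depends on the business question.

```python
# Hypothetical order records; the second one is an accidental duplicate.
RECORDS = [
    {"order_id": "A1", "region": "North", "amount": 40.0, "currency": "EUR"},
    {"order_id": "A1", "region": "North", "amount": 40.0, "currency": "EUR"},
    {"order_id": "A2", "region": "South", "amount": 25.5, "currency": "EUR"},
]


def drop_duplicates(records, key_fields):
    """Keep the first record for each key; later repeats are discarded."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def drop_constant_fields(records):
    """Remove fields whose value is identical in every record.

    With fewer than two records there is nothing to compare,
    so the records are returned unchanged (as a new list).
    """
    if len(records) < 2:
        return list(records)
    first = records[0]
    constant = {
        field for field in first
        if all(field in record and record[field] == first[field] for record in records)
    }
    return [{k: v for k, v in record.items() if k not in constant} for record in records]


if __name__ == "__main__":
    unique = drop_duplicates(RECORDS, ["order_id"])
    for row in drop_constant_fields(unique):
        print(row)
```

Note that a field that is constant in today’s sample (here, `currency`) may vary tomorrow, so a rule like this is better applied with the business question in mind than blindly.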
Think about this.
https://www.dhirubhai.net/in/pratyushachakraborty/ #letsconnect