How to Make Better Data-Driven Decisions
You have probably heard from me that more than 80% of all data collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, images, and other sources, right?
What may be new is that every two days we create as much information as we did from the dawn of civilization to 2003. Organizations are struggling to make sense of this growing amount of data. If an organization can make sense of 7 percent of its observation space today, tomorrow it might only be able to make sense of 4 percent of what it knows, as the information continues to grow.
Sometimes it is hard to get insights from your systems due to a lack of context. By context I mean that we need to look around an existing piece of data and search for possible connections, just as with a puzzle.
Imagine an ever-growing pile of puzzle pieces of different sizes, colors and shapes, and you want to figure out what picture those pieces represent.
What you don’t know is that there are several puzzles, and that some pieces are duplicated, missing, incomplete, of low quality, or not what you think they are. Lately we are even seeing pieces that are “fake”. Can you believe it?
So, until you put all the pieces on a table and start working on them, you have no idea what you are dealing with.
An efficient strategy for finishing a puzzle is to analyze each new piece as you add it and “sort” it into one of these categories:
- Not part of the puzzle
- Group of similar parts
- Connected part
You can extend this exercise with more categories, depending on the picture, such as color patterns, borders, a known image, etc.
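The sorting step above can be sketched in a few lines of Python. This is only an illustration of the analogy, not how any product works: the `Piece` and `Puzzle` classes, the edge codes, and the color palette are all hypothetical names I made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    NOT_PART = "not part of the puzzle"
    SIMILAR_GROUP = "group of similar parts"
    CONNECTED = "connected part"


@dataclass(frozen=True)
class Piece:
    name: str
    color: str          # dominant color, a stand-in for "looks similar"
    edges: frozenset    # hypothetical edge shape codes


@dataclass
class Puzzle:
    palette: set                                    # colors known to be in the picture
    open_edges: set = field(default_factory=set)    # edges still waiting for a match
    groups: dict = field(default_factory=dict)      # color -> loose pieces set aside

    def sort_piece(self, piece):
        matched = piece.edges & self.open_edges
        if matched:
            # The piece fits: matched edges close, its other edges become open.
            self.open_edges -= matched
            self.open_edges |= piece.edges - matched
            return Category.CONNECTED
        if piece.color in self.palette:
            # No fit yet, but it looks like it belongs: keep it with similar pieces.
            self.groups.setdefault(piece.color, []).append(piece)
            return Category.SIMILAR_GROUP
        return Category.NOT_PART


puzzle = Puzzle(palette={"blue", "green"}, open_edges={"e1", "e2"})
first = puzzle.sort_piece(Piece("a", "blue", frozenset({"e1", "e3"})))
second = puzzle.sort_piece(Piece("b", "green", frozenset({"e9"})))
third = puzzle.sort_piece(Piece("c", "red", frozenset({"e8"})))
print(first.value, "|", second.value, "|", third.value)
```

Extra categories such as borders or color patterns would simply become more branches in `sort_piece`.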
Another thing about puzzles: if you want to put together a small puzzle, say 20 cm x 20 cm, you need more than 400 cm² of table to do the job, right? But once the picture starts taking shape, less space is required.
Computationally, the most expensive phase of the puzzle occurs while the loose pieces spread beyond the size of the picture. The tipping point comes when similar pieces begin to connect and form shapes, and the space consumed by the puzzle starts to collapse until the whole picture is revealed.
In this flow, the more amnesia you have, the more compute power is required. Once the “pieces” are connected and forming chunks, the process gets faster and faster.
Many organizations face a similar challenge when trying to manage this deluge of unstructured data, such as:
- Pinpointing and activating relevant data for large-scale analytics.
- Lacking the fine-grained visibility that is needed to map data to business priorities.
- Removing redundant, obsolete, and trivial (ROT) data.
- Identifying and classifying sensitive data.
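To make the “redundant” part of ROT concrete, here is a minimal Python sketch that groups items by a hash of their content. It is an assumption-laden illustration only: the file names and contents are invented, the `find_redundant` helper is hypothetical, and it says nothing about obsolete or trivial data, which need other signals such as age or size.

```python
import hashlib
from collections import defaultdict


def find_redundant(blobs):
    """Return groups of names whose content is byte-for-byte identical.

    `blobs` maps a name to its content as bytes (illustrative input only).
    """
    by_digest = defaultdict(list)
    for name, content in blobs.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Only digests shared by more than one name point at redundant copies.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


sample = {
    "report_v1.doc": b"Q1 results",
    "copy of report_v1.doc": b"Q1 results",
    "notes.txt": b"todo",
}
duplicates = find_redundant(sample)
print(duplicates)
```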
To solve the “puzzle tale” dilemma faster and make sense of the data even when the information spills off the table, IBM developed IBM Spectrum Discover, modern metadata management software that provides data insight for petabyte-scale file and object storage, on premises and in the cloud. This software enables organizations to make better business decisions, and to gain and maintain a competitive advantage.
Imagine boosting your Artificial Intelligence & Data Science productivity just like the “puzzle tale” by:
- Unifying data silos, wherever the data resides, on premises or off
- Simplifying and accelerating data curation
- Improving the quality of your data by eliminating redundant, obsolete and trivial data
- Enhancing the value of your data with semantic metadata
IBM Spectrum Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It also improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed up critical research.
What is metadata?
Metadata is data that describes data. Metadata captures the useful attributes of the associated source data to give that data context and meaning. For example, the source data might be a file or an object, and its metadata is a set of attributes expressed as key-value pairs. The metadata records are associated with the file or object and are typically stored on the same system as the source data.
System metadata is created and updated by the host system and not by the application software. IBM Spectrum Discover also lets you add tags that capture attributes beyond the system metadata.
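As a rough picture of what such a record might look like, the Python sketch below reads a few attributes that the file system maintains and places custom tags next to them as key-value pairs. The record layout, the `build_record` helper, and the tag names are my own assumptions for illustration, not the IBM Spectrum Discover schema.

```python
import os
import tempfile
from datetime import datetime, timezone


def system_metadata(path):
    """Collect attributes that the host file system maintains for a file."""
    info = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "owner_uid": info.st_uid,
    }


def build_record(path, tags):
    """Combine system metadata with custom tags (hypothetical layout)."""
    return {"system": system_metadata(path), "tags": dict(tags)}


# Create a small throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "scan_001.dat")
    with open(sample, "wb") as handle:
        handle.write(b"hello")
    record = build_record(sample, {"project": "genomics", "sensitivity": "internal"})

print(record["system"]["size_bytes"], record["tags"])
```

The system part tells you where the file is and how big it is. The tags are what tell you why it matters.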
IBM Spectrum Discover provides the following benefits:
- Simplifies data discovery and data heritage so organizations can much more easily identify, prepare and optimize their data.
- Provides data insight for analytics, governance and optimization.
- Helps organizations derive greater business value from their unstructured data.
- Automates identification, classification and tagging of unstructured data at scale.
- Provides comprehensive data insight by combining system and customer metadata to give data more context and meaning.
IBM Spectrum Discover can scan or ingest billions of records in the course of a day. Ingesting data consists of reading metadata from the source storage system and automatically cataloguing it in the IBM Spectrum Discover platform. This catalog lets IBM Spectrum Discover answer complex queries or multi-faceted searches against the metadata very quickly, even when the catalog contains billions of entries. The GUI’s drill-down dashboard visualizes the search results nearly instantaneously, for IBM and non-IBM systems. Please check with your local IBM sales representative for the platforms currently supported by IBM Spectrum Discover.
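To show why searching a metadata catalog can be fast without touching the source data, here is a toy in-memory version in Python. It is a generic inverted-index sketch under my own assumptions, not the product's API or internals: the `Catalog` class, its methods, and the sample attributes are all hypothetical.

```python
from collections import defaultdict


class Catalog:
    """Toy metadata catalog with an inverted index for faceted search."""

    def __init__(self):
        self.records = {}
        self.index = defaultdict(set)   # (key, value) -> ids of matching records

    def ingest(self, record_id, metadata):
        # Assumes each id is ingested once; re-ingesting would need index cleanup.
        self.records[record_id] = metadata
        for key, value in metadata.items():
            self.index[(key, value)].add(record_id)

    def search(self, **facets):
        """Return ids of records matching every requested key=value facet."""
        if not facets:
            return set(self.records)
        matches = [self.index.get(item, set()) for item in facets.items()]
        return set.intersection(*matches)

    def facet_counts(self, key):
        """Count records per value of one attribute, as a dashboard might."""
        return {value: len(ids) for (k, value), ids in self.index.items() if k == key}


catalog = Catalog()
catalog.ingest("f1", {"type": "image", "owner": "lab-a", "tier": "hot"})
catalog.ingest("f2", {"type": "log", "owner": "lab-a", "tier": "cold"})
catalog.ingest("f3", {"type": "image", "owner": "lab-b", "tier": "cold"})

cold_images = catalog.search(type="image", tier="cold")
per_owner = catalog.facet_counts("owner")
print(cold_images, per_owner)
```

Each query only intersects small sets of ids, which is why adding more facets narrows the result instead of slowing it down.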
As we know, data ingested “as is” might not be as useful as it could be. IBM Spectrum Discover addresses this “Enterprise Amnesia” by enriching the metadata, classifying it, unifying silos, boosting your Artificial Intelligence initiatives, reducing storage expenditure, and delivering many other benefits. Above all, it will help your organization make more accurate decisions.
When we do puzzles with our kids, we teach them to plan a strategy before solving them, right? In the same way, when your organization faces the challenge of dealing with a huge amount of data, it needs a strategy too. If you would like to build an efficient one, let me know and I will be glad to help, or visit
https://www.ibm.com/au-en/it-infrastructure/storage