How to Make Better Data-Driven Decisions

Dr Abilio Oliveira

Published: October 1, 2020

You have probably heard me say that more than 80% of all data collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, images, and other sources, right?

What may be new to you is that every two days we create as much information as we did from the dawn of civilization to 2003. Organizations are struggling to make sense of this growing amount of data. If an organization can make sense of 7 percent of its observation space today, tomorrow it might only be able to make sense of 4 percent of what it knows, as the information continues to grow.

Sometimes it is hard to get insights from your systems due to a lack of context. What I mean by context is that we need to look around a piece of existing data and search for possible connections, just like with a puzzle.

Imagine an ever-growing pile of puzzle pieces of different sizes, colors and shapes, and you want to figure out what picture those pieces represent.

What you don’t know is that there are several puzzles; some pieces are duplicated, missing, incomplete or of low quality, and some are not what you think they are. Lately we are even seeing pieces that are “fake”. Can you believe it?

So, until you put all the pieces together on a table and start working on them, you have no idea what you are dealing with.

An efficient strategy for finishing a puzzle is to analyze each new piece as you add it and “sort” it as one of the following:

Not part of the puzzle
Group of similar parts
Connected part

You can extend this exercise with more categories, depending on the picture, such as color patterns, borders, a known image, etc.
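The sorting strategy above can be sketched in a few lines of Python. This is only an illustration of the analogy: the `Piece` and `Table` classes, the `puzzle`, `color` and `neighbors` fields, and the idea that a piece already knows which picture it belongs to are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Piece:
    piece_id: int
    puzzle: str                          # hypothetical label: which picture the piece belongs to
    color: str                           # coarse feature used to group similar pieces
    neighbors: frozenset = frozenset()   # ids of the pieces it can connect to


@dataclass
class Table:
    target_puzzle: str
    placed: set = field(default_factory=set)      # ids already connected on the table
    groups: dict = field(default_factory=dict)    # color -> ids waiting to be connected
    rejected: list = field(default_factory=list)  # ids that belong to another puzzle

    def sort_piece(self, piece):
        """Classify a new piece into one of the three categories from the list above."""
        if piece.puzzle != self.target_puzzle:
            self.rejected.append(piece.piece_id)
            return "not part of the puzzle"
        if piece.neighbors & self.placed:
            self.placed.add(piece.piece_id)
            return "connected part"
        self.groups.setdefault(piece.color, []).append(piece.piece_id)
        return "group of similar parts"


# Piece 1 is assumed to be on the table already.
table = Table("landscape", placed={1})
labels = [
    table.sort_piece(Piece(2, "landscape", "blue", frozenset({1}))),
    table.sort_piece(Piece(3, "portrait", "red")),
    table.sort_piece(Piece(4, "landscape", "green", frozenset({9}))),
]
print(labels)
```

Extra categories such as borders or color patterns would simply become additional checks inside `sort_piece`.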

Another thing about puzzles is that if you want to put a small puzzle together, let’s say 20 cm x 20 cm in size, you need more than 400 cm² of table to do the job, right? But once the picture starts to take shape, less space is required.

Computationally, the most expensive phase of the puzzle is when the loose pieces take up more space than the finished picture. The tipping point comes when similar pieces begin to connect to each other and form shapes; from then on, the space consumed by the puzzle starts to collapse until the whole picture is revealed.

In this flow, the more amnesia you have, the more compute power is required. Once the “pieces” are connected and forming chunks, the process gets faster and faster.
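One way to picture this collapse is with a union-find (disjoint-set) structure, where every loose piece starts as its own chunk and each new connection merges two chunks into one. This is a swapped-in illustration of the analogy, not a description of how any product works; the `Chunks` class and the sample connections are invented for the example.

```python
class Chunks:
    """Union-find over puzzle pieces; each set is one connected chunk."""

    def __init__(self, n):
        self.parent = list(range(n))  # every piece starts as its own chunk
        self.count = n                # number of separate chunks on the table

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def connect(self, a, b):
        """Join the chunks holding pieces a and b; return True if they were separate."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        self.parent[root_a] = root_b
        self.count -= 1
        return True


chunks = Chunks(6)
history = [chunks.count]
for a, b in [(0, 1), (2, 3), (1, 2), (4, 5), (3, 4)]:
    chunks.connect(a, b)
    history.append(chunks.count)

print(history)  # [6, 5, 4, 3, 2, 1]
```

The fewer independent chunks remain, the fewer things there are to compare each new piece against, which is the speed-up the analogy describes.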

Many organizations face a similar challenge when trying to manage this deluge of unstructured data, with tasks such as:

Pinpointing and activating relevant data for large-scale analytics.
Lacking the fine-grained visibility that is needed to map data to business priorities.
Removing redundant, obsolete, and trivial (ROT) data.
Identifying and classifying sensitive data.
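As a small illustration of the “redundant” part of ROT data, the sketch below groups files by a hash of their content and reports the groups that contain more than one file. It uses only the Python standard library; the `find_duplicates` function and the sample file names are hypothetical, and this is not how any particular product detects duplicates.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under `root` by SHA-256 of their content; return groups of 2 or more."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Reads each file fully into memory: fine for a sketch, not for huge files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.txt").write_text("quarterly numbers")
    (Path(tmp) / "report_copy.txt").write_text("quarterly numbers")
    (Path(tmp) / "notes.txt").write_text("something else")
    duplicate_names = [[p.name for p in group] for group in find_duplicates(tmp)]

print(duplicate_names)  # [['report.txt', 'report_copy.txt']]
```

At petabyte scale, hashing every file is expensive, which is one reason to work from a metadata catalog first and open the content only when needed.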

To resolve the “puzzle tale” dilemma faster and make sense of the data even when the information spills beyond the table, IBM developed IBM Spectrum Discover, modern metadata management software that provides data insight for petabyte-scale file and object storage, on premises and in the cloud. This software enables organizations to make better business decisions, and to gain and maintain a competitive advantage.

Imagine boosting your Artificial Intelligence & Data Science productivity just like the “puzzle tale” by:

Unifying data silos, wherever the data resides, on premises or off
Simplifying and accelerating data curation
Improving the quality of your data by eliminating redundant, obsolete and trivial data
Enhancing the value of your data with semantic metadata

IBM Spectrum Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It also improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed up critical research.

What is metadata?

Metadata is data that describes data. It captures the useful attributes of the associated source data to give that data context and meaning. For example, the source data might be a file or an object, and its metadata is a set of attributes stored as key-value pairs. The metadata records are associated with the file or object and are typically stored on the same system as the source data.

System metadata is created and updated by the host system, not by the application software. IBM Spectrum Discover also lets you add tags that capture attributes beyond the system metadata.
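To make the distinction concrete, here is a minimal sketch of a metadata record as key-value pairs: the system part is read from the file system with Python's `os.stat`, and the tags are added by hand. The record layout, the `build_record` helper and the tag names are assumptions for illustration, not the IBM Spectrum Discover schema.

```python
import os
import tempfile
from datetime import datetime, timezone


def system_metadata(path):
    """Attributes maintained by the host file system, read with os.stat."""
    st = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": st.st_size,
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "owner_uid": st.st_uid,
    }


def build_record(path, tags):
    """Combine system metadata with custom tags (hypothetical record layout)."""
    return {"system": system_metadata(path), "tags": dict(tags)}


# Demo on a throwaway 1 KiB file with made-up tags.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "scan_001.dat")
    with open(sample, "wb") as fh:
        fh.write(b"\x00" * 1024)
    record = build_record(sample, {"project": "genomics", "sensitivity": "internal"})

print(record["system"]["size_bytes"], record["tags"])
```

The system attributes say what the file is and when it changed; the tags carry the business context that the file system knows nothing about.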

IBM Spectrum Discover provides the following benefits:

Simplifies data discovery and data heritage so organizations can more easily identify, prepare and optimize their data.
Provides data insight for analytics, governance and optimization.
Helps organizations derive greater business value from their unstructured data.
Automates identification, classification and tagging of unstructured data at scale.
Provides comprehensive data insight by combining system and customer metadata to give data more context and meaning.

IBM Spectrum Discover can scan or ingest billions of records in the course of a day. Ingesting data consists of reading metadata from the source storage system and automatically cataloguing it in the IBM Spectrum Discover platform. This enables IBM Spectrum Discover to return the results of complex queries or multi-faceted searches against the metadata very quickly, even when the catalog contains billions of entries. The GUI’s drill-down dashboard visualizes the search results nearly instantaneously, for IBM and non-IBM systems. Please check with your local IBM sales representative for the platforms currently supported by IBM Spectrum Discover.
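The idea of cataloguing metadata once and then answering multi-faceted searches from the catalog, without touching the source files, can be shown on a tiny scale with Python's built-in `sqlite3` module. The table layout, the column names and the sample rows are invented for the example; this is not the IBM Spectrum Discover API or data model.

```python
import sqlite3

# In-memory catalog with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE catalog (
           path TEXT PRIMARY KEY,
           source TEXT,
           owner TEXT,
           size_bytes INTEGER,
           file_type TEXT
       )"""
)
# An index on the facets we filter and group by keeps searches off a full scan.
conn.execute("CREATE INDEX idx_facets ON catalog (file_type, owner)")

# "Ingest": only metadata rows are stored, never the file contents.
rows = [
    ("/nas1/a.log", "nas1", "alice", 2048, "log"),
    ("/nas1/b.log", "nas1", "bob", 512, "log"),
    ("/s3/c.log", "s3", "alice", 4096, "log"),
    ("/s3/d.png", "s3", "bob", 8192, "png"),
]
conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?, ?)", rows)


def search(conn, file_type, min_size):
    """Per-owner file count and total size for one file type above a size threshold."""
    cur = conn.execute(
        "SELECT owner, COUNT(*), SUM(size_bytes) FROM catalog "
        "WHERE file_type = ? AND size_bytes >= ? "
        "GROUP BY owner ORDER BY owner",
        (file_type, min_size),
    )
    return cur.fetchall()


result = search(conn, "log", 1000)
print(result)  # [('alice', 2, 6144)]
```

Because the query runs against the catalog rather than the storage systems themselves, it can combine files from different sources (here the made-up `nas1` and `s3`) in a single answer.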

As we know, data ingested “as is” might not be as useful as it could be. IBM Spectrum Discover addresses this “Enterprise Amnesia” by enriching the metadata, classifying it, unifying silos, boosting your Artificial Intelligence initiatives, reducing storage expenditure and delivering many other benefits. Above all, it helps your organization make better-informed decisions.

When we play puzzles with our kids, we teach them to create a strategy before solving them, right? In the same way, when your organization faces the challenge of dealing with a huge amount of data, a strategy is needed too. If you would like to build an efficient strategy, let me know and I am here to help, or visit

https://www.ibm.com/au-en/it-infrastructure/storage

Richard Austin

Stay connected with #ibmfusion | All opinions are my own

4 years ago

Good thought process. I always thought of it a bit like the bridge you build in an assessment center where the rules of engagement continually change with the goal.

Nilton Santos

4 years ago

Nice analogy, Abilio. Really, trying to make sense of tons of data without a tool like Spectrum Discover is like trying to build a puzzle with your eyes closed!
