Solving the Dark Data Problem
Vishal Krishna
Upstreamlife Media Pvt Ltd helps tech businesses curate content that reiterates their values to stakeholders and enables them to keep telling stories through distribution channels they own.
A decade ago, organizations had just discovered the possibilities of crunching data at scale. Pundits called it the era of big data, and so began the narrative of crunching structured and unstructured data.
As the narrative shifted to artificial intelligence and machine learning, organizations began to grapple with a larger problem: the explosion of information. They stored information in digital archives without knowing what to do with it. Today we are living in the era of dark data.
Think of it like Frank Zappa's "Vault", which holds so much music that musicians and musicologists are still discovering new material to this day.
Today, organizations generate so much data that it eventually ends up in a vault, waiting for someone to make sense of it. It is one of the largest business opportunities to emerge since big data and AI technologies became mainstream globally.
For those who want to know what the technical meaning of dark data is:
Dark data is all of the unused, unknown and untapped data across an organization, generated as a result of users' daily interactions online with countless devices and systems: everything from machine data to server log files to unstructured data derived from social media.
Naturally, organizations use AI and machine learning to crunch dark data and put it to good use in the business.
IBM's Datacap, Google's Cloud Vision and AutoML, and Microsoft's Azure Cognitive Services are some of the technologies used by dark data practitioners.
Remember that data you don't use is still exposed to security risks and theft, which means you have to invest in securing assets you have no plan for. What if sensitive data is stolen? Data that is stored away in public clouds can be at higher risk of attack.
All data now needs to comply with the local laws of every country, including regulations such as GDPR, and dark data can expose the company during audits by regulators. So be careful about the data you store, and put it to good use.
Who is putting dark data to good use?
Genpact and Envision Racing have created a novel way to use data received in the form of audio streams from racing tracks.
Envision Racing, a leading e-racing team, was able to draw insights from alternative data sources such as GPS and radio. By cleansing and analyzing publicly available GPS race data, for example, Envision Racing created heat maps that reveal rival drivers' tendencies on the track.
Race strategists can then identify the drivers who are likely to over-consume energy and how they might behave on different parts of the circuit, giving the Envision Racing drivers a decision-making edge.
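The heat-map idea can be illustrated with a minimal sketch. This is not Envision Racing's or Genpact's actual pipeline, which the article does not describe. It only assumes that cleansed GPS samples arrive as latitude and longitude pairs, and it counts how often a driver appears in each cell of a grid. The function name `build_heatmap`, the cell size and the sample points are all hypothetical.

```python
import math
from collections import Counter


def build_heatmap(points, cell_size=0.001):
    """Count GPS samples per grid cell.

    points: iterable of (latitude, longitude) pairs in degrees.
    cell_size: width of a square grid cell in degrees (an assumed value).
    Returns a Counter keyed by (row, column) cell indices.
    """
    counts = Counter()
    for lat, lon in points:
        # Map each coordinate to the index of the grid cell that contains it.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


# Made-up samples for one rival driver, not real race data.
samples = [(0.0015, 0.0025), (0.0017, 0.0021), (0.0035, 0.0005)]
heatmap = build_heatmap(samples)

# The most visited cell hints at where the driver spends the most time.
busiest_cell, visits = heatmap.most_common(1)[0]
```

A real analysis would likely weight each cell by speed or estimated energy use rather than by a raw count, but the binning step would look much the same.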
Beyond AI and machine learning, the Envision and Genpact teams needed skills in Docker, Kubernetes, Java, Python and NLP. On top of all this, the teams added an MLOps architect to manage the complete process.
The first use case covered the future of racing, which is electric. The second delves into the industry that defined the last 150 years: oil and gas.
The oil and gas industry has been recording subsurface data on tape for more than 40 years.
Oil and gas companies collect data at various scales, from a few miles to hundreds of miles, down to tiny samples of rock being drilled. This data is stored offsite on tapes and gains none of the benefits of digital use.
According to AWS, the cost of storing data in tape vaults can be high on a cost-per-GB basis.
Tape Ark, an IT company, realized that the industry needs to put data locked on legacy tape to use as it goes through digital transformation. It worked with AWS to use the cloud to make tape data digital.
The partnership created a high-level workflow that starts by receiving media and performing a detailed tape media audit. This allows oil and gas companies to predict, at a granular level, the cloud footprint their data will create, seek out duplicates, and exclude from ingest any data that may be part of a joint venture (JV) or belong to a third party. This approach lets oil and gas companies bring their dark data into the cloud at scale.
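As a rough illustration of the audit step, the sketch below estimates a cloud footprint while skipping byte-identical duplicates. It is not Tape Ark's or AWS's tooling. It assumes each recovered file is available as a name and its raw bytes, and it uses a SHA-256 content hash as a stand-in for duplicate detection. The function name `audit` and the file names are hypothetical.

```python
import hashlib


def audit(files):
    """Estimate the upload footprint and list duplicate files.

    files: iterable of (name, payload) pairs, where payload is bytes.
    Returns (footprint_bytes, duplicates), where duplicates is a list of
    (duplicate_name, first_seen_name) pairs.
    """
    first_seen = {}   # content hash -> name of the first file with that content
    duplicates = []
    footprint = 0
    for name, payload in files:
        digest = hashlib.sha256(payload).hexdigest()
        if digest in first_seen:
            # Same bytes as an earlier file, so it adds nothing to the footprint.
            duplicates.append((name, first_seen[digest]))
        else:
            first_seen[digest] = name
            footprint += len(payload)
    return footprint, duplicates


# Made-up files standing in for data read from tape.
tape_files = [
    ("survey_a.segy", b"abc"),
    ("survey_b.segy", b"abcd"),
    ("survey_a_copy.segy", b"abc"),
]
footprint, duplicates = audit(tape_files)
```

Filtering out joint-venture or third-party data would need ownership metadata for each file, which this sketch does not model.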
Coming to India
Now let's look at the fintech opportunity in India as the third use case, because Indian banks will spend on IT transformation in which crunching data will be of paramount importance.
According to Gartner, Indian banks will spend heavily on IT to manage core banking, loans and consumer experience. A bank has, on average, more than 200 applications, and its data is stored in several formats. Banks today cannot digitally recognize their most loyal customers, and customers receive so many calls from their banks that the experience breeds animosity.
Gartner forecast that IT spending in India would total $101.8 billion in 2022, an increase of 7 per cent compared with 2021.
Going by this forecast, fintech companies in India will have to set policies for using technology that can crunch all forms of data at scale before discarding what is not necessary.
To sum up: dark data just lying in a data center somewhere is not good for the enterprise, because of the security and business risks.
Neither is it good for the planet because of the energy being consumed to manage unused data.
Either way, it is clear that organizations today need to use every bit of their data, be it for reporting and compliance or for scaling up their business. It is no longer prudent to hoard data; one must use it to scale the business.
According to CB Insights, with over 175 zettabytes of data expected by 2025, data centers will play a vital role in the ingestion, computation, storage, and management of information.
An even larger role will be played by platforms that enable seamless data flow between enterprises and vendors, using data that everyone previously thought had to be shelved.
Don't be in the dark anymore; lead your data into the light.
Follow The UpStreamLife for more, and don't forget to subscribe to our newsletter for news on the biggest tech events, startup news and industry updates.
Senior Digital Marketing Specialist- Data Dynamics
11 months ago: Great article! The shift from #bigdata to #AI and #ML has given rise to #DarkData, the untapped information within organizations. Key to addressing this is data democratization through platforms like Data Dynamics, making data accessible and understandable across the organization. The Data Dynamics (https://www.dhirubhai.net/company/data-dynamics/mycompany) Unified Data Management Platform plays a crucial role in unlocking the latent value of dark data. However, challenges like security risks and compliance issues highlight the need for a strategic approach to data utilization. Embracing data democratization through unified platforms not only unlocks business potential but also promotes sustainable and responsible data management practices.
Founder, ceo tobrand.biz pikk.company
1 year ago: Use pikkC.app
Next Trend Realty LLC./wwwHar.com/Chester-Swanson/agent_cbswan
1 year ago: Thanks for sharing.
| Strategy & Innovation Specialist | Helping Startups & Scaleups GTM | EV Technology Consultant | Automotive Semiconductors & Electronics Product Management Expert | Your FRACTIONAL CxO, ex-Gartner |
1 year ago: Vishal Krishna Great Piece and Truly Insightful, Eventually, of course over a period of time, We will witness an Evolution, 'Decentralization of Data' happening… Data will have 3 Broad Contours that I call as the HIV trend - Human & Home (Personal and Private Data and Info) - Institutions (Offices, Universities, Hospitals…) - Vehicles of Mobility (Cars, Trains, Flights, Ships…) So Data Ownership will be traced by its Origins and Data Storage will evolve into broadly 3 Categories depending on Privacy Levels as - Edge (Personal and Private for Self) - Cloud (Public, Social for Governance) - Shared (Interactions b/n HIV Clusters) This way the Data will be Distributed and so will the Data Centers too be decentralized… Creating what I called Distributed Data Centers that will be pervasive and be available within 1. Homes for Human Healthcare Data and Home Connected Devices 2. Institutions that you work for carrying all your Workplace Interactions and Touchpoints 3. On Your Vehicles storing your interactions as you travel from your Home to your Institutions So with Cost of Storage further reducing you will see a trend of Data Centers or Hubs that are purpose built for the Data LifeCycle Origin to Archive… of course with AI
You're absolutely right! The abundance of data generated by organizations presents a significant business opportunity. Thanks for sharing, Vishal! How do you envision organizations harnessing the power of big data and AI to maximize their potential?