The Data Space-Time Continuum for Analytics Innovation and Business Growth

Kirk Borne, Ph.D.

LinkedIn Top Voice, Thinkers360 Top 25 Overall Thought Leader, Founder of Data Leadership Group (Data Scientist. Top Influencer. Speaker. Trainer. Consultant. Astrophysicist). Advisor to PrimeAI and other AI startups.

Published: July 14, 2023

We discussed in another article the key role of enterprise data infrastructure in enabling a culture of data democratization, data analytics at the speed of business questions, analytics innovation, and business value creation from those innovative data analytics solutions. Now, we drill down into some of the special characteristics of data and enterprise data infrastructure that ignite analytics innovation.

First, a little history – years ago, at the dawn of the big data age, there was frequent talk of the three V’s of big data (data’s three biggest challenges): volume, velocity, and variety. Though those discussions are now considered “ancient history” in the current AI-dominated era, the challenges have not vanished. In fact, they have grown in importance and impact.

While massive data volumes appear less frequently now in strategic discussions and are being tamed with excellent data infrastructure solutions from Pure Storage, the data velocity and data variety challenges remain in their own unique “sweet spot” of business data strategy conversations. We addressed the data velocity challenges and solutions in our previous article: “Solving the Data Daze – Analytics at the Speed of Business Questions”. We will now take a look at the data variety challenge, and then we will return to modern enterprise data infrastructure solutions for handling all big data challenges.

Okay, data variety—what is there about data variety that makes it such a big analytics challenge? This challenge often manifests itself when business executives ask a question like this: “what value and advantages will all that diversity in data sources, venues, platforms, modalities, and dimensions actually deliver for us in order to outweigh the immense challenges that high data variety brings to our enterprise data team?”

Because nearly all organizations collect many types of data from many different sources for many business use cases, applications, and development activities, nearly every organization faces this dilemma.

Orchestrating analytics and insights discovery across diverse, distributed data sources is hard enough, but it is especially hard if those data are difficult to find, difficult to access, and each burdened with its own data delivery latency bottleneck from the source to the end-user. Distributed data sources can create friction for data teams when attempting to integrate multiple datasets. That’s a big problem for analytics innovation, since those high-variety datasets promise, when combined, to yield deep, actionable insights that create new value for organizations. One way that I have described this particularly positive characteristic of data variety is this: Variety is the spice of discovery! Here are three common examples:

(1) Data variety enables the use of data features from multiple data sources to disambiguate two different entities (e.g., customers, products, events, behaviors, cyber actors) that would otherwise appear to be the same when viewed in a small number of “low information” features within a single data source. Data variety can thereby significantly improve analytics model accuracy—reducing false positives, false negatives, and other misclassifications.

(2) Data variety enables the use of multiple data features to detect when two different entries in different data sources are actually referring to one and the same entity (e.g., the same customer in the marketing database, sales database, customer call center CRM database, and product returns database).

(3) Data variety enables the discovery of new classes and categories of entities and events—exploring the high-dimensional data space to uncover new types of entities and events in your domain that were previously not identified as such because the data space was unintentionally being projected into a lower dimensional view, using too few data features, thereby yielding a biased projection of a more complex and diverse data space of the sample population.
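Examples (1) and (2) can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the record fields (`name`, `email`, `zip`), the sample values, and the simple agreement score are assumptions, not a method described in this article or a feature of any product. The point is only that comparing records on more features, drawn from more sources, changes the answer.

```python
def normalize(value):
    """Lowercase a string and keep only letters and digits, so formatting differences are ignored."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def match_score(rec_a, rec_b, fields):
    """Fraction of the given fields whose normalized values agree (hypothetical scoring rule)."""
    agree = sum(normalize(rec_a[f]) == normalize(rec_b[f]) for f in fields)
    return agree / len(fields)


# Hypothetical records from two different data sources.
marketing = {"name": "J. Smith", "email": "jsmith@example.com", "zip": "22030"}
sales_1 = {"name": "John Smith", "email": "JSmith@Example.com", "zip": "22030"}
sales_2 = {"name": "J. Smith", "email": "jane.smith@example.com", "zip": "90210"}

# Low-variety view: a single feature gives a misleading picture.
name_only_1 = match_score(marketing, sales_1, ["name"])  # names differ in spelling
name_only_2 = match_score(marketing, sales_2, ["name"])  # names look identical

# High-variety view: more features link the true match and separate the false one.
all_fields = ["name", "email", "zip"]
full_1 = match_score(marketing, sales_1, all_fields)
full_2 = match_score(marketing, sales_2, all_fields)
```

With the name alone, `sales_2` looks like the same customer and `sales_1` does not. With all three features, the ranking reverses: `sales_1` agrees on two of three fields and `sales_2` on only one. A real entity-resolution pipeline would use weighted or probabilistic matching, but the effect of added variety is the same.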

High-variety data lives in a high-dimensional data feature space that also includes real space (geospatial, location-based data features) and real time (e.g., time series, streaming sensor data, time-of-day labels on data, etc.). An example of a business analytics application where these features are especially important is marketing, when making personalized location-based time-dependent product recommendations to a customer.
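As a rough sketch of how space and time features can combine in such a recommendation, the Python snippet below filters a list of offers by distance from the customer and by hour of day. The offers, coordinates, hour ranges, and the `recommend` function with its `max_km` parameter are all hypothetical illustrations, not taken from any real system. The distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


# Hypothetical offers, each with a location (space) and active hours (time).
offers = [
    {"product": "coffee", "lat": 38.8462, "lon": -77.3064, "hours": range(6, 11)},
    {"product": "lunch special", "lat": 38.8470, "lon": -77.3050, "hours": range(11, 15)},
    {"product": "coffee (far store)", "lat": 39.2904, "lon": -76.6122, "hours": range(6, 11)},
]


def recommend(lat, lon, hour, max_km=2.0):
    """Return products whose offer is active at this hour and within max_km of the customer."""
    return [
        o["product"]
        for o in offers
        if hour in o["hours"] and haversine_km(lat, lon, o["lat"], o["lon"]) <= max_km
    ]


morning = recommend(38.8460, -77.3060, hour=8)
noon = recommend(38.8460, -77.3060, hour=12)
evening = recommend(38.8460, -77.3060, hour=20)
```

The same customer at the same location gets a different recommendation in the morning than at noon, and none in the evening; the far store never qualifies. Dropping either the location or the time feature would collapse those distinctions.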

To summarize what we have just described regarding the “what” and “why” of big data variety, we will use a rocket science metaphor: the data infrastructure of an enterprise may not be a “starship”, nevertheless it really does represent a data space-time continuum (a federation of high-variety data features) that can ignite and accelerate analytics innovation for stellar business growth.

Now, what about the “how” of big data variety – how can an enterprise deal with its challenges?

We find that Pure provides uniquely powerful solutions for the high-variety big data challenge. First and foremost, from a strategic perspective, Pure makes this possible because Pure is a foundational data platform that can support many types and modalities of data (structured and unstructured), recognizing that data variety is critical to the analytics strategy, applications, and infrastructure of the modern organization. Next, from a tactical (practical applications) perspective, we recognize that Pure Storage products can handle data variety in impressive ways.

For example, typically an organization would need different types of data storage devices for different kinds of data types. That can then lead to isolated data silos—a frequent cause of unsuccessful data strategies and broken analytics applications. Pure’s storage platform can handle the variety of today’s data: structured, semi-structured, unstructured, file, block, object, streaming/batched, small files/really large files, etc.

Pure solutions can also parallelize the data operations. This capability is a game-changer in simplifying data and analytics operations as well as speeding time to insights. Parallelism is also an essential benefit of the data infrastructure when there are many users, use cases, and applications moving data in and out of the storage system. With Pure, fragile data staging orchestrations (discovery, access, delivery, integration) across distributed sources are no longer required for complex multi-dataset correlations. Pure solutions greatly simplify data staging and make it much more robust, reproducible, and transparent, so that data scientists can spend more time in the knowledge and insight layer, and less time in the IT layer.

That’s “how” enterprise leaders and data practitioners learn to love data variety.

Pure’s data infrastructure solutions keep high-variety data analytics processes running smoothly and continuously, especially when low-latency discovery and response are critical. On-prem business analytics applications and solutions require data to be transported across the data space-time continuum at “starship enterprise” speed. That requires on-prem data infrastructure solutions that are analytics-ready and AI-ready. Learn more about how this is already happening for Pure customers in the following case studies within these different domains:

Read our 2 related articles in this 3-part series focused on Enterprise Analytics Innovation powered by Pure Storage:

Follow me here at https://www.dhirubhai.net/in/kirkdborne/ and on Twitter at @KirkDBorne
