Delivering New Business Success Requires a Unified Data Space

Mark Stadtmueller

Published: August 14, 2018

Nothing in business is passive. Every day that a business operates, it is driven by a strategy. Even having no strategy is a strategy. Whatever that strategy is, it drives growth. Whether it is a food truck business deciding on food type, location, or truck type, or a global energy company choosing sources, means, and markets, strategy, choices, and decisions take place, and their results drive growth (or the lack thereof).

With digital transformation, the presence of strategy in business growth does not change. Growth is always about choosing new or better products and services, new or better customer interactions, or new or better ways of doing business. A strategy leveraging Digital Transformation is merely the newest, fastest, least expensive way to deliver growth. Jeanne Ross at MIT/Sloan gives a strong definition of Digital Strategy as an “integrated business strategy inspired by the capabilities of powerful readily accessible technologies like social, mobile, analytics, cloud, and internet of things and responsive to constantly changing market conditions.” (link)

Social and mobile are usually focused on new or better customer interactions, and analytics and cloud on new or better ways of doing business. The Internet of Things is about innovative anticipatory capabilities, based on pervasive connectivity and data gathering, that deliver new or better products and services. But the root of all of this is data. Social and mobile customer interactions rely on data flowing to and from social and mobile platforms to drive the new and better interactions. Cloud and analytics drive new or better ways of doing business from data that is provided or enabled. And the Internet of Things drives new products and services from the data that IoT provides and presents. Moreover, and maybe more importantly, social, mobile, analytics, cloud, and IoT together create more expressive data at unbounded volumes. Data is both the root and a result.

Getting a handle on data in business is not new.

However, harnessing data for digital transformation is new. Databases, Data Warehouses, and more recently Data Lakes have made their mark and become established business capabilities. Meanwhile, recent advances in Artificial Intelligence have become a key capability for digitally inspired business initiatives. In AI, models are trained on datasets, and those datasets first have to be created and fused from the available data before they can be run through the models. Databases, Data Warehouses, and Data Lakes are not specifically designed for, or well suited to, training models from such created “virtual” datasets.

Indeed, while AI continues to capture the imagination of the public and rightly demands the attention of enterprises and businesses of all types, a recent KDNuggets article sums up the current situation nicely: “Academic papers are almost entirely focused on new and improved models, with datasets usually chosen from a small set of public archives. Everyone I know who uses deep learning as part of an actual application spends most of their time worrying about the training data instead.” (https://www.kdnuggets.com/2018/06/improve-training-data-how.html) The article includes a picture that humorously describes the challenge with data in business (in this case Tesla) versus in academia. As a side note, I hear from many academics that data is no less a challenge there.

Lisha Li from a presentation by Andrej Karpathy

Challenges in business with data

The first challenge with business data is the oft-mentioned data silos. Data silos are viewed as a negative result of outdated systems and processes, or as the result of creating many different data stores for individual initiatives, mergers, or acquisitions. While this is true, it is only part of the story. Data sometimes has to be put into silos to meet security and compliance requirements. For example, there might be an obvious benefit to merging customer purchase history with external data sources. However, the price a customer paid for products and services can be very sensitive, and therefore there are good business reasons to keep it in the silo. So, while disparate, costly, and outdated silos are an issue, they are not the only issue. Sensitive, secure, and compliant data has to be siloed because the systems cannot provide role-based or authorization-based access to individual data elements once they are merged.

The second challenge is that missing and messy data is a de facto part of business data. Clean datasets are a prerequisite for modern AI-based models, but most business data is not clean. And data science techniques for dealing with missing and messy data can often have negative consequences, because those null values (“NAs”) and errors often carry inherent meaning in themselves.
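To make the point about meaningful nulls concrete, here is a minimal Python sketch. The column, the values, and the helper name `encode_with_missing_flag` are all hypothetical and not taken from any Lucd code. It fills each missing value with the mean of the observed values, but keeps a separate flag so that the fact that the value was missing is not lost to the model.

```python
from __future__ import annotations

from typing import Optional


def encode_with_missing_flag(values: list[Optional[float]]) -> list[tuple[float, int]]:
    """Return (value, was_missing) pairs.

    Missing entries are filled with the mean of the observed values, and the
    flag preserves the information that the original value was absent.
    """
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed) if observed else 0.0
    return [(fill, 1) if v is None else (v, 0) for v in values]


# Hypothetical column: discount per order; None means no discount was recorded,
# which may itself be a signal (for example, a full-price purchase).
discounts = [5.0, None, 15.0, None]
encoded = encode_with_missing_flag(discounts)
```

Plain mean imputation alone would make the second and fourth orders look like ordinary discounted orders; the extra flag lets a model learn from the absence itself.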

The third challenge is the scaling and cost of data initiatives. The systems used to capture, secure, and harness data have traditionally been very expensive, and efforts to merge or migrate data are almost always large, long, and very costly. Very often the effort to capture, secure, and harness data becomes intractable in business.

Why Data Lakes, Databases, and Data Warehouses do not address the needs of digital transformation

It is common in business discussions to bring up the limitations or license costs of databases, the investments required for Data Warehouses, or how Data Lakes often seem to become data swamps. The reality is that all these platforms are very good for the purposes they serve; however, none are well suited to the goals of Digital Transformation and leveraging AI. In order for a business to deliver on a strategy leveraging Digital Transformation, data must be leveraged to detect, classify, segment, predict, or recommend something that can then be applied to existing or new business applications and processes.

In AI, this is called “creating a model”, “training the model on data”, and finally “serving the trained model”. In order to do this, data must be captured, secured, and harnessed into a “dataset” that a model can be trained on. Wikipedia describes the attributes of a dataset (https://en.wikipedia.org/wiki/Data_set): “Most commonly a data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question.” In general, AI requires a dataset of “well organized” data.

The problem with this in business is two-fold:

First, where well formed data already exists, most businesses have already leveraged business intelligence systems. While these systems may be costly and may not deliver the modeling accuracy of an AI platform, they are already in use and serving a purpose.

But more importantly, second, the opportunity for Digital Transformation lies in new relationships that can be gleaned from seemingly disparate data, e.g. relating geographic weather data to retail store location sales data, as well as to current traffic patterns and conditions, to better predict inventory requirements for retail stores (or, for that matter, the location of stores, or what type of stores). This data can and does come from multiple data locations, it comes in different modalities, it is often incomplete and not well formed, and it often has unique security and handling requirements that vary across the collected data.

The challenge for businesses is that it is impractical or too costly to do this data ingestion and transformation in databases or data warehouses and not doable in a data lake.

A Dataspace is required for Digital Transformation empowered with AI

A dataspace is required to overcome these challenges. Dataspaces allow for abstractions in data that can reduce effort and overcome data integration challenges.

To address the aforementioned challenges, Lucd has developed our Unified Data Space leveraging Accumulo NoSQL.

The Lucd Unified Data Space (UDS) is a self describing object store with fine grained security. It balances efficiency, flexibility and readability when storing data objects. Data access is efficient because a single access is all that is required to retrieve any object. Decoding object values and metadata information is simple and straightforward. The data space leverages Accumulo and is flexible since there are no restrictions on what data can be stored. And it is readable because data is stored in text form wherever possible. Only truly binary data is stored in binary form. This allows developers, analysts and administrators to browse and even insert data into the data space using existing command-line tools.

In the Unified Data Space, one row represents one data object. An object may represent a data file, a database row, or any other entity composed of attributes. The attribute is the fundamental unit of storage in UDS. Attributes are primitive data types that are grouped into objects based on the row ID; internally they can be grouped into complex structures and then aggregated to form a complete object representation.

Names give attribute values meaning. In UDS, attribute names represent the hierarchical structure of the data model using dot (.) notation. Examples in this document will use the following object definition:

Rather than storing an entire row as a single element, individual attributes are stored as cells. Reads from and writes to Accumulo are done one cell at a time.
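As an illustration of dot-notation attribute names and cell-at-a-time storage, the following Python sketch flattens a nested object into one cell per attribute and then regroups the cells by row ID. The customer object, the row ID `cust-001`, and the helper names are hypothetical and chosen only for this example; they are not the object definition or the cell layout that UDS actually uses.

```python
from __future__ import annotations

from typing import Any, Iterator


def flatten(obj: dict[str, Any], prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted attribute name, primitive value) pairs for a nested dict."""
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, name)
        else:
            yield name, value


def to_cells(row_id: str, obj: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Turn one object into one (row ID, attribute name, text value) cell per attribute."""
    return [(row_id, name, str(value)) for name, value in flatten(obj)]


def group_by_row(cells: list[tuple[str, str, str]]) -> dict[str, dict[str, str]]:
    """Regroup cells into objects keyed by row ID."""
    objects: dict[str, dict[str, str]] = {}
    for row_id, name, value in cells:
        objects.setdefault(row_id, {})[name] = value
    return objects


# Hypothetical object, used only for illustration.
customer = {"name": "Ada", "address": {"city": "Austin", "zip": "78701"}}
cells = to_cells("cust-001", customer)
objects = group_by_row(cells)
```

Storing values as text mirrors the readability goal described earlier, and the row ID is the only thing needed to reassemble a complete object from its cells.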

Cells are stored as key-value pairs, where the key and value are comprised of the following elements:

The Unified Data Space stores objects in a table called dataspace. The dataspace table is the primary object store. Other tables contain analytic output and indexes that all refer to objects in the dataspace table.

The Visibility mark contains the rules governing which users have access to each attribute. Boolean operators are used to specify combinations of access controls. For example, assume that a system has access controls labeled A, B, C and D. The visibility mark “(A&B)|D” says that a user must have A and B access privileges or the D access privilege in order to access this attribute. The list of access control marks, the rules governing access to attributes and the method to map users to authorization lists all depend on the specific requirements of the customer.
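The access rule in the example can be checked with a small evaluator. The Python sketch below is an illustrative stand-in and not Accumulo's or Lucd's implementation: it tokenizes a visibility mark, parses it with a tiny recursive-descent parser, and tests it against a user's set of access privileges. The function names are hypothetical, and letting `&` bind tighter than `|` when no parentheses are given is a simplifying assumption of this sketch.

```python
from __future__ import annotations

import re

# A token is an access-control label or one of the operators & | ( ).
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr: str) -> list[str]:
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr: str, auths: set[str]) -> bool:
    """Return True if a user holding `auths` satisfies the visibility mark."""
    tokens = tokenize(expr)
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the right side before combining
            result = result or rhs
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a label is satisfied if the user holds it

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


can_read_ab = is_visible("(A&B)|D", {"A", "B"})   # holds A and B
can_read_d = is_visible("(A&B)|D", {"D"})         # holds D only
can_read_a = is_visible("(A&B)|D", {"A", "C"})    # holds A but neither B nor D
```

Because the mark is attached to each attribute, the same check can hide a sensitive field such as price while leaving the rest of the object readable.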

The Unified Data Space empowers a Virtual Dataset

When delivering business value from data, available data is searched and that search yields a “result set”. But that result set may span many databases, data stores, data types, data modalities, and formats that require data transformation before they can be leveraged for models. Deciding how to perform these transformations, and where and how to store the results, can be challenging from a compute, storage, and security point of view. With a Unified Data Space such as the one deployed in Lucd, and leveraging the Lucd EDA (exploratory data analysis) capability, the tension between “the messy data required for valuable digital transformation” and “the well formed data required for AI” is overcome through the creation of a “Virtual Dataset”.

The Virtual Dataset empowered by the Lucd Unified Data Space becomes the dynamic glue to rapidly bring data to models. An existing or new model may require data from many different datasets. When envisioning the data needed to feed the model, it is easy in the Unified Data Space to search, identify, and tag that data, and then join it into the Virtual Data Set that is needed for that unique model.

The Virtual Data Set (VDS) can then either be saved virtually (as references to locations in the UDS) or be written out as a separate data set. This Virtual Data Set can then be readily used to train models. But, more importantly, as models change or new models are incorporated, that VDS can be recalled, and the corresponding transformations can also be recalled and edited.
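One way to picture a dataset that is saved virtually is as a list of references plus the transformations to apply when the data is needed. The Python sketch below rests on assumptions throughout: the in-memory `STORE`, the `VirtualDataset` class, and its fields are hypothetical stand-ins, not Lucd's API. It holds only row IDs and attribute names, and builds the actual training rows on demand.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical in-memory stand-in for the object store:
# row ID -> {attribute name: text value}.
STORE = {
    "sale-1": {"store.id": "S1", "sale.amount": "120.50"},
    "sale-2": {"store.id": "S2", "sale.amount": "80.00"},
    "wx-1": {"store.id": "S1", "weather.temp_c": "31"},
}


@dataclass
class VirtualDataset:
    """References into the store plus per-attribute transformations."""

    row_ids: list[str]
    attributes: list[str]
    transforms: dict[str, Callable[[str], object]] = field(default_factory=dict)

    def materialize(self, store: dict[str, dict[str, str]]) -> list[dict[str, object]]:
        """Resolve the references and apply the transformations."""
        rows = []
        for row_id in self.row_ids:
            obj = store[row_id]
            row: dict[str, object] = {}
            for name in self.attributes:
                raw = obj.get(name)  # None if the attribute is absent
                convert = self.transforms.get(name)
                row[name] = convert(raw) if raw is not None and convert else raw
            rows.append(row)
        return rows


vds = VirtualDataset(
    row_ids=["sale-1", "sale-2"],
    attributes=["store.id", "sale.amount"],
    transforms={"sale.amount": float},
)
training_rows = vds.materialize(STORE)
```

Since only references and transformations are kept, editing a transformation or adding a row ID and materializing again yields an updated dataset without copying the underlying data.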

As the business requirements and/or model requirements change, new or different data is required and new Virtual Data Sets need to be created rapidly. The Lucd Unified Data Space empowers this capability, and therefore empowers the dynamics needed to leverage data and models in time to meet business opportunities.

Conclusion

Jeff Bezos has stated “The only sustainable advantage you can have over others is agility, that’s it. Because nothing else is sustainable, everything else you create, somebody else will replicate.”

The fastest, most efficient, and most cost-effective way to create is through digital transformation that is driven by data. But the valuable data that drives differentiation is messy, while the models needed to transform require well-organized data. The Lucd Unified Data Space crosses this chasm and allows businesses to implement Enterprise AI.

Randy Schrock

Strategic Opportunities & Programs Director, Zoom

6 years ago

Good stuff Mark. As always, intelligent, timely and relevant.

Peter Carroll

6 years ago

Excellent article - great job in capturing today's challenges around data analysis and realizing true digital transformation

Russ Blattner

6 years ago

Building your Competitive Digital Advantage Leverage your Data Assets – Capture and Securing Your Data Supply Chain – Lucd's Unified Data Space & Data Compliance is a must for Enterprise AI
