Data Marketplace

From Data Mesh's Missing Component to Real-Time Data Auctions

Introduction

The Data Mesh framework leaves a critical gap in the provision and use of data products, and the Data Marketplace closes it. Aligned with the Data Mesh principles of domain ownership and treating data as a product, the Data Marketplace enables collaboration between domains.

Beyond internal collaboration, the Data Marketplace plays a central role in accessing external data. It serves as a unified platform that enables seamless integration of external data sets into the internal data ecosystem. This integration drives innovation by unlocking new dimensions from external and alternative data sources.

The following sections address the role of the Data Marketplace in publishing and sharing data products, creating value from external data, monetizing assets and extending beyond raw data to AI models. The final section looks at the evolving landscape of real-time AI model and prompt enrichment and offers insights into the future use of real-time data auctions.


Data Mesh’s Missing Component

In the area of data management, Data Mesh is an important concept that emphasizes domain ownership, data as a product, self-service platforms and federated governance. Often overlooked is that a robust internal data marketplace is an important linchpin that ties these components together.

Provisioning Data Products: Data Mesh pioneers begin the process by defining domains, implementing a Data Mesh platform, updating governance and establishing domain teams to create customized data products. The goal is cross-domain use, but challenges arise in this process:

  • Evaluation and Externalization: Will domains find the time to assess the needs of other departments and externalize their data products?
  • Discoverability: How will other domains discover these data products?
  • Product Evaluation: How can the suitability of these products for different needs be determined?
  • Access Control: Who can access these data products?
  • Customization: What if users require customized versions of the products, and who manages them?
  • Service-Level Agreements (SLAs): Are domain teams obligated to deliver data products within certain SLAs?
  • Usage Tracking: Is there a mechanism to track the usage of the products?

Integrating with Data Mesh: The provision of data products requires more than a traditional data catalog that merely describes the assets without facilitating the provision. What is essential is an environment that promotes seamless collaboration between data providers and consumers - a data marketplace on a data exchange platform that bridges the gap between them.

“A data mesh provides the platform for distributed development of data products, while the data marketplace provides the mechanism for publishing and exchanging data products.” (Data Mesh’s Missing Ingredient: A Data Marketplace)


A data marketplace formalizes data products by requiring detailed descriptions, terms of use, access rules, distribution channels and delivery guarantees. This onboarding process addresses a limitation of Data Mesh: domain owners may lack the incentive or the time to serve groups outside their own domain, and the marketplace ensures that providers consider those external consumers anyway.
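
As an illustration of the onboarding requirements above, a listing could be modeled as a small record that cannot be published until every required field is filled in. This is a minimal sketch, not the schema of any particular marketplace: the class `DataProductListing`, its field names and the helpers `missing_fields` and `can_publish` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class DataProductListing:
    """Hypothetical marketplace listing; all field names are illustrative assumptions."""
    name: str
    owner_domain: str
    description: str
    terms_of_use: str
    access_policy: str       # e.g. "internal", "partners", "public"
    delivery_method: str     # e.g. "batch file", "API"
    delivery_guarantee: str  # e.g. "refreshed daily by 06:00 UTC"


def missing_fields(listing: DataProductListing) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(listing) if not getattr(listing, f.name).strip()]


def can_publish(listing: DataProductListing) -> bool:
    """A listing is publishable only when no required field is missing."""
    return not missing_fields(listing)


draft = DataProductListing(
    name="store_sales_daily",
    owner_domain="retail",
    description="Daily sales per store and product category.",
    terms_of_use="",         # not yet agreed
    access_policy="internal",
    delivery_method="batch file",
    delivery_guarantee="",   # no SLA defined yet
)
print(missing_fields(draft))
print(can_publish(draft))
```

The point of the check is that the gaps (here the terms of use and the delivery guarantee) surface at onboarding time, before any consumer depends on the product.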

Collaboration: A data marketplace facilitates the entire relationship between data providers and data consumers and brings structure to data exchange practices. It enables bidirectional interaction in which domain owners can act as both providers and users. This creates a dynamic ecosystem that promotes a culture of data exchange and collaboration in the Data Mesh. Data products are consumed, integrated and used to create new products and thus contribute to the marketplace. Once the data marketplace is established internally, it can be seamlessly extended to external partners.

Streamlining Operations: Efficient data processing is a hallmark of a well-implemented data marketplace. It streamlines the request for new or custom data sets and automates data distribution. Data providers set up pipelines for the ingestion, transformation, creation and delivery of products. At the same time, data consumers create pipelines to ingest these products and integrate them seamlessly into internal data processing.

Functionality: A well-designed data marketplace efficiently and securely publishes data products for internal and external partners and reduces the time needed to process external requests. A typical data exchange platform consists of modules for data providers (sellers), data consumers (buyers) and data marketplace operators.


Data Mesh’s Missing Ingredient: A Data Marketplace

The data marketplace within Data Mesh simplifies data collection, product creation and documentation for domain teams. It streamlines the definition of access rules and license terms, facilitates collaboration with data consumers and automates the distribution of data products. On the consumer side, the data marketplace serves as a catalog for finding data products and provides detailed information on attributes, values, provenance, ownership and data quality.

Without a data marketplace, there is a risk that Data Mesh becomes a static repository for data products, where domains prioritize individual needs over collective goals. The synergy between Data Mesh and a data marketplace is a catalyst for innovation and efficiency and promises a paradigm shift that unlocks new potential and ensures the sustainable success of companies.


Realizing Value from External Data

Companies tend to prioritize their first-party data (1PD), consisting of information from internal sources such as websites, CRM systems and customer interactions. However, enormous potential lies in external data, especially third-party data (3PD), which exceeds the scope of 1PD by orders of magnitude. Recognizing the true value of external data is a strategic move with significant benefits for improving business insights, decision making and the creation of advanced data products.

“The data you own (1PD) is probably much more valuable to you if it is augmented with data someone else owns (3PD).” (Clemens Mewald)

Understanding External Data: The distinction between internal and external data is vital. Internal data provides insights into the company's processes, while external data broadens the perspective by incorporating information from sources outside the company.


Harnessing the power of external data


Alternative Data: A further category is alternative data. It comes from non-traditional channels and offers a unique perspective that is valuable for companies seeking a competitive advantage. Examples include credit card transactions, web and app tracking data, social media comments, product reviews and even satellite imagery or economic indicators. As demand for alternative data has grown, the number of providers has increased, the data has become more affordable and the supporting technologies have improved.


Value of External Data: Leveraging the value of external data comes with challenges in three steps: discovery, procurement and integration:

  1. Discovery: Recognizing the potential of 3PD is the first step. It requires creativity to identify all possible external data that could complement 1PD. Use unconventional sources, such as satellite imagery to predict retail sales, to capture a broader range of data.
  2. Procurement: Obtaining external data through data marketplaces presents some difficulties, such as navigating free text searches, understanding different data schemas, negotiating licenses, and deciphering terms and conditions given the different contracts from different providers.
  3. Integration: The goal is to integrate external data with existing first-party data. Organizations may face challenges such as insufficient utility, outdated data or insufficient refresh rates that impact the effective use of resources. The integration process requires careful consideration of these issues.
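
Two of the integration concerns listed above, insufficient utility and outdated data, can be checked before any purchase or pipeline work. The sketch below is only an illustration under simple assumptions: utility is approximated by the share of first-party records that the external data set can be joined to, and freshness by the age of the last refresh. The function names and the sample store IDs are hypothetical.

```python
from datetime import date


def match_rate(first_party_keys: set[str], external_keys: set[str]) -> float:
    """Share of first-party records that the external data set can enrich."""
    if not first_party_keys:
        return 0.0
    return len(first_party_keys & external_keys) / len(first_party_keys)


def is_stale(last_refresh: date, today: date, max_age_days: int) -> bool:
    """True when the external data is older than the tolerated age."""
    return (today - last_refresh).days > max_age_days


# Illustrative values only: store IDs from 1PD and from a hypothetical 3PD footfall feed.
own_stores = {"S1", "S2", "S3", "S4"}
footfall_feed = {"S2", "S3", "S4", "S9"}

rate = match_rate(own_stores, footfall_feed)
stale = is_stale(date(2024, 1, 1), today=date(2024, 1, 20), max_age_days=7)
print(f"match rate: {rate:.2f}, stale: {stale}")
```

A low match rate or a stale feed would be a reason to renegotiate or to look for another provider before investing in integration.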

The use of external data brings challenges, including legal aspects such as regulatory compliance and internal governance, as well as the protection of personally identifiable information (PII). Managing multiple formats, ensuring data quality and addressing privacy concerns are essential for regulatory compliance and protecting sensitive information. Creating value from external data requires strategic planning, careful execution and a nuanced understanding of the challenges involved.


Monetizing Data

The monetization of data offers companies lucrative opportunities to generate revenue from their data assets.

Monetizing Proprietary Data: Companies can capitalize on their data by turning proprietary information into money. This involves identifying valuable data sets, packaging them into consumable data products and offering them for sale or exchange. For example, a retail company could monetize data on customer purchases by making it available to market researchers. Similarly, a real estate portal can offer its market data to banks and insurance companies seeking insights into the real estate market. Treating data as a valuable asset allows companies to tap into new revenue streams while retaining ownership and control of their proprietary information.

Exploring New Revenue Streams: In order to tap into new sources of revenue, companies need to identify untapped markets and customer segments for their data products. This strategic approach goes beyond traditional offerings. For example, a transportation company could generate revenue by offering real-time traffic data to city planning authorities or logistics companies. The key is to think creatively about how the data can fulfill specific business needs to drive diversified revenue streams.

Strategic Partnerships: Collaborating with other organizations to share data is mutually beneficial and extends the reach and value of data products. Secure data sharing platforms or participation in existing data marketplaces foster a collaborative ecosystem where organizations can share valuable data sets. For example, a financial institution could partner with a technology company to share anonymized transaction data to improve fraud detection algorithms. These partnerships provide access to a wider range of data, enriching the offering and creating additional revenue streams.

Challenges of Data Monetization: Identifying potential data customers and understanding their data usage requires market research, engagement and a deep understanding of business needs. Mastering legal and regulatory frameworks, especially in terms of data privacy and compliance, is crucial for ethical and lawful data monetization.

Commercial Marketplaces: Commercial data marketplaces such as Databricks Marketplace and the Neudata platform facilitate the monetization of data by giving companies a place to showcase and sell their data products. These platforms provide a structured environment that simplifies the complex process of data exchange.


More than just Data: Exchanging AI models and more…

Traditional data marketplaces are evolving to include a wider range of assets, including AI models, notebooks, dashboards and applications. The shift towards more comprehensive marketplaces is exemplified by platforms such as Databricks Marketplace.

Traditional Marketplaces: Roadblocks for Providers and Consumers: Data providers face challenges in monetizing diverse assets beyond raw data and encounter limitations in reaching users securely across platforms. The lack of secure technology and standardized governance makes data sharing even more complex. Meanwhile, consumers struggle with narrow focus, lengthy search processes and delayed insights, often exacerbated by lock-in to a particular provider.

Databricks Marketplace: Open Marketplace for Data and AI: The Databricks Marketplace goes beyond traditional data marketplaces and promotes an open platform for the discovery, exchange and monetization of comprehensive solutions in a secure environment. It extends the scope to notebooks, ML models, dashboards, solution accelerators, applications and more, marking a paradigm shift from the traditional data-centric approach.


Databricks Marketplace, © Databricks Inc.


The Databricks Marketplace enables providers to monetize a wide range of assets beyond data sets. It solves the challenge of secure user accessibility across platforms and enables providers to expand securely. The marketplace ensures secure sharing across clouds, regions and data platforms, fostering trust between providers and customers.

The Databricks Marketplace expands consumer access to assets beyond data sets, offering ML models, notebooks, applications and solutions. It simplifies the evaluation of data products through pre-built notebooks and sample data. The marketplace prevents vendor lock-in by encouraging open sharing and collaboration. Users can work seamlessly across clouds, regions and platforms, integrating their favorite tools and work environments without vendor-specific restrictions.

Solution Accelerators: Databricks has launched Solution Accelerators for various industries in the Marketplace, offering pre-built solutions for financial services, healthcare, communications, media, retail and consumer goods. These free accelerators aim to accelerate time-to-value for data practitioners.

AI Model Sharing: In response to demand from companies new to AI, Databricks will enable the sharing of AI models on its Marketplace. With this feature, users can access both open source and proprietary AI models, making it easier for data consumers and providers to discover and monetize AI models. Users can evaluate models with rich previews, and Databricks curates and publishes open source models for common use cases. This advancement accelerates innovation in organizations using the Databricks Lakehouse Platform for both real-time and batch inference. It represents a significant step in the integration of AI into data solutions and underscores the marketplace's commitment to a holistic exchange of valuable assets.


Real-time Data Auctions

Real-time programmatic data exchange is transforming the interaction between data providers and consumers. This section delves into the paradigm shift introduced by real-time model enrichment and prompt augmentation, with a focus on the Arcus Data Enrichment Platform as a key player in this field.


Real-Time Programmatic Data Exchange


Prompt Enrichment: Prompt Enrichment connects prompts directly to the Arcus Data Platform and serves as a turnkey solution for model enhancement. It seamlessly integrates relevant features from internal and external data sources and automates the matching of prompts with valuable external data. This extends the context for generative models in real time.

Model Enrichment: Model Enrichment enhances AI models by incorporating external signals and data alongside first-party data to improve performance. By integrating with ML frameworks such as PyTorch, the process requires minimal additional code and ensures the automatic integration of high-quality external data sources into ML workflows while maintaining data privacy for internal data.

Real-time Programmatic Data Exchange: The vision is to create a real-time programmatic data exchange, similar to programmatic ad buying. This exchange would bring data providers and consumers together, streamline licenses and automate transactions, which would fundamentally change the dynamics of the data economy.

Discovery and Procurement: Imagine a data exchange where providers work together, licenses are standardized and transactions are program-driven. Data consumers formulate tasks and assign values to each improvement unit. The exchange autonomously identifies relevant third-party data (3PD), conducts real-time auctions based on customer budgets, and optimally selects subsets of 3PD to fulfill their requirements. In this way, the problems of discovery and value extraction that occur with traditional data marketplaces are solved.
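
The selection step described above can be sketched as a small budgeted choice among provider offers. The article does not specify an auction mechanism, so this is only one possible reading under loud assumptions: each offer carries an asking price and an estimated improvement, improvements are treated as additive, and the exchange uses a simple greedy heuristic (best improvement per price first) rather than an optimal solver. `Offer` and `select_offers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str
    price: float                 # asking price for this prediction task
    expected_improvement: float  # estimated improvement units, e.g. accuracy points


def select_offers(offers: list[Offer], value_per_unit: float, budget: float) -> list[Offer]:
    """Greedy heuristic: take offers with the best improvement per price first.

    Drops offers that cost more than the value they add, then fills the budget.
    Assumes improvements are additive, which real data sets rarely are.
    """
    profitable = [o for o in offers if o.expected_improvement * value_per_unit > o.price]
    ranked = sorted(
        profitable,
        key=lambda o: float("inf") if o.price == 0 else o.expected_improvement / o.price,
        reverse=True,
    )
    chosen, remaining = [], budget
    for offer in ranked:
        if offer.price <= remaining:
            chosen.append(offer)
            remaining -= offer.price
    return chosen


offers = [
    Offer("weather", price=20.0, expected_improvement=1.0),
    Offer("footfall", price=50.0, expected_improvement=2.0),
    Offer("satellite", price=80.0, expected_improvement=1.0),
]
# The consumer values each improvement unit at 40 and caps spending at 70.
chosen = select_offers(offers, value_per_unit=40.0, budget=70.0)
print([o.provider for o in chosen])
```

In this example the satellite offer is dropped because its price exceeds the value it adds, and the two remaining offers fit the budget. A greedy pass is not guaranteed to find the best subset in general; a production exchange would need a proper optimization and a way to measure the joint effect of combined data sets.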

Real-time Data Auctions: As valuable prediction tasks are performed continuously, the exchange becomes a hub for repetitive transactions that ensure continuous value creation. Real-time auctions for each prediction task improve the flexibility of the ecosystem and enable the seamless integration of new data providers and consumers. This evolution parallels the shift in ad buying from offline and manual to real-time, programmatic and measurable.

Economic Incentives: The proposed real-time programmatic data exchange offers economic incentives for both data providers and consumers.

  • Improved Discoverability: Streamlined transactions and standardized terms improve discoverability and accelerate the data economy.
  • Broader Market Expansion: Simplified transactions make the data economy accessible to a wider audience and significantly expand the overall market.
  • Optimized Pricing Mechanism: Auction-based pricing ensures fair deals and allows providers to charge different prices based on each consumer's individual valuation.
  • Insights for Data Providers: Demand aggregation provides valuable insights to data providers and helps them prioritize product development based on consumer demand and willingness to pay.

Challenges: The vision of a real-time programmatic data exchange faces commercial and technical challenges. Commercial issues include various data licenses, market resistance to disintermediation and the introduction of new pricing models. Technical challenges include improving semantic type recognition, efficient data discovery, meaningful linking of data and consideration of data security aspects.

Despite these obstacles, the real-time programmatic data exchange aims to transform the data economy through a dynamic, measurable and inclusive ecosystem. The potential benefits of improved discoverability, increased liquidity and optimized value make this paradigm shift a compelling proposition for the future of data marketplaces.

Conclusion

A robust internal data marketplace is a critical data management hub that connects the key components of Data Mesh. Delivering data products involves overcoming challenges such as evaluation, discoverability, product suitability, access control, customization, SLAs and usage tracking. Integration into Data Mesh requires more than a traditional data catalog, which underscores the importance of a data marketplace on a data exchange platform.

The value of external data, especially third-party and alternative data, is critical to improving business insights and decision making. Leveraging external data requires overcoming challenges related to discovery, procurement and integration, including legal and privacy considerations. Monetizing data offers lucrative opportunities that require strategic partnerships, diversified revenue streams and participation in commercial data marketplaces such as Databricks Marketplace.

Traditional data marketplaces are evolving to incorporate AI models, notebooks and applications, as exemplified by the comprehensive Databricks Marketplace. This evolution simplifies accessibility for users, prevents vendor lock-in and speeds up the evaluation process with pre-built solutions. The marketplace also introduces solution accelerators for different industries and plans to enable the sharing of AI models.

The vision of real-time programmatic data exchange creates economic incentives by addressing challenges through improved discoverability, broader market expansion, optimized pricing mechanisms and insights for data providers. Despite the obstacles, the potential benefits of improved discoverability, increased liquidity and optimized value make this paradigm shift compelling for the future of data marketplaces.
