Unlock Data-Driven Intelligence & Innovation
Wilson Leung
Regional Channel Director, Hong Kong, Macau | Advisor helping enterprise to innovate with Gen AI Powered Intelligent Business Process Automation, Data Governance & Identity Security Management | UiPath
#boomi #ipaas #datadiscovery #datacatalog #datamanagementplatform #dataintegration #dataorchestration #dataops #masterdatamanagement #datafabric #dataanalytics
As the lifespan of a public company has dropped from 55 years to 20 years to 11, one distinction has become clear between digital disruptors and legacy businesses. Disruptors are good at data, and most importantly, they do not suffer the siloing of their business or use duct tape and band-aids to stitch their businesses and data together. Instead of trying to build an operational backbone, disruptors apply digital technologies, especially data and AI, to transform industry business models and value propositions. How does a legacy business respond? Read on!
Why do Analytics and Data matter?
Today, digital business models are changing how businesses provide value and extract revenue from customers. In their book “Competing in the Age of AI,” Marco Iansiti and Karim Lakhani suggest that AI is forcing a rethink of the corporation and the concept of scale. They give the example of Ant Financial. With a staff of a few thousand people, Ant has grown from 10 million to 700 million users. Ant’s secret is an integrated data platform that uses AI to power application processing, fraud detection, credit scoring, and loan qualification. Everything at Ant is automated via analytical models.
To contrast this, JP Morgan Chase, a much larger business, has 82 million online customers and 250,000 employees worldwide. Data and analytics blow up what Michael Porter in “Competitive Strategy” described as the competitive advantage of physical scale and size. Born-digital fintech companies like Ant, and SoFi in the U.S., are using digital operating models that leverage data and analytics to transform the financial services industry. In the long run, Marco and Karim see a collision coming between disruptors and industry incumbents. Applying AI to business model thinking and operating models means considering data’s ability to drive a broad variety of functions, including personalization, revenue optimization, and recommendations, as well as sophisticated analytics to understand the value created by potential digital products and services. One more thing: disruptors understand the connection between experimentation and innovation. They use sophisticated experimentation that allows organizations to learn and understand the opportunities and risks presented by new features and digital products. This, in the end, transforms value creation, capture, and delivery.
According to Iansiti and Lakhani, “data is the fuel that powers the AI factory” (Competing in the Age of AI, page 58). They go on to say that the AI factory requires a state-of-the-art data platform. Without one, they say, “a traditional IT custom built process takes orders of magnitude more time and cost and becomes a nightmare to maintain and update” (page 72). Even worse, without a data platform you often make data scientists put their time and energy into data munging, so much so that some describe data scientists as really being data janitors.
Typical Data Issues
The data leaders that we talk to, CIOs and CDOs, think of data like oil. Like oil, they say, data by itself isn’t usable; it must be refined first. It is also expensive to store when it is not used. Data leaders claim that dirty data is extremely common across all industries. The question they ask is: do business leaders want the data equivalent of kerosene or diesel fuel? Both come from refinement, but you have to know what you want from data. Their position here is very similar to what Tom Davenport said many years ago: “you can’t be really good at analytics without really good data” (Analytics at Work, page 23). A key element of doing this, as you will discover, is building a data platform. Let’s now take a look at what is required to do so.
Establishing Data Pipelines
Iansiti and Lakhani believe that organizations building data capabilities start by establishing a data pipeline. They claim that by establishing a state-of-the-art data platform, organizations can power the AI factory that digital businesses require. Specifically, an effective data platform enables data to flow through APIs via a publish-and-subscribe framework; to be clear, there can be internal-only catalogs as well as partner catalogs. For example, “DBS Bank offers its outside partners access to more than 200 API enabled digital components” (Designed for Digital, MIT Press, page 107).
Iansiti and Lakhani suggest that the massive amount of data captured from users, suppliers, partners, and employees is extremely valuable and should not be stored in an ad hoc fashion. Instead, this data needs governance and security. For this reason, they believe it is essential that organizations build a secure, centralized system for careful data security and governance: defining appropriate checks and balances on access, inventorying assets, and providing all users the necessary protections.
A key element of data management today is the concept of a data pipeline. The data pipeline completes several key functions: gathering, cleaning, normalizing, integrating, and safeguarding data in a systematic, sustainable, and scalable way. Let’s dig into each of these:
Gather—This typically is about integrations between internal or external systems.
Clean—This is about making data of sufficient quality for analytics. As such, it focuses on the accuracy, consistency, timeliness, validity, and completeness of data.
Normalize—This is about eliminating redundancy and inconsistency within data sources.
Integrate—This is about aggregating data from multiple sources inside and outside an organization.
Safeguard—This is about governing and controlling access to data across all data touchpoints.
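The five stages above can be sketched in a few lines of code. This is a minimal illustration only: the function names, sample records, and field rules are invented for the example, not part of any product API.

```python
# Minimal sketch of the five pipeline stages described above.
# All function names and the sample records are illustrative only.

def gather(sources):
    """Gather: pull raw records from internal or external systems."""
    return [record for source in sources for record in source]

def clean(records):
    """Clean: drop records missing required fields (a crude quality gate)."""
    return [r for r in records if r.get("id") and r.get("email")]

def normalize(records):
    """Normalize: remove inconsistency, e.g. lower-case and trim emails."""
    return [{**r, "email": r["email"].strip().lower()} for r in records]

def integrate(records):
    """Integrate: merge on a shared key so each entity appears once."""
    merged = {}
    for r in records:
        merged[r["id"]] = {**merged.get(r["id"], {}), **r}
    return list(merged.values())

def safeguard(records, allowed_fields):
    """Safeguard: expose only governed, approved fields to consumers."""
    return [{k: v for k, v in r.items() if k in allowed_fields} for r in records]

# Two hypothetical source systems describing the same customer.
crm = [{"id": 1, "email": " Ada@Example.com ", "ssn": "123-45-6789"}]
erp = [{"id": 1, "email": "ada@example.com", "tier": "gold"}, {"id": 2, "email": None}]

pipeline = safeguard(
    integrate(normalize(clean(gather([crm, erp])))),
    allowed_fields={"id", "email", "tier"},
)
print(pipeline)  # one clean, merged, governed record for id 1
```

Note how the incomplete record is dropped at the clean stage, the duplicate is merged at the integrate stage, and the sensitive SSN field never reaches consumers because the safeguard stage whitelists fields.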
With these achieved, the goal is to make data clean, consistent, and available to applications. Marco and Karim believe that what is wanted is a self-serve data supermarket, where data is aggregated, cleaned, refined, processed, and made available through consistent interfaces, using a publish-subscribe methodology via APIs. Analyst firms call the combination of integration services, data pipelines, semantics, and API management for integrated data delivery a ‘data fabric’. The goal is attaining reusable and augmented data. They see the combination of data management and integration technology, architecture design, and services delivered across multiple deployment and orchestration platforms as a digital business accelerant.
Above is a view into what a system for data management looks like. Data comes in as events, messages, streams, and batches, from operational systems, third-party sources, and new edge data sources. In turn, its subscribers include operational models, data sources, reporting, and self-service. If data is known and prepared, it can be used immediately for data modeling and digital products once it has been discovered. Unknown data, whether structured or unstructured, needs to run through the data pipeline and be presented as an API.
Accelerating Data Pipelines Using Boomi
While Boomi does not provide every element of a data fabric or data pipelines, we believe that the Boomi AtomSphere Platform can be an accelerant for organizations working to ensure data readiness across business silos and the supporting data fabric. Given this, how does the AtomSphere platform accelerate the establishment of a data fabric and data pipelines?
Discovery and Catalog
Given the above, the Boomi Data Platform starts with catalog and discovery. Users must be able to find the data they’re interested in regardless of where it is located (on-premises, in the cloud, in SQL databases, or in NoSQL stores) and irrespective of its form, be it structured or semi-structured. To assist here, Boomi creates a robust metadata business glossary and search, supported by natural language processing (NLP), that returns a comprehensive view of datasets, jobs, and workflow schedules.
ETL, Data Preparation, Data Quality
Once data is discovered, it can be moved using the platform’s modern ETL technology to a data warehouse, data lake, or in-memory database. At this point, Boomi uses a recommendation engine to automatically cleanse, enrich, normalize, and transform data seamlessly to prepare it for use by the business. This automatically creates profiles, categories, and tags for data to make it even more discoverable in search results. At the heart of the prep tools is the data preparation AI engine, which recommends data cleansing and normalizing tasks and predicts how a user wants to join multiple data sets from various sources to gain business insights.
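To make the idea of prep “recommendations” concrete, here is a toy rule-based sketch of the kind of checks such an engine automates. It is not Boomi’s actual AI engine; the rules and sample column are invented for illustration.

```python
# Illustrative rule-based sketch of data-prep "recommendations".
# Real engines use trained models; these three checks are examples only.

def profile_column(values):
    """Profile a column and recommend cleansing tasks for it."""
    recommendations = []
    # Missing values suggest a fill-or-drop step.
    if any(v is None or str(v).strip() == "" for v in values):
        recommendations.append("fill or drop missing values")
    # Untrimmed strings suggest a whitespace-normalization step.
    if any(v is not None and str(v) != str(v).strip() for v in values):
        recommendations.append("trim surrounding whitespace")
    # Values that differ only by case suggest case normalization.
    non_null = [str(v) for v in values if v is not None]
    if len({v.lower() for v in non_null}) < len(set(non_null)):
        recommendations.append("normalize letter case")
    return recommendations

emails = [" ada@example.com", "Ada@example.com", "ada@example.com", None]
suggested = profile_column(emails)
print(suggested)
```

Each recommendation maps directly to one of the clean/normalize stages described earlier, which is why automating them saves data scientists from “data janitor” work.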
At the same time, understanding where data comes from (its provenance and lineage) is essential to determining the validity of the data for each application or analysis. The next level of comprehension delivered in the Boomi Platform is making it easy for users to rate the quality of a data set.
To search for, trust, and analyze data relationships more effectively, organizations need models and systems that can store data relationships and make them discoverable. Knowledge graphs, network-based representations, and graph databases specialize in these capabilities, enabling business users, analysts, and AI applications to navigate relationships found in implicit and explicit data. Boomi’s enterprise knowledge graph can encode relationships between files, going beyond traditional data catalogs that just mine metadata. Plus, users can recommend improvements to metadata descriptions so the next person to use the data set doesn’t have to decipher a meaningless attribute description to understand if it’s relevant to their analysis. Data lineage is a core feature of the governance pillar. Understanding where data sets were created and how derived data sets were transformed helps every user establish confidence in the accuracy of the data. The open architecture of the Boomi Platform makes integration with third-party lineage solutions easy.
Normalize and Integrate
This involves putting together the shared nouns, that is, the data quality logic. The goal is to get rid of duplication and to achieve a golden record regardless of data source. To do this, the Boomi data platform allows you to look at the fields and establish validation and enrichment rules. This includes matching, classification, and governance. With embedded data stewardship processes, you have a way to deal with data elements that do not conform to rules. Much of this is accomplished using matching rules, but other data quality rules can be built and used with an integration process to invoke a third-party API. An example is calling an address validation service.
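A minimal sketch of match-and-merge into a golden record follows. The match rule (same normalized email) and survivorship rule (keep the first non-empty value per field) are stand-ins chosen for the example, not Boomi’s actual rules.

```python
from collections import defaultdict

# Illustrative match-and-merge into a "golden record".
# Match rule and survivorship rule here are example stand-ins only.

def match_key(record):
    """Match rule: records with the same normalized email are one entity."""
    return record["email"].strip().lower()

def merge(records):
    """Survivorship: keep the first non-empty value seen for each field."""
    golden = {}
    for r in records:
        for field, value in r.items():
            if value and not golden.get(field):
                golden[field] = value
    return golden

def golden_records(records):
    """Group records by match key, then merge each group."""
    groups = defaultdict(list)
    for r in records:
        groups[match_key(r)].append(r)
    return {key: merge(group) for key, group in groups.items()}

sources = [
    {"email": "Ada@Example.com", "name": "Ada Lovelace", "phone": ""},
    {"email": "ada@example.com", "name": "", "phone": "+44 20 7946 0000"},
]
print(golden_records(sources))
```

The two partial records collapse into one entity carrying the best value for each field; a real platform would add fuzzy matching, classification, and stewardship queues for records that no rule can resolve.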
Data Stewardship and Safeguard
Data stewards can use the Boomi Platform to select any data attribute in any data set and then select a function to mask that data. Similar control can be applied to row-level data, allowing a user to see personally identifiable information (PII) on some records but not others. This is especially important for compliance with regulations such as the EU General Data Protection Regulation (GDPR). At the same time, the Boomi Platform enables data stewards to automate detection of more than 30 data types, including PII, credit card numbers, Social Security numbers, phone numbers, FICO scores, and URLs. The platform’s AI technology can be trained to automatically discover other common data types as required by data stewards. With governance rules, this data can be protected throughout its life cycle using embedded privacy-enabling technology, including masking. Many industries have defined what personally identifiable information is; HIPAA, for example, lists 18 identifiers of PII. Being able to automatically identify and control risk exposure points can reduce business risk.
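To illustrate the detect-and-mask idea, here is a regex-based sketch covering two of the data types mentioned above. Real platforms combine patterns with trained classifiers; these two patterns and the placeholder format are example assumptions only.

```python
import re

# Illustrative regex-based PII detection and masking.
# Production systems pair patterns like these with trained models.

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def mask_pii(text):
    """Replace each detected PII value with a type-labeled placeholder."""
    for pii_type, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{pii_type} masked>", text)
    return text

record = "SSN 123-45-6789, card 4111 1111 1111 1111"
masked = mask_pii(record)
print(masked)
```

Because masking happens in the pipeline rather than in each consuming application, the same governance rule protects the value everywhere the data flows.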
Data APIs
The Boomi data platform allows APIs to be seamlessly created for data once it has been refined through the data pipeline. This, as Marco and Karim suggest, follows a publish-and-subscribe modality. Depending upon whether the data is being monetized, APIs are placed in a catalog for internal or external consumption.
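The publish-and-subscribe modality can be sketched as a small in-memory catalog: producers publish refined records to a topic, and any registered consumer receives them. The class, topic names, and visibility flags are invented for illustration and do not reflect a real product API.

```python
# Minimal in-memory sketch of publish-and-subscribe for refined data.
# Topic names and the visibility flag are illustrative assumptions.

class DataCatalog:
    """Registers data topics and pushes published records to subscribers."""

    def __init__(self):
        self.subscribers = {}   # topic -> list of callbacks
        self.visibility = {}    # topic -> "internal" or "external"

    def register(self, topic, visibility="internal"):
        self.visibility[topic] = visibility
        self.subscribers.setdefault(topic, [])

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, record):
        # Fan out each new record to every subscriber of the topic.
        for callback in self.subscribers.get(topic, []):
            callback(record)

catalog = DataCatalog()
catalog.register("customers.golden", visibility="external")

received = []
catalog.subscribe("customers.golden", received.append)
catalog.publish("customers.golden", {"id": 1, "email": "ada@example.com"})
print(received)
```

In practice the “callbacks” are HTTP API consumers and the catalog is a managed service, but the decoupling is the point: publishers of refined data never need to know who consumes it.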
Boomi Data Ops Services
The goal of DataOps is to enable agile and repeatable data readiness across the data fabric and emerging data pipelines. To do this, Boomi helps organizations deploy agile practices for data management to support increasingly diverse use cases and use case requirements. The goal is to enable organizations to address their data challenges through agile, collaborative, and change-management-friendly approaches to building and managing data.
Specifically, Boomi helps organizations kick-start their enterprise data management strategies by ensuring data readiness for data governance, master data management, application integration, data integration, metadata management, data cataloging, data warehousing, data analytics, business intelligence, and data operations. Part of achieving this at Boomi is a DataOps structure that includes data roles and responsibilities. These link upward to broader strategies and are supported by a data management strategy.
Boomi provides the key building blocks to deliver DataOps and establish data management capabilities; these link with other building blocks to align with business and technology strategies. Following are the building blocks and data architecture capabilities.
Manage Data at the Speed of Business
Historically, data pipelines have been put together entirely by customers. Each component was a piece of installable code with no sharable best practices. For this reason, it has taken organizations years to assemble their data platforms, and for many organizations those platforms lack agility and scalability. As organizations reconsider where they put their applications and data (on-premises, private cloud, public cloud, or hybrid cloud), they should consider whether they will continue to persist with their legacy middleware and business intelligence software. These cost a small fortune to build and maintain. It is time for a single, modern set of data capabilities.
About Boomi
Boomi is the pioneer of, and one of the leaders in, enterprise integration platform as a service (iPaaS), and one of the leading data integration, orchestration, and transformation solutions in the market. Trusted by more than 20,000 customers globally for its speed, ease of use, and lower total cost of ownership, the Boomi AtomSphere Platform is cloud-native, unified, scalable, open, secure, and intelligent. As the pioneer of cloud-based iPaaS and in fueling the intelligent use of data, Boomi simplifies and streamlines our customers’ ability to deliver integrated experiences underpinned by trusted data, connecting applications, processes, and devices for better human engagement and accelerated business outcomes.