Data: Essential for Enterprises

Imagine data as the lifeblood of your organization, a vast reservoir of untapped potential. But just like raw water, data alone is undrinkable, useless without refinement. Embark on a journey through the intricate waterworks of data management, where each stage transforms this raw resource into a refreshing stream of knowledge, insights, and ultimately, wisdom.


The Essentials

Data (raw water): is unprocessed, has no meaning on its own, and represents facts as text, numbers, graphics, images, sound, or video.

Information (clean water): is data in context – without context, data is meaningless.

Knowledge (bottled water): is information in perspective, integrated into a viewpoint from patterns (trends), information (assumptions) and experience (relationships).

Insights (sparkling water): are knowledge that has been analyzed, leading to new understanding.

Wisdom (distilled water): is the most refined form of knowledge and leads to deep understanding.


The Data Lifecycle

Creation: generation (rainwater collection) involves creating raw, unprocessed data from various sources (websites, sensor readings, transactional data from a business); capture (storing rainwater) involves collecting and storing the generated data.

Storage: warehousing (reservoir) involves organized storage of processed data for reporting and analysis; lakes (natural lake) involve raw, unorganized storage of data in its original format; archiving (ice formation) involves storing rarely used data for long-term preservation.

Processing: transformation (filtering and purifying water) involves converting data into a usable format, such as removing errors or converting units; enrichment (adding minerals to water) involves combining data with additional information for more insight; aggregation (merging different streams of water) involves combining data from different sources to create summaries; cleaning (removing debris) involves fixing errors and inconsistencies in the data; integration (connecting different bodies of water) involves combining data from different systems or sources into a unified view.
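To make the processing stage concrete, here is a minimal, hypothetical sketch using pandas (the column names, values, and sources are invented for illustration): it integrates two small datasets, cleans them, converts units, and aggregates a summary.

```python
import pandas as pd

# Hypothetical raw readings from two illustrative sources
sensor_a = pd.DataFrame({"site": ["north", "north", "south"],
                         "liters": [120.0, None, 95.5]})
sensor_b = pd.DataFrame({"site": ["south", "east"],
                         "liters": [101.2, 87.0]})

# Integration: combine data from different sources into a unified view
readings = pd.concat([sensor_a, sensor_b], ignore_index=True)

# Cleaning: fix inconsistencies, here by dropping rows with missing values
readings = readings.dropna(subset=["liters"])

# Transformation: convert units (liters to gallons)
readings["gallons"] = readings["liters"] * 0.264172

# Aggregation: summarize by site
summary = readings.groupby("site")["gallons"].agg(["count", "mean", "sum"])
print(summary)
```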

Analysis: descriptive (examining the properties of water) involves summarizing and describing the key features of the data; diagnostic (investigating why the water level dropped in a reservoir) involves identifying the root cause of a problem or issue in the data; predictive (forecasting the weather from current conditions) involves using data to predict future trends or outcomes; prescriptive (recommending how much water to release from a dam) involves suggesting the best course of action based on the analysis.
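As a rough illustration of the descriptive and predictive steps, the sketch below summarizes a small, invented series of reservoir levels and fits a simple linear trend; the trend line stands in for far more sophisticated forecasting methods.

```python
import numpy as np
import pandas as pd

# Hypothetical daily reservoir levels (illustrative values)
levels = pd.Series([100.0, 98.5, 97.2, 96.0, 94.9],
                   index=pd.date_range("2024-01-01", periods=5))

# Descriptive: summarize the key features of the data
print(levels.describe())

# Predictive: fit a simple linear trend and extrapolate one step ahead
days = np.arange(len(levels))
slope, intercept = np.polyfit(days, levels.values, 1)
next_day_estimate = slope * len(levels) + intercept
print(f"Estimated level for the next day: {next_day_estimate:.1f}")
```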

Visualization (creating maps and charts to represent water flow) involves presenting data in a visual format to make it easier to understand.

Reporting (summary of water quality and usage) involves generating reports to communicate findings and insights.

Sharing (transferring water between reservoirs) involves sharing data between different systems or organizations.

Collaboration (multiple users working together to monitor and manage a water system) involves allowing multiple users to access and work on the same data.

Governance (regulations and policies governing water usage) involves establishing and enforcing rules for managing and using data.

Security (building a dam to prevent flooding) involves implementing measures to protect data from unauthorized access or misuse.

Retention (deciding how long to keep water in a reservoir) involves determining how long data should be stored.

Disposal (draining or evaporating water no longer needed) involves safely destroying data that is no longer required.


The Data Tools

Data Analytics, Intelligent Decisions & Knowledge-Based Actions (water filter): cleans the data and makes it useful for making decisions.

Data Access and Delivery (water pipe): delivers the data to where it needs to go.

Data Quality (purity of water): is important to make sure that the data is accurate and reliable.

Data Integration (water treatment plant): cleans and combines data from different sources.

Data Security (water lock): protects the data from unauthorized access.

Data Modeling (water map): shows the relationships between different data points.

Transactional Data (running water): is constantly flowing and changing.

Analytical Data (still water): is collected and stored for analysis.

Metadata (water labels): provides information about the data.

Master Data (water source): is the foundation for all other data.

Reference Data (water quality classifications at the reservoir): provides the standard classifications and code sets used to categorize other data.

Data Characteristics (water properties): describe the data in terms of its size, type, and format.

Datastore and Database Technologies (water storage systems): store the data and make it accessible.

Data Migration (rerouting water): is the process of moving data from one datastore or database to another.

Data Lifecycle Management (water management): is the process of managing the data from its creation to its disposal.

Systems Engineering & Administration (water engineering & management): is the process of designing, building, and maintaining the systems that store and manage the data.


Canonical

Data Types (water, ice, steam): each data type (string, integer, date) has its own unique properties and can be used to power different types of machines.

Grammar (rules): governs how data is used; the rules for putting data together to form data structures.

Domain Values (fresh, salt, brackish): each domain value has its own unique properties and can be used for different purposes.

RegEx Patterns (types of filters): like the filters used to clean water, they can remove or extract specific characters from text.
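For example, a regular expression can act like a filter on text; the pattern and sample reading below are made up purely for illustration.

```python
import re

raw_reading = "Flow: 1,250 L/min (sensor #7)"

# Keep only the digits, discarding punctuation and labels
digits_only = re.sub(r"[^0-9]", "", raw_reading)
print(digits_only)  # -> 12507

# Or extract just the flow value before the unit
match = re.search(r"(\d[\d,]*)\s*L/min", raw_reading)
if match:
    print(match.group(1))  # -> 1,250
```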


Connector

Enterprise Resource Planning: ERP (types of water pipes): acts as a central hub, ensuring data integrity and consistency across different functions within an organization.

Hadoop?(large reservoir of water): breaks down large datasets into smaller chunks, distributes them across a cluster of computers, and then combines the results to generate valuable outputs.
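The split-distribute-combine idea behind Hadoop can be sketched in plain Python; this toy example counts words across chunks sequentially rather than on a real cluster, so it only illustrates the map and reduce phases, not Hadoop itself.

```python
from collections import Counter
from functools import reduce

documents = [
    "water flows into the reservoir",
    "the reservoir feeds the treatment plant",
    "the plant cleans the water",
]

# Map phase: each chunk is processed independently into partial counts
partial_counts = [Counter(doc.split()) for doc in documents]

# Reduce phase: partial results are merged into a single output
total_counts = reduce(lambda a, b: a + b, partial_counts, Counter())
print(total_counts.most_common(3))
```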


Protocols

Simple Object Access Protocol: SOAP (water bottle or water hose): is a messaging protocol that uses XML to define messages and their structure and is often used to communicate between different applications.

Representational State Transfer: REST (river): is an architectural style for designing web services and is based on the use of HTTP verbs to represent different actions, such as GET, POST, PUT, and DELETE.
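A REST interaction is just an HTTP request against a resource URL; the sketch below uses only the Python standard library, and the endpoint and fields are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with a real service before running
url = "https://api.example.com/reservoirs/42"

# GET retrieves the current representation of the resource
with urllib.request.urlopen(url) as response:
    reservoir = json.loads(response.read().decode("utf-8"))
    print(reservoir.get("level"))

# PUT (and likewise POST or DELETE) follows the same pattern with a different method
request = urllib.request.Request(
    url,
    data=json.dumps({"level": 87.5}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(request) as response:
    print(response.status)
```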

X12 (dam): is a standard for exchanging electronic data between businesses and is used in a variety of industries, such as healthcare, manufacturing, and retail.

Financial Information eXchange: FIX (water wheel): is a standard for exchanging financial data between trading partners and is used in a variety of financial markets, such as the stock market and the bond market.

Society for Worldwide Interbank Financial Telecommunication: SWIFT (canal): is like a secure network for transmitting financial messages between banks, with a standardized set of protocols and message formats.

Open Database Connectivity: ODBC (faucet): is like a common language for accessing databases, allowing applications to connect to and query various databases using a single API.
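In code, ODBC really does resemble turning on a faucet: open a connection through a driver, then query whatever database sits behind it with the same API. The sketch below assumes the third-party pyodbc package and an invented DSN, table, and credentials.

```python
import pyodbc  # assumes the pyodbc package is installed

# Hypothetical data source name configured in the ODBC driver manager
connection = pyodbc.connect("DSN=WaterDB;UID=reader;PWD=secret")
cursor = connection.cursor()

# The same API works regardless of which database the DSN points to
cursor.execute("SELECT site, level FROM reservoir_levels WHERE level < ?", 50)
for site, level in cursor.fetchall():
    print(site, level)

connection.close()
```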

Native Application Programming Interface: API (well): is like a specific dialect for communicating with a particular software, providing a direct and efficient way to access the functionality of the software.


Formatting

Extensible Markup Language: XML (river): is a structured format that flows in a specific order; each piece of data has a specific name and location, which makes it easy to find and understand the data.

JavaScript Object Notation: JSON (lake): is a less structured format that can be arranged in many ways; each piece of data has a name, but its location is not fixed, which makes it more flexible than XML but also harder to understand.

Comma Separated Value: CSV (stream): is a very simple format that stores data in rows (each a single record) and columns (each an attribute of that record); it is easy to read and understand, but not as flexible as other formats.

Fixed Length (bucket): is a format that stores data in fixed-length fields; each field is a specific size, and each piece of data must fit into its corresponding field, which makes it very efficient but also inflexible.

Unstructured Text (waterfall): is a format without any predefined structure; the data can be arranged in any way, and there is no built-in way to know what each piece represents, which makes it difficult to read and parse but also very flexible.

Binary (gasoline): is a format that is not human-readable that is a series of 1s and 0s representing data that makes it very efficient, but it is also very difficult to use.
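To see how the same record looks in several of the formats above, the short sketch below parses one hypothetical reading from JSON, CSV, and XML using only the Python standard library.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same reading expressed in three formats (illustrative values)
as_json = '{"site": "north", "liters": 120.5}'
as_csv = "site,liters\nnorth,120.5\n"
as_xml = "<reading><site>north</site><liters>120.5</liters></reading>"

record = json.loads(as_json)
print(record["liters"])  # 120.5

for row in csv.DictReader(io.StringIO(as_csv)):
    print(row["liters"])  # '120.5' (CSV values arrive as text)

root = ET.fromstring(as_xml)
print(root.find("liters").text)  # '120.5'
```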


Event Processing

Routing (river): is the process of directing events to different destinations; events can be routed based on their type, source, or other criteria.

Simple Event Processing: SEP (dam): is used to filter events, aggregate events, or perform simple operations on events.

Complex Event Processing: CEP (hydroelectric power plant): is used to identify patterns in events, correlate events, or perform complex operations on events.
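The difference between simple and complex event processing can be sketched in a few lines of Python: SEP filters individual events, while CEP looks for a pattern across a sequence. The events, sensors, and thresholds below are invented.

```python
# Hypothetical stream of sensor events
events = [
    {"sensor": "dam-1", "level": 71},
    {"sensor": "dam-1", "level": 83},
    {"sensor": "dam-2", "level": 40},
    {"sensor": "dam-1", "level": 86},
]

# Simple event processing: filter each event on its own
high_readings = [e for e in events if e["level"] > 80]
print(high_readings)

# Complex event processing: detect a pattern across events,
# here two consecutive high readings from the same sensor
previous_high = {}
for event in events:
    if event["level"] > 80:
        if previous_high.get(event["sensor"]):
            print(f"Alert: sustained high level on {event['sensor']}")
        previous_high[event["sensor"]] = True
    else:
        previous_high[event["sensor"]] = False
```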


Transformation

Data Mapper (water pipe): takes in raw data and transforms it into a format that can be used by other applications.

Data Libraries (reservoir): store data that can be used by data mappers and other applications.

Utilities (water treatment plants): clean and process data so that it can be used by data mappers and other applications.

Transformation Rules (water filters): specify how data should be transformed.
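A data mapper driven by transformation rules can be as simple as a dictionary of field mappings and conversion functions; the field names and values below are hypothetical.

```python
# Transformation rules: target field -> (source field, conversion function)
rules = {
    "site_name":  ("SITE", str.title),
    "volume_gal": ("VOL_LITERS", lambda v: round(float(v) * 0.264172, 2)),
}

def map_record(raw: dict, rules: dict) -> dict:
    """Apply each rule to the raw record to build the target record."""
    return {target: convert(raw[source])
            for target, (source, convert) in rules.items()}

raw = {"SITE": "north basin", "VOL_LITERS": "1000"}
print(map_record(raw, rules))
# {'site_name': 'North Basin', 'volume_gal': 264.17}
```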


Cross-Referencing

Source Keys (source of water like a river or well): identify the original data that is being used.

Master Keys (master valve): are used to control the flow of data from the source keys to the target keys.

Target Keys (reservoir or city): identify the data that is being used for analysis or reporting.

Cross-walks (pipelines): are used to connect the source keys and the target keys.

Lineage (water history): tracks the flow of data from the source keys to the target keys.
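A crosswalk is often implemented as a simple lookup from source keys to master keys to target keys; the identifiers below are invented for illustration.

```python
# Source system identifiers -> enterprise master keys
source_to_master = {"CRM-00017": "M-1001", "ERP-A-552": "M-1001", "ERP-A-553": "M-1002"}

# Master keys -> identifiers in the reporting (target) system
master_to_target = {"M-1001": "RPT-7", "M-1002": "RPT-9"}

def cross_walk(source_key: str) -> str:
    """Follow the lineage of a record from source key to target key."""
    master = source_to_master[source_key]
    target = master_to_target[master]
    print(f"lineage: {source_key} -> {master} -> {target}")
    return target

cross_walk("ERP-A-552")  # lineage: ERP-A-552 -> M-1001 -> RPT-7
```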


Flow

Data Flow Logic (pipes): the rules of the data flow logic determine how the data is processed and where it goes.

Machine Learning (a machine that learns how to filter water): data is used to train the machine learning model, and the model is then used to make predictions.

Pattern Detection (learning to identify patterns in the water): an algorithm is used to find patterns in the data, and the patterns are then used to make predictions.
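A minimal sketch of the train-then-predict idea, assuming the scikit-learn package; the features, labels, and threshold meaning are invented, and a real model would need far more data and validation.

```python
from sklearn.linear_model import LogisticRegression  # assumes scikit-learn is installed

# Hypothetical training data: [turbidity, pH] -> 1 means "needs filtering"
X = [[5.0, 7.1], [4.2, 6.9], [9.8, 6.2], [11.3, 5.9], [3.5, 7.3], [10.1, 6.0]]
y = [0, 0, 1, 1, 0, 1]

# The model learns the pattern separating clean from dirty readings
model = LogisticRegression()
model.fit(X, y)

# Prediction on a new, unseen reading
print(model.predict([[8.9, 6.1]]))  # likely [1]: flag for filtering
```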


Validation

XSD Schema Validation (water filter): specifies the rules that the XML document must follow, and the XSD schema validation tool checks to see if the XML document meets those rules.
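With the third-party lxml package, XSD validation takes only a few lines; the schema and documents below are simplified, made-up examples.

```python
from lxml import etree  # assumes the lxml package is installed

# A tiny, hypothetical schema: a reading must contain a decimal <liters> element
xsd = etree.XMLSchema(etree.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="reading">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="liters" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

valid_doc = etree.fromstring("<reading><liters>120.5</liters></reading>")
invalid_doc = etree.fromstring("<reading><liters>lots</liters></reading>")

print(xsd.validate(valid_doc))    # True
print(xsd.validate(invalid_doc))  # False, "lots" is not a decimal
```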

Schematron and other rules-based validation (fuel gauge): specify the requirements that the XML document must meet, and the rules-based validation tool checks to see if the XML document meets those requirements.


Profiling and Reconciliation

Column (water pipe): is the smallest unit of data in a database and can be used to store data of a specific type, such as numbers, text, or dates.

Table (reservoir): is a collection of columns that are related to each other.

Cross-Table (map): is a table that shows the relationship between two or more tables.

Cross-System Profiling (comparing water levels in reservoirs): is the process of comparing data from different systems.

Quality Metrics Reporting (monitoring water quality): is the process of tracking the quality of data over time.
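Basic profiling and quality metrics can be gathered with a few pandas calls; the table contents and the metrics chosen here are illustrative.

```python
import pandas as pd

# Hypothetical table to profile
readings = pd.DataFrame({
    "site":   ["north", "south", "south", None],
    "liters": [120.5, 95.0, 95.0, -4.0],
})

# Column-level profile: types, missing values, and distinct counts
print(readings.dtypes)
print(readings.isna().sum())
print(readings.nunique())

# Simple quality metrics for reporting
metrics = {
    "row_count": len(readings),
    "duplicate_rows": int(readings.duplicated().sum()),
    "negative_liters": int((readings["liters"] < 0).sum()),
}
print(metrics)
```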


Access Management

Identity & Entitlements Management (water distribution system): ensures that the right people have access to the right resources.

Secure Token (key to a lock): provides access to a resource by verifying the identity of the user.

Directory Services (map of the water system): provides a central repository of information about users, groups, and resources.


Protection

Format Preserving Encryption & Key Management (encrypting water): a type of encryption that preserves the format of the data being encrypted; encrypted data will still look like the original data, even though it is no longer readable; it is often used to encrypt sensitive data, such as credit card numbers and social security numbers.


Cleansing and Standardization

Data Quality Controls (remove impurities): data cleansing is the process of identifying and correcting inaccurate, incomplete, irrelevant, or duplicated data; data standardization is the process of converting data into a common format.


Monitoring and Management

Data Quality Monitoring, Exception Analysis & Handling (process of filtering water): data needs to be monitored to remove errors and inconsistencies; exception analysis can be compared to the process of identifying and investigating data leakage; exceptions need to be handled to prevent further data damage.


Access and Delivery

Data Warehousing (large lake): A central repository for structured, historical data used for reporting and analysis. It's like a massive library for data.

Data Lakes (natural lake): A vast storage space for raw, unprocessed data in various formats (structured, semi-structured, and unstructured). It's like a large lake that collects water (data) from different sources.

Data Marts (smaller pond): Subsets of a data warehouse focused on specific business functions or departments. Think of them as smaller, specialized libraries within the main one.

Extract, Transform, Load: ETL (filtering, purifying and storing water in a reservoir): The process of extracting data from various sources, transforming it into a consistent format, and loading it into a target system.
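A toy ETL job makes the three steps explicit; the file name, database path, and field names are placeholders for whatever the real source and target happen to be.

```python
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    """Extract: read raw rows from a source CSV file."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: normalize names and convert liters to a number."""
    return [(row["site"].strip().lower(), float(row["liters"])) for row in rows]

def load(rows: list[tuple], db_path: str) -> None:
    """Load: write the cleaned rows into the target database table."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS readings (site TEXT, liters REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

# Hypothetical file and database paths
load(transform(extract("raw_readings.csv")), "warehouse.db")
```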

Data Pipelines (series of pipes and filters that purify and process water): A series of steps that transform and move data from source to destination.

Data Virtualization (map): Creating a unified view of data from disparate sources without physically moving it. It's like having a virtual map that shows you where all the water sources (data) are located without needing to collect them in one place.

Data Streaming (river): The continuous flow of data in real-time, like a river constantly supplying fresh water. It enables real-time analysis and decision-making.

Cascading Style Sheets: CSS (water fountain design): determines the appearance and layout of web pages, controlling fonts, colors, spacing, and other visual elements.

Micro Front End: MFE (neighborhood water towers): independent frontend applications that work together to form a larger web application, each responsible for a specific section or feature.

Microservices (underground pipe network): small, independent services that work together to form a larger application, each responsible for a specific function.


Governance

Data Architecture Management (water supply system): ensures that the data is properly organized and accessible.

Data Development (water treatment plant): ensures that the data is clean and accurate.

Database Operations Management (water distribution system): ensures that the data is delivered to the right people at the right time.

Data Security Management (water security system): ensures that the data is protected from unauthorized access.

Master Data Management: MDM (water reservoir): ensures that there is a single source of truth for the data.

Data Warehousing/Big Data for Business Intelligence (water tower): stores the data for later analysis.

Enterprise Content Management: ECM and Document Management Systems: DMS (water sprinkler system): helps to manage the flow of data.

Metadata Management (water meter): tracks the usage of data.

Data Quality Management: DQM (water quality system): ensures that the data is of high quality.


Just as water is essential for life, data is the lifeblood of modern organizations. By understanding the intricate processes and tools involved in data management, we can unlock its full potential, turning a deluge of raw information into a wellspring of knowledge that empowers us to make informed decisions and thrive in the digital age.

