Data Pipeline
Nazir Ahammad Syed
Data Architect | AWS | Snowflake Cloud | Python | DevOps | Data warehouse | Automation Expert
A data pipeline refers to a series of processes that involve ingesting, moving, and transforming raw data from various sources to a designated destination. Typically, the data at this destination is utilized for purposes such as analysis, machine learning, or other business functions.
1. Data Pipeline Architecture
A data pipeline’s architecture consists of three main components:
1.1 Data Providers
1.2 Data Processing (ETL/ELT)
1.2.1 Orchestration Process
1.3 Target/Data Consumers
1.1 Data Providers
Common data sources include:
- On-premises source systems - application databases, APIs, application servers, files from an SFTP server
- Cloud - AWS, Azure, GCP
- SaaS - Salesforce, Workday data
- Streaming/Edge - machine data, connected-device data, logs
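As a rough illustration of ingesting from one of these sources, the sketch below pulls records from a hypothetical REST API and lands them as raw JSON files. The endpoint, token handling, and landing path are assumptions for illustration, not part of any specific source system.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

import requests

# Hypothetical source API and landing folder -- adjust for the real source system.
SOURCE_URL = "https://api.example.com/v1/orders"
LANDING_DIR = Path("landing/orders")


def ingest_raw_orders(api_token: str) -> Path:
    """Pull raw records from the source API and land them as a timestamped JSON file."""
    response = requests.get(
        SOURCE_URL,
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=30,
    )
    response.raise_for_status()

    LANDING_DIR.mkdir(parents=True, exist_ok=True)
    landed_file = LANDING_DIR / f"orders_{datetime.now(timezone.utc):%Y%m%dT%H%M%S}.json"
    landed_file.write_text(json.dumps(response.json()))
    return landed_file
```

Landing the raw payload unchanged keeps the ingestion step simple; cleaning and business rules belong in the processing step that follows.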
1.2 Data Processing (ETL/ELT)
Data processing refers to the transformations applied to data within the pipeline, often driven by Change Data Capture (CDC) to pick up only new or changed records. This typically involves cleaning, filtering, and applying business-specific logic to the data.
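As a minimal sketch of this processing step, the snippet below keeps only changed records (a simple CDC-style comparison against previously loaded keys) and applies basic cleaning and business rules. The column names and thresholds are illustrative assumptions.

```python
import pandas as pd


def transform_batch(incoming: pd.DataFrame, previously_loaded_keys: set) -> pd.DataFrame:
    """Keep only new/changed rows and apply simple cleaning and business rules."""
    # CDC-style filter: drop rows whose business key was already loaded (illustrative).
    changed = incoming[~incoming["order_id"].isin(previously_loaded_keys)].copy()

    # Cleaning: standardize text fields and drop rows missing required values.
    changed["customer_name"] = changed["customer_name"].str.strip().str.title()
    changed = changed.dropna(subset=["order_id", "order_date"])

    # Business-specific logic: flag high-value orders (threshold is an assumption).
    changed["is_high_value"] = changed["order_amount"] > 10_000
    return changed
```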
1.2.1 Orchestration Process
Different orchestration tools are used depending on the application and business needs.
For example:
- For simple pipeline scheduling, a cron schedule can be used.
- For advanced or complex pipelines, workflow-based orchestrators (Airflow, Control-M, …) are more appropriate.
Orchestration can be classified into two main types: batch processing (the most common) and real-time processing.
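For the workflow-based option, a minimal Airflow DAG for a daily batch might look like the sketch below. The DAG name, the daily schedule, and the ingest/transform/load callables are placeholders assumed for illustration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Placeholder callables -- in a real project these would import the actual pipeline steps.
def ingest(): ...
def transform(): ...
def load(): ...


with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # batch orchestration; a cron expression also works here
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the steps in order: ingest -> transform -> load.
    ingest_task >> transform_task >> load_task
```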
1.3 Target/Data Consumers
The target is where we deliver the processed data. The most common targets are databases or storage areas designed for analytics, such as a data warehouse or data lake.
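As one possible delivery sketch, the snippet below writes a processed batch to Parquet and uploads it to an S3-based data lake path; the bucket name and prefix are assumptions, and a warehouse such as Snowflake could then load the file with its own COPY mechanism.

```python
from pathlib import Path

import boto3
import pandas as pd

# Hypothetical data lake location -- replace with the real bucket and prefix.
LAKE_BUCKET = "example-analytics-lake"
LAKE_PREFIX = "curated/orders"


def load_to_data_lake(batch: pd.DataFrame, batch_id: str) -> str:
    """Write the processed batch as Parquet and upload it to the data lake."""
    local_file = Path(f"/tmp/orders_{batch_id}.parquet")
    batch.to_parquet(local_file, index=False)  # requires pyarrow or fastparquet

    key = f"{LAKE_PREFIX}/orders_{batch_id}.parquet"
    boto3.client("s3").upload_file(str(local_file), LAKE_BUCKET, key)
    return f"s3://{LAKE_BUCKET}/{key}"
```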