Data Pipeline

A data pipeline refers to a series of processes that involve ingesting, moving, and transforming raw data from various sources to a designated destination. Typically, the data at this destination is utilized for purposes such as analysis, machine learning, or other business functions.

1. Data Pipeline Architecture

A data pipeline's architecture consists of three main components:

1.1 Data Providers.

1.2 Data Processing (ETL/ELT).

1.2.1 Orchestration Process.

1.3 Target/Data Consumers.
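
To make the three components concrete, here is a minimal sketch of how they fit together. All names (`run_pipeline`, `extract_orders`, `to_numeric`, `load_to_warehouse`) and the sample records are hypothetical, and an in-memory list stands in for a real target.

```python
from typing import Callable, Iterable

Record = dict  # one row of data, e.g. {"id": 1, "amount": "10.5"}


def run_pipeline(
    extract: Callable[[], Iterable[Record]],
    transform: Callable[[Iterable[Record]], Iterable[Record]],
    load: Callable[[Iterable[Record]], int],
) -> int:
    """Pull records from a provider, process them, deliver them to a target."""
    return load(transform(extract()))


def extract_orders() -> Iterable[Record]:
    # Data provider: a hard-coded list instead of a real database or API.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4"}]


def to_numeric(records: Iterable[Record]) -> Iterable[Record]:
    # Data processing: cast the amount field from text to float.
    return [{**r, "amount": float(r["amount"])} for r in records]


warehouse: list = []  # Target: an in-memory list instead of a warehouse.


def load_to_warehouse(records: Iterable[Record]) -> int:
    # Deliver the processed rows and report how many were written.
    rows = list(records)
    warehouse.extend(rows)
    return len(rows)


loaded = run_pipeline(extract_orders, to_numeric, load_to_warehouse)
print(loaded, warehouse)
```

Passing each stage in as a function keeps the provider, the processing step, and the target swappable without touching the others.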

1.1 Data Providers

Common data sources include:

  • On-Premises source systems - application databases, APIs, application servers, files from an SFTP server
  • Cloud - AWS, Azure, GCP
  • SaaS - Salesforce, Workday data
  • Streaming/Edge - machine data, connected device data, logs

1.2 Data Processing (ETL/ELT)

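Data processing refers to the transformations that need to be applied to the data within the pipeline. This typically involves cleaning, filtering, and applying business-specific logic to the data. Incremental changes in the source systems are often picked up through Change Data Capture (CDC), so that only new or modified records are processed.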
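
As a rough illustration of cleaning, filtering, and business logic, the sketch below applies the three steps to a handful of made-up order records. The field names and the "large order" threshold of 1000 are assumptions chosen for the example, not rules from any particular system.

```python
def clean(record: dict) -> dict:
    # Cleaning: trim whitespace, normalise the country code, cast the amount.
    return {
        "customer": record["customer"].strip(),
        "country": record["country"].strip().upper(),
        "amount": float(record["amount"]),
    }


def is_valid(record: dict) -> bool:
    # Filtering: drop rows with no customer or a non-positive amount.
    return bool(record["customer"]) and record["amount"] > 0


def apply_business_rules(record: dict) -> dict:
    # Business logic (assumed rule): orders of 1000 or more are "large".
    tier = "large" if record["amount"] >= 1000 else "standard"
    return {**record, "tier": tier}


def transform(raw_records: list) -> list:
    cleaned = (clean(r) for r in raw_records)
    return [apply_business_rules(r) for r in cleaned if is_valid(r)]


raw = [
    {"customer": " Acme ", "country": "us", "amount": "1200"},
    {"customer": "", "country": "de", "amount": "50"},
    {"customer": "Globex", "country": "in ", "amount": "-5"},
    {"customer": "Initech", "country": "gb", "amount": "75.5"},
]
result = transform(raw)
for row in result:
    print(row)
```

Only the Acme and Initech rows survive: the second record has no customer and the third has a negative amount.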

1.2.1 Orchestration Process

Different tools are used depending on the application and business needs.

For example:

- For simple pipeline scheduling, a cron schedule can be used.

- For advanced or complex pipelines, a workflow-based orchestrator (Airflow, Control-M…) is more appropriate.

Orchestration can be classified into two main types: batch processing (the most common) and real-time processing.
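
The core job of a workflow orchestrator is to run tasks in dependency order. The sketch below shows that idea with Python's standard-library `graphlib` (Python 3.9+); it is not Airflow or Control-M code, and the task names are hypothetical. A real orchestrator adds scheduling, retries, and monitoring on top.

```python
from graphlib import TopologicalSorter

run_log: list = []  # records the order in which tasks actually ran


def extract() -> None:
    run_log.append("extract")


def transform() -> None:
    run_log.append("transform")


def load() -> None:
    run_log.append("load")


def notify() -> None:
    run_log.append("notify")


tasks = {"extract": extract, "transform": transform, "load": load, "notify": notify}

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_workflow() -> list:
    # Resolve a valid execution order, then run each task in turn.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


order = run_workflow()
print(order)
```

For a single script with no dependencies, a cron entry that invokes the script on a fixed schedule is usually enough; the dependency graph becomes worthwhile once tasks have to wait on one another.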

1.3 Target/Data Consumers

The target is where the data is delivered. The most common data targets are databases or data storage areas designed for analytics, such as a data warehouse or data lake.
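
As a small illustration of the delivery step, the sketch below loads a batch into an in-memory SQLite database, used here only as a stand-in for a real warehouse. The `orders` table and its columns are assumptions made for the example.

```python
import sqlite3


def load(conn: sqlite3.Connection, rows: list) -> int:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # Upsert-style load: re-running the same batch does not duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO orders (id, customer, amount) "
        "VALUES (:id, :customer, :amount)",
        rows,
    )
    conn.commit()
    # Return the total number of rows now in the target table.
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


conn = sqlite3.connect(":memory:")
batch = [
    {"id": 1, "customer": "Acme", "amount": 1200.0},
    {"id": 2, "customer": "Initech", "amount": 75.5},
]
first = load(conn, batch)
second = load(conn, batch)  # same batch again: row count stays the same
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(first, second, total)
```

Keying the load on a primary key makes it safe to rerun after a failure, which matters for batch pipelines that are retried by a scheduler.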
