Data Pipelines are the arteries that bring fresh, cleansed data to your AI/Machine Learning engine's heart. If you are a data-driven AI/Machine Learning practitioner, you are already familiar with one or more of the following open-source frameworks that help with Data Pipelines: LinkedIn's Azkaban, Spotify's Luigi, Pinterest's Pinball, or Airbnb's Airflow.
If you are beginning this journey, you should take a look at this excellent article by Robert Chang of Airbnb. Also, check out this talk by Maxime Beauchemin, where he discusses how to use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks.
So what is a Data Pipeline DAG? Visually, a node in the graph represents a Pipeline task, and an arrow represents the dependency of one Pipeline task on another. Because the data for a given task only needs to be computed once and the result then carries forward, the graph is directed and acyclic. This is why Airflow jobs are commonly referred to as "DAGs" - Directed Acyclic Graphs.
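The directed-acyclic property is what guarantees every task can run exactly once, in an order that respects its dependencies. A minimal sketch of that idea in plain Python, using the standard library's `graphlib` and a hypothetical four-task pipeline (the task names here are illustrative, not from any real workflow):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A toy pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean", "train"},
}

# Because the graph is directed and acyclic, a valid execution order exists.
# TopologicalSorter raises CycleError if a cycle sneaks into the graph.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

Every scheduler for a pipeline like this is, at its core, computing an ordering of exactly this kind before dispatching tasks.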
One of the cool things about Airbnb's open-sourced tool Airflow is its UI. It helps visualize and manage complex Data Pipelines, and it lets users write (Python) code as configuration to visualize a Pipeline's DAG. The author of a Data Pipeline must define the structure of dependencies among tasks in order to visualize them.
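For the curious, a code-as-configuration DAG definition might look roughly like the following sketch. The DAG id, schedule, and task commands are all hypothetical; the classic Airflow idiom of declaring operators and chaining them with `>>` is what defines the dependency structure the UI renders:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Hypothetical three-task pipeline; ids and commands are illustrative only.
dag = DAG(
    "example_pipeline",
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
)

extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
clean = BashOperator(task_id="clean", bash_command="echo clean", dag=dag)
load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

# The >> operator declares dependencies: extract runs before clean before load.
extract >> clean >> load
```

Airflow parses files like this one to build the DAG it then schedules and displays.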
As noted in this thoughtful article:
"Code as a workflow also allows you to reuse parts of DAG’s if you need to, reducing code duplication and making things simpler in the long run. This reduces the complexity of the overall system and frees up developer time to work on more important and impactful tasks"
Making Data Pipelines simpler is a key focus of the AWS managed service AWS Glue. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g., table definitions and schemas) in the AWS Glue Data Catalog.
Once cataloged, your data is immediately searchable, queryable, and available for further wrangling. AWS Glue generates the code to execute your data transformations and data loading processes.
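The "point Glue at your data" step can be done from the console, or programmatically via boto3. A rough sketch under assumed names - the crawler name, IAM role, database, and S3 path below are all hypothetical placeholders, and this requires AWS credentials to actually run:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # region is an assumption

# Point a crawler at an S3 prefix; Glue infers the schema and catalogs it.
glue.create_crawler(
    Name="sales-crawler",
    Role="AWSGlueServiceRole-demo",
    DatabaseName="sales_catalog",
    Targets={"S3Targets": [{"Path": "s3://my-bucket/sales/"}]},
)
glue.start_crawler(Name="sales-crawler")

# Once the crawler finishes, the discovered tables are queryable via the catalog.
for table in glue.get_tables(DatabaseName="sales_catalog")["TableList"]:
    print(table["Name"])
```

After the crawler populates the Data Catalog, services like Athena and Glue ETL jobs can query those tables directly.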
How do you deal with your Data Pipelines today? Do share your thoughts on how you see this evolving - drop me a note privately or via the comment section below.
About the Author:
Madhu cherishes the opportunity to learn and collaborate; he has three decades of experience nurturing beachhead market ideas worldwide. The views expressed here are Madhu's own and do not reflect those of his employer.