AI/Machine Learning And Data Pipelines
Image Credit: CC0 (Adapted / Pixabay)

Data Pipelines are the arteries that bring fresh, cleansed data to the heart of your AI/Machine Learning engine. If you are a data-driven AI/Machine Learning practitioner, you are probably already familiar with one or more of the following open-source frameworks that help with Data Pipelines: LinkedIn Azkaban, Spotify Luigi, Pinterest Pinball, or Airbnb Airflow.

If you are beginning this journey, you should take a look at this excellent article by Robert Chang of Airbnb. Also, check out this talk by Maxime Beauchemin, where he discusses how to use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks.

So what is a Data Pipeline DAG? Visually, a node in the graph represents a Pipeline task, and an arrow represents the dependency of one Pipeline task on another. Because each task computes its data once and the result then carries forward to the tasks that depend on it, the arrows point in one direction and never loop back: the graph is directed and acyclic. This is why Airflow jobs are commonly referred to as “DAGs” (Directed Acyclic Graphs).
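To make the idea concrete, here is a minimal sketch in plain Python (3.9+), using only the standard library's `graphlib` module rather than Airflow itself. The task names and the `execution_order` helper are hypothetical, chosen only for illustration. Each task maps to the set of tasks it depends on, and a topological sort yields an order in which every task runs after its dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "build_features": {"clean"},
    "train_model": {"build_features"},
    "report": {"clean"},
}


def execution_order(dag):
    """Return one valid run order for the tasks in `dag`.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is exactly the case a DAG rules out.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

If two tasks were made to depend on each other, `execution_order` would raise `graphlib.CycleError`, because no valid run order exists. That is the practical reason a pipeline has to be acyclic.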

One of the cool things about Airbnb’s open-sourced tool Airflow is its UI, which helps you visualize and manage complex Data Pipelines. Any user can use (Python) code as configuration to define a Pipeline's DAG, which the UI then renders. The author of a Data Pipeline must define the structure of dependencies among tasks in order to visualize them.

As noted in this thoughtful article:

"Code as a workflow also allows you to reuse parts of DAG’s if you need to, reducing code duplication and making things simpler in the long run. This reduces the complexity of the overall system and frees up developer time to work on more important and impactful tasks"
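A rough sketch of what that reuse can look like, again in plain Python rather than the Airflow API: the function names (`ingest_and_clean`, `build_pipeline`), the task names, and the data sources are all hypothetical. One function describes an extract-then-clean fragment, and the pipeline builder calls it once per data source instead of repeating the same dependency wiring by hand.

```python
def ingest_and_clean(source):
    """Reusable fragment: an extract task followed by a clean task.

    Returns a mapping of task name -> set of upstream task names.
    """
    extract = f"extract_{source}"
    clean = f"clean_{source}"
    return {extract: set(), clean: {extract}}


def build_pipeline(sources):
    """Assemble a full DAG by reusing the same fragment for every source."""
    dag = {}
    cleaned = set()
    for source in sources:
        dag.update(ingest_and_clean(source))
        cleaned.add(f"clean_{source}")
    # A final task that depends on every cleaned source.
    dag["join_and_train"] = cleaned
    return dag


dag = build_pipeline(["clicks", "bookings"])
for task, upstream in dag.items():
    print(task, "<-", sorted(upstream))
```

Adding a third data source becomes a one-word change to the list passed to `build_pipeline`, which is the kind of reduced duplication the quote describes.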

Making Data Pipelines simpler is a key focus of the AWS managed service AWS Glue. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog.

Once cataloged, your data is immediately searchable, queryable, and available for further wrangling activity. AWS Glue generates the code to execute your data transformations and data loading processes.

How do you deal with your Data Pipelines today? Do share your thoughts on how you see this evolving - drop me a note privately or via the comment section below.

About the Author:

Madhu cherishes the opportunity to learn and collaborate; he has three decades of experience in nurturing the emergence of beachhead market ideations worldwide. Note that what is expressed by Madhu here is of his own interest and is in no way reflective of his employer.
