Building a Machine Learning Pipeline – Exploration and Data Processing
Ankush Seth
CTO @ Mi Analyst | Helping businesses accelerate growth and efficiency with Gen AI
In this three-part blog series, we are going to explore how to build a machine learning pipeline (defined below). Each part will cover one key area of the pipeline-building process. I am a big believer in hands-on learning, so throughout this journey the entire end-to-end process will be illustrated by building a functional machine learning pipeline. Some experience using machine learning models or frameworks like PyTorch, as well as experience building web services, will come in handy during the practical portion of this series.
So let's get started…
What is a machine learning pipeline?
Organizations trying to leverage AI or machine learning capabilities generally find themselves in a conundrum of where to begin. They have all this structured and unstructured data and are not sure how to connect and utilize it. Sometimes this data is available in a data lake, and other times it is spread over multiple data sources. In addition, knowing how the different elements can be stitched together to reach the end goal of an intelligent insight can look like a daunting task. A machine learning pipeline is a step-wise approach to making sure one is able to derive value from the data that is already available. After all, just having the data and/or the machine learning model is no good if there is no efficient way of connecting all the elements and delivering the value that AI promises. In general, a pipeline provides a way to derive insights from the available data in either a batch or real-time manner.
The entire machine learning workflow/pipeline building process can be broken down into three major phases -
1) Exploration and Data processing
2) Modeling (Will be covered in Part 2)
3) Deployment (Will be covered in Part 3)
Exploration and Data processing
This is the initial preparation phase where we get the data ready for use. This phase can further be broken down into –
1) Gathering the Data – Organizing the data as files on the file system or in a database of some sort.
2) Exploration and Sanitization – This involves exploring and visualizing the data to map out the most interesting features within the data set. As part of sanitization, we want to remove any outliers or errors that may skew the results or introduce bias into the model.
3) Transformation – This step involves transforming the data (normalization, translation, encoding, etc.) so that it can be used to train the model. A short sketch covering all three steps follows this list.
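To make these steps concrete, here is a minimal sketch in Python using pandas (a library commonly used alongside PyTorch for this kind of preparation). The file name customers.csv and the columns age, income, and country are hypothetical stand-ins for your own data, and the 3-standard-deviation outlier rule is just one simple choice among many:

```python
import pandas as pd

# 1) Gathering the data: load a (hypothetical) CSV file into a DataFrame
df = pd.read_csv("customers.csv")

# 2) Exploration and sanitization: summarize the data, drop missing rows,
#    and remove outliers (here, 'age' values more than 3 std devs from the mean)
print(df.describe())
df = df.dropna()
age_mean, age_std = df["age"].mean(), df["age"].std()
df = df[(df["age"] - age_mean).abs() <= 3 * age_std]

# 3) Transformation: normalize numeric columns to zero mean / unit variance
#    and one-hot encode the categorical 'country' column
for col in ["age", "income"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std()
df = pd.get_dummies(df, columns=["country"])

print(df.head())  # the data is now ready to be fed into a model
```

In the practical portion of the series, a cleaned and transformed DataFrame like this is what we would convert into tensors for model training.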
In summary, a successful machine learning journey starts with the right data and with making sure it is ready for consumption.
In the next article we will cover the modeling phase of the machine learning workflow. After we have covered all the phases, we will walk through a practical example of building the entire workflow using industry-standard tools like Jupyter Notebooks, PyTorch, Anaconda, AWS API Gateway, etc. I will also publish a separate article on how to get your machine learning environment set up on your laptop. Happy learning!