Few Machine Learning Topics
Ganesha Swaroop B
|17+ yrs exp Software Testing | Author | Mentor | Staff SDET | Technical Writer | Technology Researcher | Java | Pytest | Python | Allure | ExtentReports | BDD | Jenkins | SME | Self-Taught Data Science and ML Engineer
Hello Everyone,
For a while now I have been trying to understand why ML engineers need to create an ML pipeline and what it is all about. To clear up that confusion, here is an explanation, from my own understanding, of what an ML pipeline is, why it is necessary in addition to a Data Pipeline, and who specifically uses it within the Data Science team.
What is an ML Pipeline?
An ML Pipeline is, at its core, the process of:
- gathering raw business data and filtering it into usable datasets,
- selecting a suitable ML method (for example through cross-validation),
- building, training, and testing a model,
- fine-tuning the model's parameters, and
- deploying the finished model into production.
Before doing the above tasks individually, one has to understand that Data Engineers will already have designed and automated the Data Pipeline, which delivers and stores structured data in data warehouses within the big-data infrastructure. For example, some companies use Snowflake as their data warehouse, while others query their business data through Athena. Whichever they use, the important point is that the same data from these systems is shared with Data Scientists, Data Analysts, and ML Engineers, but the problem definition for each team will be different.
So ML Engineers start off by gathering the raw data made available to them through this server access and, using their programming skills, filter it into structured data frames with libraries such as pandas. This gives them a clean dataset that can be used to resolve a particular business issue with the product, or to implement a certain intelligence feature in it.
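The filtering step above can be sketched with pandas. The column names and values here are purely illustrative, not from any real warehouse export:

```python
import pandas as pd

# Hypothetical raw export pulled from the warehouse; the columns are
# invented for illustration only.
raw = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "region":   ["US", "EU", "US", None, "EU"],
    "revenue":  [120.0, 85.5, None, 40.0, 310.0],
})

# Typical first-pass cleaning: drop incomplete rows, keep only the
# columns relevant to the business question, and enforce dtypes.
clean = (
    raw.dropna(subset=["region", "revenue"])
       .loc[:, ["region", "revenue"]]
       .astype({"revenue": "float64"})
)

# A quick aggregate a Data Scientist or ML Engineer might start from.
total_by_region = clean.groupby("region")["revenue"].sum()
```

Real extraction would of course read from the warehouse (e.g. via a SQL connector) rather than an in-memory DataFrame, but the cleaning pattern is the same.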
ML models, or Machine Learning models, can be classified into 3 types of learning:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
The reason you need these learning types is so that a model can learn from a given set of business data and then, once trained and tested, predict results on a new set of business data.
After this is done, ML engineers conduct what is called cross-validation, which helps them understand which ML method suits the data best out of the available set. This ensures that the ML engineer does not have to ask someone else which method to use for the given raw data. When the ML engineer feeds the extracted data frame into candidate methods such as Linear Regression, Polynomial Regression, Lasso and Ridge Regression, Logistic Regression, Support Vector Machines, Singular Value Decomposition, Random Forest, Decision Trees, LDA, t-SNE, PCA, and others, the resulting analysis shows which model suits the extracted data best.
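A minimal sketch of this model-selection step, using scikit-learn's cross_val_score and the built-in iris dataset as a stand-in for real business data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A few of the candidate methods mentioned above.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
}

# 5-fold cross-validation gives a comparable accuracy estimate per method.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
```

The method with the highest mean cross-validation score is the one worth building out; on real data you would also compare training time and interpretability, not just accuracy.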
Once this is completed and the ML method is identified, the ML engineer builds the model into a program that implements the model parameters and performs the analysis that solves the business problem. After designing the model, it is time to train it on a training set of data and then test its prediction capability. Whether to use supervised learning, unsupervised learning, ensemble learning, or reinforcement learning depends on the model designed. For supervised learning we need two sets of data, one for training the model and another for testing it; both are usually carved out of the same collected dataset, but the test set must be held out and kept separate from the training data so the evaluation is honest. In reinforcement learning, the situations the model is evaluated on can be entirely different from those it was trained on.
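The train/test split for the supervised case can be sketched like this, again using iris as placeholder data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out 25% of the data for testing; the two sets are disjoint,
# and stratify keeps the class balance the same in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
```

Because the model never sees the test rows during training, test_accuracy is a fair estimate of how the model will behave on new business data.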
ML models may use neural networks, but they do not need to use them all the time; there are plenty of ML models that work without neural networks.
Once the above steps are completed, it is necessary to fine-tune your model by adjusting its parameters accordingly, and then deploy the successful model into the production environment, where it can start working for the product or the business.
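One common way to do this tuning systematically is a grid search over candidate parameter values. A small sketch with scikit-learn's GridSearchCV; the grid values here are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Try every combination in the grid, scoring each with 3-fold
# cross-validation, and keep the best-performing settings.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [2, None]},
    cv=3,
)
grid.fit(X, y)
tuned_model = grid.best_estimator_
```

The tuned_model that comes out of this search is the candidate you would then promote to production.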
ML engineers use libraries like pandas, Keras, PyTorch, or TensorFlow to design ML models, so one must learn these libraries and how to write programs using them. These models are usually written in a cloud environment, with Google Colab being the most common choice.
Can we Automate ML Pipelines?
Yes, we can automate ML pipelines using certain pipeline and configuration libraries in Python. This becomes necessary because building an ML model takes time and, more importantly, classifying the data and extracting it from the database takes even longer, so it makes sense to automate the ML pipeline once it is fully developed and deployed in production.
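One such library is scikit-learn's own Pipeline, which chains preprocessing and modeling into a single object, so every run repeats the same steps in the same order. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One fit()/predict() call now re-runs scaling and modeling in order --
# the "automated" version of the manual steps described above.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)
```

In production, the same idea scales up through orchestration tools, but the principle is identical: encode the steps once, then re-run them unattended.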
How is ML Pipeline different from Data Pipeline?
An ML pipeline concentrates only on fitting the best ML model to the raw business data, while a Data Pipeline concentrates on capturing the business data, extracting only the specific data needed for analysis, and loading it into a data warehouse, making it possible to design AI models that help improve the business based on data.
What is Supervised and Unsupervised Learning?
Supervised learning is an ML technique in which a designed model is trained on data for which the ML engineer knows both the inputs and the expected outputs (labels), so the model learns its response to the data being fed in.
Unsupervised learning, by contrast, is an ML technique in which the ML engineer knows the input data but has no labelled outputs; the model must discover the structure of the data on its own.
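The contrast can be shown side by side, using iris as placeholder data: a supervised classifier is given the labels y, while an unsupervised clusterer sees only X and invents its own groupings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: labels y are provided; the model learns the input->output map.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: only X is provided; KMeans groups similar rows itself.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Note that the cluster numbers KMeans assigns are arbitrary; unlike the supervised labels, they carry no predefined meaning until a human interprets them.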
What are the Classification ML Methods?
One of the important tasks of an ML engineer is to gather the right data to resolve the specified problem through the design and implementation of a suitable AI/ML model, which makes the classification of data all the more important. There are several data analysis and classification methods in ML, such as SVM, Random Forest, Decision Trees, Naive Bayes, and others, to assist an ML engineer in this regard. The same techniques can also be used by a Data Scientist to analyze the data from a different perspective.
What is a Data Pipeline?
A Data Pipeline is the upstream work, where a Data Engineer who is good at programming writes a program that automates capturing the business data from various sources across a platform, and then ensures that the flow of data to the Data Scientists and Data Analysts is uninterrupted so as to support their business analysis. This can happen on cloud infrastructure or on local premises.
A Data Pipeline can use AWS, Azure, or GCP services along with Python programs to automate this process. The Data Engineer's build work stops once the pipeline is ready and working reliably; after that, he or she makes sure to monitor the data flow at all times.
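The shape of such a pipeline job is extract, transform, load. A tiny sketch in pandas; the in-memory CSV stands in for a real source such as an API, a database, or object storage, and the column names are invented:

```python
import io
import pandas as pd

# Hypothetical extract step: a real pipeline would pull from an API,
# a database, or object storage instead of this in-memory CSV string.
raw_csv = io.StringIO(
    "ts,source,amount\n"
    "2024-01-01,web,10\n"
    "2024-01-01,app,5\n"
)

df = pd.read_csv(raw_csv, parse_dates=["ts"])               # extract
daily = df.groupby("ts", as_index=False)["amount"].sum()    # transform
daily_csv = daily.to_csv(index=False)                       # load: serialized for the warehouse
```

A production version would run this on a schedule (for example with a cloud scheduler or an orchestrator) and write to the warehouse instead of a CSV string, but the three stages stay the same.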
I hope this gives a fair idea of how the ML engineer's role differs from those of the Data Scientist, the Data Analyst, and the Data Engineer.
Thanks,
Swaroop