Few Machine Learning Topics
Ganesha Swaroop B
|17+ yrs exp Software Testing | Author | Mentor | Staff SDET | Technical Writer | Technology Researcher | Java | Pytest | Python | Allure | ExtentReports | BDD | Jenkins | SME | Self-Taught Data Science and ML Engineer
Hello Everyone,
For a while now I have been trying to understand why ML engineers need to create an ML pipeline and what it is all about. To clear up that confusion, here is an explanation, from my own understanding, of what an ML pipeline is, why it is necessary in addition to a Data Pipeline, and who specifically uses it within the Data Science team.
What is an ML Pipeline?
An ML Pipeline is, at its core, the process of:
- gathering raw business data and filtering it into usable datasets,
- selecting a suitable ML method (for example through cross-validation),
- building, training, and testing a model,
- fine-tuning the model's parameters, and
- deploying the finished model into production.
Before doing the above tasks individually, one has to understand that Data Engineers will already have designed and automated the Data Pipeline, which delivers and stores structured data in data warehouses within the big-data infrastructure. For example, some companies use Snowflake as their data warehouse, while others query their business data through Athena. Whichever they use, the important point is that the same data from these systems is shared with Data Scientists, Data Analysts, and ML Engineers, but the problem definition for each team will be different.
So ML Engineers start off by gathering the raw data made available to them through this server access and, using their programming skills, filter it into structured data frames with libraries such as pandas. This gives them a clean dataset that can be used to resolve a particular business issue with the product, or to implement a certain intelligence feature in it.
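The filtering step above can be sketched with pandas. The column names and values here are purely illustrative, not from any real warehouse export:

```python
import pandas as pd

# Hypothetical raw export pulled from the warehouse; the columns are
# invented for illustration only.
raw = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "region":   ["US", "EU", "US", None, "EU"],
    "revenue":  [120.0, 85.5, None, 40.0, 310.0],
})

# Typical first-pass cleaning: drop incomplete rows, keep only the
# columns relevant to the business question, and enforce dtypes.
clean = (
    raw.dropna(subset=["region", "revenue"])
       .loc[:, ["region", "revenue"]]
       .astype({"revenue": "float64"})
)

# A quick aggregate a Data Scientist or ML Engineer might start from.
total_by_region = clean.groupby("region")["revenue"].sum()
```

Real extraction would of course read from the warehouse (e.g. via a SQL connector) rather than an in-memory DataFrame, but the cleaning pattern is the same.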
ML models, or Machine Learning models, can be classified into 3 types of learning:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
The reason you need these learning types is so that a model can learn from a given set of business data and then, once trained and tested, predict results on a new set of business data.
After this is done, ML engineers conduct what is called cross-validation, which helps them understand which ML method suits the data best out of the available set. This ensures that the ML engineer does not have to ask someone else which method to use for the given raw data. When the ML engineer feeds the extracted data frame into candidate methods such as Linear Regression, Polynomial Regression, Lasso and Ridge Regression, Logistic Regression, Support Vector Machines, Singular Value Decomposition, Random Forest, Decision Trees, LDA, t-SNE, PCA, and others, the resulting analysis shows which model suits the extracted data best.
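A minimal sketch of this model-selection step, using scikit-learn's cross_val_score and the built-in iris dataset as a stand-in for real business data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A few of the candidate methods mentioned above.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
}

# 5-fold cross-validation gives a comparable accuracy estimate per method.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
```

The method with the highest mean cross-validation score is the one worth building out; on real data you would also compare training time and interpretability, not just accuracy.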
Once this is completed and the ML method is identified, the ML engineer builds the model into a program that implements the model parameters and performs the analysis that solves the business problem. After designing the model, it is time to train it on a training set of data and then test its prediction capability. Whether to use supervised learning, unsupervised learning, ensemble learning, or reinforcement learning depends on the model designed. For supervised learning we need two sets of data, one for training the model and another for testing it; both are usually carved out of the same collected dataset, but the test set must be held out and kept separate from the training data so the evaluation is honest. In reinforcement learning, the situations the model is evaluated on can be entirely different from those it was trained on.
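The train/test split for the supervised case can be sketched like this, again using iris as placeholder data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out 25% of the data for testing; the two sets are disjoint,
# and stratify keeps the class balance the same in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
```

Because the model never sees the test rows during training, test_accuracy is a fair estimate of how the model will behave on new business data.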
ML models may use neural networks, but they do not need to use them all the time; there are plenty of ML models that work without neural networks.
Once the above steps are completed, it is necessary to fine-tune your model by adjusting its parameters accordingly, and then deploy the successful model into the production environment, where it can start working for the product or the business.
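One common way to do this tuning systematically is a grid search over candidate parameter values. A small sketch with scikit-learn's GridSearchCV; the grid values here are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Try every combination in the grid, scoring each with 3-fold
# cross-validation, and keep the best-performing settings.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [2, None]},
    cv=3,
)
grid.fit(X, y)
tuned_model = grid.best_estimator_
```

The tuned_model that comes out of this search is the candidate you would then promote to production.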
ML engineers use libraries like pandas, Keras, PyTorch, or TensorFlow to design ML models, so one must learn these libraries and how to write programs using them. These models are usually written in a cloud environment, with Google Colab being the most common choice.
Can we Automate ML Pipelines?
Yes, we can automate ML pipelines using certain pipeline and configuration libraries in Python. This becomes necessary because building an ML model takes time and, more importantly, classifying the data and extracting it from the database takes even longer, so it makes sense to automate the ML pipeline once it is fully developed and deployed in production.
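One such library is scikit-learn's own Pipeline, which chains preprocessing and modeling into a single object, so every run repeats the same steps in the same order. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One fit()/predict() call now re-runs scaling and modeling in order --
# the "automated" version of the manual steps described above.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)
```

In production, the same idea scales up through orchestration tools, but the principle is identical: encode the steps once, then re-run them unattended.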
How is ML Pipeline different from Data Pipeline?
An ML pipeline concentrates only on fitting the best ML model to the raw business data, while a Data Pipeline concentrates on capturing the business data, extracting only the specific data needed for analysis, and loading it into a data warehouse, making it possible to design AI models that help improve the business based on data.
What is Supervised and Unsupervised Learning?
Supervised learning is an ML technique in which a designed model is trained on data for which the ML engineer knows both the inputs and the expected outputs (labels), so the model learns its response to the data being fed in.
Unsupervised learning, by contrast, is an ML technique in which the ML engineer knows the input data but has no labelled outputs; the model must discover the structure of the data on its own.
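The contrast can be shown side by side, using iris as placeholder data: a supervised classifier is given the labels y, while an unsupervised clusterer sees only X and invents its own groupings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: labels y are provided; the model learns the input->output map.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: only X is provided; KMeans groups similar rows itself.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Note that the cluster numbers KMeans assigns are arbitrary; unlike the supervised labels, they carry no predefined meaning until a human interprets them.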
What are the Classification ML Methods?
One of the important tasks of an ML engineer is to gather the right data to resolve the specified problem through the design and implementation of a suitable AI/ML model, which makes the classification of data all the more important. There are several data analysis and classification methods in ML, such as SVM, Random Forest, Decision Trees, Naive Bayes, and others, to assist an ML engineer in this regard. The same techniques can also be used by a Data Scientist to analyze the data from a different perspective.
What is a Data Pipeline?
A Data Pipeline is the upstream work, where a Data Engineer who is good at programming writes a program that automates capturing the business data from various sources across a platform, and then ensures that the flow of data to the Data Scientists and Data Analysts is uninterrupted so as to support their business analysis. This can happen on cloud infrastructure or on local premises.
A Data Pipeline can use AWS, Azure, or GCP services along with Python programs to automate this process. The Data Engineer's build work stops once the pipeline is ready and working reliably; after that, he or she makes sure to monitor the data flow at all times.
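The shape of such a pipeline job is extract, transform, load. A tiny sketch in pandas; the in-memory CSV stands in for a real source such as an API, a database, or object storage, and the column names are invented:

```python
import io
import pandas as pd

# Hypothetical extract step: a real pipeline would pull from an API,
# a database, or object storage instead of this in-memory CSV string.
raw_csv = io.StringIO(
    "ts,source,amount\n"
    "2024-01-01,web,10\n"
    "2024-01-01,app,5\n"
)

df = pd.read_csv(raw_csv, parse_dates=["ts"])               # extract
daily = df.groupby("ts", as_index=False)["amount"].sum()    # transform
daily_csv = daily.to_csv(index=False)                       # load: serialized for the warehouse
```

A production version would run this on a schedule (for example with a cloud scheduler or an orchestrator) and write to the warehouse instead of a CSV string, but the three stages stay the same.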
I hope this gives a fair idea of how the ML engineer's role differs from those of the Data Scientist, the Data Analyst, and the Data Engineer.
Thanks,
Swaroop