MLOps Training Logs: Day 4
Apoorv Mishra
Game Dev Generalist | Programmer | Technical Artist | Unreal Engine | C++ | Unity | C# | AR/VR/XR | Android | Windows | Cloud Computing
Machine Learning:
So far we know that a "model" is the brain of the program, which it uses to make judgments or predictions based on the data it has been fed. This is similar to how we humans learn from our experiences. The program gains experience from the data it is fed, and the process of gaining that experience is called "training".
The model is a mathematical function. There are many kinds of machine learning models, but the most basic one, where most beginners start, is linear regression.
Until now, the formula our model used was y = cx, where y is the dependent variable, x is the predictor and c is the weight, also known as the coefficient. This kind of model is fine for ideal data, for example a data set in which every y is exactly the same multiple of its x.
In such an ideal data set, each x is related to its y by the same coefficient. Real datasets are more complex, and such a formula fits them badly. Also, the line for y = cx always passes through the origin, which means that for x = 0, y must always be 0. This is not true in many cases. To handle this problem, we were introduced to another term, biases.
Biases are constants that tell us what the value of y would be if x were 0. The bias is added to the product of weight and x to give a more accurate result, so the formula becomes y = cx + b, where b is the bias. Each x, or feature, has its own weight, which expresses how strongly it influences the result. I say "each x" because each y, or dependent variable, may depend on many x's. For example, a student's results depend on many more factors than just the time spent studying.
The machine uses a trial-and-error method to calculate the weights and biases, which we'll learn about in later classes.
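As a rough illustration of why the bias matters, the sketch below fits both y = cx and y = cx + b to a small made-up data set using the closed-form least-squares formulas. The data values and the function names are assumptions for illustration, not the dataset or code used in class.

```python
# Hypothetical data that follows y = 2x + 3, so it does NOT pass through the origin.
xs = [0, 1, 2, 3, 4]
ys = [3, 5, 7, 9, 11]


def fit_without_bias(xs, ys):
    """Least-squares fit of y = c*x (line forced through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_bias(xs, ys):
    """Least-squares fit of y = c*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    c = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - c * mean_x
    return c, b


c_only = fit_without_bias(xs, ys)
c, b = fit_with_bias(xs, ys)

print("y = cx     : c =", c_only)
print("y = cx + b : c =", c, ", b =", b)
```

With the bias, the fit recovers the weight 2 and the bias 3 that generated the data. Without it, the model still predicts 0 at x = 0, although the data says 3.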
Salary Estimator App using ML and Python
After getting to know some basics of ML and Python, it's time to build a very simple user application with what we have learned, because we are "MLOps". This app will be called "Salary Estimator" and will help the HR department of a company judge what salary can be offered based on years of experience. Of course, this is not practical, but it is a fun learning exercise.
To start with, we used a dataset that contained two columns, one for "years of experience" and the other for "salary". The dataset was less ideal than the one described above. We used the LinearRegression model from the sklearn.linear_model module. We trained the model on the dataset and exported the trained model to a new file using the dump function of the joblib module, imported from sklearn.externals. (Recent scikit-learn releases no longer include sklearn.externals.joblib; there, joblib is imported directly as its own package.) We export the trained model so that we do not need to train it every time we run the program and can get predictions quickly.
Next, we create a new Python program, where we load the trained model using the load function of the same joblib module that was used to dump it. Then we take input from the user, use the model.predict() function to predict the expected salary, and print the prediction as output.
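The two steps above (train and export, then load and predict) can be sketched roughly as follows. The salary figures, the file name salary_model.pkl and the fixed input value are assumptions for illustration. The class dataset is not reproduced here, and the sketch imports joblib as a standalone package.

```python
import os
import tempfile

import joblib  # standalone package; older tutorials import it from sklearn.externals
from sklearn.linear_model import LinearRegression

# Hypothetical training data: years of experience -> salary.
years = [[1], [2], [3], [4], [5], [6]]  # 2-D: one row per sample, one column per feature
salaries = [32000, 36500, 41000, 44500, 51000, 54000]

# Hypothetical file name; the temp directory keeps the sketch from cluttering the project.
model_path = os.path.join(tempfile.gettempdir(), "salary_model.pkl")

# --- Program 1: train once and export the trained model ---
model = LinearRegression()
model.fit(years, salaries)
joblib.dump(model, model_path)

# --- Program 2: load the trained model and predict ---
loaded_model = joblib.load(model_path)
experience = 7.0  # in the real app: float(input("Years of experience: "))
predicted_salary = loaded_model.predict([[experience]])[0]
print(f"Estimated salary for {experience} years: {predicted_salary:.2f}")
```

Note that predict() expects a 2-D input (a list of rows), which is why the single value is wrapped as [[experience]].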
Image Processing and Computer Vision:
Just as human eyes feed image data into our brain, the computer's eyes, i.e. the webcam, can feed images to the program's brain, i.e. the model, so that it can be trained on image data. Computer vision has other uses as well.
To begin with, we can use the cv2.VideoCapture() function from the cv2 library (OpenCV). This function can be used to read input from a video file, an image sequence, or the webcam. To read from a file, pass the filename as an argument. To select a camera, pass an int (0, 1, 2, etc.); if only one camera is connected, or for a laptop's built-in camera, pass 0. This function returns a <VideoCapture object> from which frames can be read.
Once we have an object that is capturing video from the webcam, it can be used for anything. Today we used the read() function of the <VideoCapture object> to get a snapshot from the webcam; read() returns a success flag along with the image. Then we performed basic operations such as cropping, changing parts of the picture, and moving one part of the picture onto another, which were already covered in the previous class.
We also used the resize() function of the cv2 library for resizing a part of the image.
Python:
Variables in Python are references. In order to save memory and increase efficiency, Python gives every data item an id, and a variable simply points to a data item. This means that when we assign one variable to another, both point to the same location instead of holding separate copies. For example, after a = [1, 2, 3] and b = a, appending to b also changes a, because a and b are the same list.
This means that a list/array has to be explicitly copied using a copy function if we don't want the changes made to the copied list to be visible in the original list as well. There are many ways to copy explicitly in Python.
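The difference between sharing a reference and making an explicit copy can be sketched like this (the variable names are only for illustration):

```python
a = [1, 2, 3]
b = a  # no copy: b is a second name for the same list
b.append(4)

print(a)               # [1, 2, 3, 4]  (the change made through b is visible in a)
print(id(a) == id(b))  # True

c = a.copy()  # explicit shallow copy; list(a) and a[:] work too
c.append(5)

print(a)  # [1, 2, 3, 4]
print(c)  # [1, 2, 3, 4, 5]
```

These are shallow copies: if the list contains other lists, the inner lists are still shared. For nested data, copy.deepcopy() from the standard library makes a fully independent copy.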
RedHat8 + Python3 Training(1):
The most important point of this training, made clear from the beginning, was that the technologies we know are not as important as our core concepts. We do not want to be mere users of technology; we want to be developers of technology. For that, we need to know how the technologies and tools we use work in the background, so that we can develop new solutions ourselves.
It is also important to have good programming knowledge and a good knowledge of operating systems because, after all, we need operating systems to run our programs.
Operating systems generally offer two ways to interact: GUI (Graphical User Interface) and CLI (Command Line Interface).
Behind the scenes, every GUI application is a program. That program can be run from the command-line interface. So, each program is also a command.
The which command can be used to find the program file behind a command.
Hackers are people who understand in depth how a program works.
If we can browse the data of the RAM, we can access some very sensitive information.
Cookies can contain sensitive data, such as session information, and must be protected.
The cd command is used to go to a directory.
Piping ( | ) can be used to pass the output of one command as input to another command. Example: date | espeak-ng will make the system's speaker read out the date.
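Since the training pairs RedHat with Python, here is a rough Python sketch of the same two ideas: finding the program file behind a command, and piping one command's output into another. It assumes a Linux system where date and tr are installed; tr is used in place of espeak-ng only so that the sketch does not need a speech program.

```python
import shutil
import subprocess

# Equivalent of `which date`: find the program file behind a command.
date_path = shutil.which("date")
print(date_path)

# Equivalent of `date | tr a-z A-Z`: the output of date becomes the input of tr.
date_proc = subprocess.Popen(["date"], stdout=subprocess.PIPE)
tr_result = subprocess.run(
    ["tr", "a-z", "A-Z"],
    stdin=date_proc.stdout,  # connect the pipe
    capture_output=True,
    text=True,
)
date_proc.stdout.close()
date_proc.wait()

shouted = tr_result.stdout.strip()
print(shouted)
```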
RedHat8 + Python3 Training(2):
Installing RedHat on VirtualBox.
Installing an OS directly on a system's hardware is called a bare-metal installation. Running it inside software like VirtualBox is called virtualization.
You can't run another command on a terminal while one is running in the foreground.
"Ctrl + c" terminates the program running on the terminal.
"Ctrl + z" only pauses the currently running program and puts it in the background. The jobs command can be used to see the background programs. The fg (stands for foreground) command can be used to resume a paused program in the foreground. To run a program in the background without pausing it, append & to the command.
The echo command can be used to print any string to the terminal. To have part of the string run as a command, put it in backquotes (` `); the shell replaces it with that command's output. Example: echo `date` will print the output of the date command. The echo command always adds a line break after the string; to avoid that, use "echo -n". Use "echo -e" to enable escape sequences, which aren't interpreted by default.
The man command can be used to get detailed information about a command.
To execute multiple commands at once, put the commands in a file and save it. Then use the bash command to run the file. Syntax: bash <filename>.
Using the above commands along with a while loop, we can make a program that shows the live time.
Program:
while : ; do echo -n -e "`date +%T` `sleep 1`\b\b\b\b\b\b\b\b\b" ; done &
Ideas
Trip Advisor:
Make a database of a number of people, their travel preferences and their favorite tourist spots. The preferences are the features or predictor variables, and the favorite tourist spot is the y or the dependent variable. Train a model on this dataset. Use this model to suggest tourist places to people based on their travel preferences.
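Because the favorite spot is a category rather than a number, this idea would need a classifier instead of linear regression. Below is a minimal sketch, assuming three yes/no preference features and a handful of made-up rows. The feature names, the destinations and the choice of DecisionTreeClassifier are all assumptions for illustration, not part of the training.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per person: [likes_beaches, likes_mountains, likes_history]
preferences = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
# Hypothetical favorite spot of each person (the dependent variable).
favorite_spots = ["Goa", "Manali", "Jaipur", "Goa", "Manali", "Goa"]

model = DecisionTreeClassifier(random_state=0)
model.fit(preferences, favorite_spots)

# Suggest a spot for a new traveller who only likes mountains.
suggestion = model.predict([[0, 1, 0]])[0]
print(suggestion)
```

A real version would need far more people and richer preference features before its suggestions meant anything.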