MLOps Training Logs: Day 10
Apoorv Mishra
Game Dev Generalist | Programmer | Technical Artist | Unreal Engine | C++ | Unity | C# | AR/VR/XR | Android | Windows | Cloud Computing
In the last class, we started eliminating features on the basis of their p-values. But we didn't really know what a p-value was.
We often read that a certain study shows this or that. These studies are done by scientists, but not every study gets accepted. For a study's result to be accepted by the scientific community, and hence by the world, the probability that the observed result arose purely by chance has to fall below a certain cutoff. That probability is the p-value. The standard cutoff is 5%, or 0.05: a result with a p-value below 0.05 is treated as statistically significant and accepted, otherwise it is not. (Strictly speaking, this is not the same as the study being "correct 95% of the time"; the working rule is simply that the smaller the p-value, the stronger the evidence.) We need not concern ourselves here with why 5% was chosen.
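To make the 5% rule concrete, here is a minimal sketch (assuming scipy is available and using made-up numbers) of comparing a p-value against the 0.05 threshold:

```python
# A minimal sketch of the 5% rule, assuming scipy; the sample values are made up.
# We test whether a small sample could plausibly have come from a population
# with mean 0; the p-value decides whether the effect is "accepted".
import numpy as np
from scipy import stats

sample = np.array([2.1, 1.8, 2.5, 1.9, 2.3, 2.0])  # hypothetical measurements

# One-sample t-test against the null hypothesis "the true mean is 0"
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)

alpha = 0.05  # the conventional 5% threshold
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: statistically significant")
else:
    print(f"p = {p_value:.4f} >= {alpha}: not enough evidence")
```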
Until now, all the work of training an OLS model, checking the weights, removing features, and so on was done by us programmatically in a Jupyter notebook. But this process can be made fairly simple using a graphical tool called Gretl. It lets us load the raw data, set up the dummy variables, train the model, and view the full OLS summary, just as we did before, with the press of a few buttons.
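For reference, here is a rough sketch of the notebook workflow that Gretl replaces, assuming pandas and statsmodels and a hypothetical CSV with a categorical 'State' column and a 'Profit' target:

```python
# A rough sketch of the Jupyter workflow, assuming pandas, statsmodels,
# and a hypothetical dataset "startups.csv" with 'State' and 'Profit' columns.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("startups.csv")                              # hypothetical dataset
df = pd.get_dummies(df, columns=["State"], drop_first=True)   # dummy variables

X = sm.add_constant(df.drop(columns=["Profit"]))              # features + intercept
y = df["Profit"]                                              # target

model = sm.OLS(y, X.astype(float)).fit()
print(model.summary())                                        # weights, p-values, R^2

# Backward elimination: inspect p-values and drop the worst feature above 0.05
print(model.pvalues.sort_values(ascending=False))
```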
If you ever asked someone the difference between machine learning and deep learning, or why one is preferred over the other, you'd probably get an answer about accuracy, complexity, and so on. The reality is that they are the same, at least at the core: both try to find the right weights while reducing a loss function (see the small sketch after the list below). The difference is in the implementation. Simple algorithms like linear regression and KNN are the traditional approach to machine learning, while deep learning is the newer approach. Deep learning was needed mainly for three reasons:
- The accuracy of traditional models stops improving after a point; beyond it, even a lot of extra training raises accuracy only slightly. Deep learning models, on the other hand, keep improving as they are given more data and training.
- The traditional models are not ideal for huge datasets.
- Last, but perhaps most important, deep learning does the work of feature selection automatically, which makes it far more efficient to work with.
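To illustrate the shared core mentioned above, here is a toy sketch (all numbers made up, plain NumPy) of adjusting weights to reduce a loss, which is what both traditional models and deep networks do underneath:

```python
# Toy gradient descent on a one-feature linear model: adjust weights to reduce a loss.
# The data is made up; deep learning does the same thing with many more weights.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])    # roughly y = 2x + 1

w, b = 0.0, 0.0                        # weights start at zero
lr = 0.01                              # learning rate

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    loss = np.mean(error ** 2)         # mean squared error loss
    # Gradients of the loss with respect to w and b
    w -= lr * np.mean(2 * error * x)
    b -= lr * np.mean(2 * error)

print(f"w = {w:.2f}, b = {b:.2f}, loss = {loss:.4f}")  # should approach w=2, b=1
```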
Face Recognition:
We used the LBPH (Local Binary Patterns Histograms) recognizer supplied with the cv2 module. Like any model, it needs a dataset to train on, which we built automatically with a script that captured our picture at regular intervals. The photos were converted to grayscale, since the LBPH recognizer works on grayscale images. The haarcascade_frontalface_default classifier was used to detect the face in each frame. The cropped faces were stored in a folder, and a dataset was created by attaching a label to each photo. The model was then trained just like any other, by passing the images and labels to its train() method. The predict() method returns a tuple containing the predicted label and a confidence value, which measures how closely the detected face matches that label (for LBPH, lower values mean a closer match).
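A rough sketch of that pipeline, assuming opencv-contrib-python is installed and that the dataset-collection loop has already produced a list of grayscale face crops (`faces`) with matching integer `labels`:

```python
# Sketch of the face-recognition flow described above.
# Assumes opencv-contrib-python; `faces` and `labels` come from the collection script.
import cv2
import numpy as np

# Haar cascade for face detection (path taken from the bundled cv2.data files)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face(frame_bgr):
    """Detect the first face in a BGR frame and return it as a grayscale crop."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(boxes) == 0:
        return None
    x, y, w, h = boxes[0]
    return gray[y:y + h, x:x + w]

# Training: LBPH expects a list of grayscale images and an array of integer labels
recognizer = cv2.face.LBPHFaceRecognizer_create()
# recognizer.train(faces, np.array(labels))

# Prediction returns (label, confidence); lower confidence means a closer match
# label, confidence = recognizer.predict(extract_face(test_frame))
```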
This program has many practical applications, mostly for security. But it can be used for fun too, such as playing a different song depending on who is sitting in front of the camera.