Waste Segregation -> Convolution Neural Network.


Challenge

Waste management is a crucial concern in India. 

Everyday households have no automated waste segregation, and industry relies on semi-automated machines. Manual segregation of waste is harmful to workers' health. Therefore, an adequately automated, low-cost and user-friendly segregation system is the need of the hour.

Web App

Action

Instead of going straight for a robotic arm that senses pressure to detect the waste, I tried classifying waste images with a well-known deep learning algorithm, the CNN. Deep learning requires little domain expertise (I am not a waste worker, you see). But it is a supervised learning algorithm, so the data engineer must still look through the labelled waste images. :)

In what follows, I give a brief overview of CNNs and leave the mathematics out of scope for this article. It could serve as a go-to guide for implementing a CNN in Keras.

model plot
model summary

Convolution Neural Network

  • A Convolutional Neural Network (CNN) is most commonly used for analyzing visual imagery. It is based on a shared-weights architecture, which reduces the number of learnable parameters compared with a fully connected network. Fewer parameters lower the chance of overfitting and make the model less complex than a fully connected one.

How are they good for images? How do they learn the features of an image? Who tells them that the shape of a banana is different from that of an apple?

  • I get this question so often that next time I'll just share the link to this article. A CNN finds patterns in an image by convolving filters over it. In short, a convolution slides a small filter across the image and measures how strongly each region matches the filter's pattern. In the first few layers, the network can identify lines and corners; these patterns are passed on through the network, and deeper layers recognize increasingly complex features. This property makes CNNs really good at identifying objects in images.
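
To make the sliding-filter idea concrete, here is a minimal pure-Python sketch. The toy 4x4 image and the hand-written `vertical_edge` filter are assumptions chosen for illustration; in a real CNN the filter values are learned during training, and a library layer such as Keras `Conv2D` would do this work.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the filter with the image patch under it and sum up.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Toy grayscale image (illustrative): dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is large only in the column where the dark region meets the bright one, which is the sense in which a filter "finds" a pattern.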

How many Convolution layers, neurons in a layer are enough?

  • The number of hidden neurons and the number of layers in a Convolutional Neural Network are problem specific. The more complex the features it has to recognize, the more layers and neurons it needs.
What is weight-sharing?

A CNN has multiple layers. Within a layer, weight sharing means that every position of the image is processed with the same filter. The weights are the numbers inside each filter, so essentially we are trying to learn the filters. Each filter acts on a small section of the image at a time (its receptive field). As the filter moves across the image, its weights do not change. The idea is that if an edge is important to detect in one part of an image, it is important in other parts too.
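
A quick back-of-the-envelope count shows what weight sharing buys. The sketch below compares a convolutional layer with a fully connected layer on the same input; the 64x64 RGB input size, the 3x3 kernel and the 32 filters/units are assumed values picked only for illustration.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (inputs + 1) * units


# Assumed input size for illustration: a 64x64 RGB image.
height, width, channels = 64, 64, 3

conv_count = conv_params(3, 3, channels, filters=32)
dense_count = dense_params(height * width * channels, units=32)

print("conv layer parameters: ", conv_count)
print("dense layer parameters:", dense_count)
```

The convolutional layer's parameter count does not depend on the image size at all, because the same small filters are reused at every position.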

Further, when designing a CNN architecture, the pooling layers and activation functions are crucial choices.
  • A pooling layer partitions its input into a set of rectangular sub-regions (typically non-overlapping) and outputs one value per sub-region. The intuition is that the exact location of a feature is less important than its rough location relative to other features. Max pooling outputs the maximum value of each sub-region, while global average pooling outputs the average of each entire feature map, which makes it suitable for feeding into our dense output layer. Pooling layers have no weights of their own; they reduce the spatial dimensions (and so the parameters and computation needed by subsequent layers) but not the depth.
  • Activation functions introduce non-linearity into the neural network, helping it learn more complex functions. Without them, the network could only learn a linear function of its input data. An activation function is the function in an artificial neuron that produces the neuron's output from its inputs.
  • In other words, a value is passed through a function that transforms it, often squashing it into a fixed range. The most commonly used activation function is ReLU. It takes an input ‘x’ and returns ‘x’ if it is positive, else 0. ReLU is popular because it is really cheap to compute. There are also variants such as Leaky ReLU and the Exponential Linear Unit (ELU), which avoid the zero gradients for negative inputs that can stop learning.
  • Another commonly used activation function, for multi-class classification, is softmax. It generalizes the sigmoid to more than two classes. A sigmoid returns a value between 0 and 1, which can be treated as the probability of a data point belonging to a particular class, so it is widely used for binary classification. Softmax returns one such value per class, and these values sum to 1, so it can be used for multi-class classification problems.
  • You may also read up on other non-linear activation functions such as tanh, commonly used in RNNs, LSTMs and GRUs, for a better understanding.
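
The pooling and activation ideas above can be sketched in a few lines of plain Python. This is only an illustration on a made-up 4x4 feature map, not the Keras implementation; the 2x2 pool size and the Leaky ReLU slope of 0.01 are assumed values.

```python
import math


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            block = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled


def global_average_pool(feature_map):
    """Collapse one whole feature map into its average value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def relu(x):
    return x if x > 0 else 0.0


def leaky_relu(x, slope=0.01):
    # A small slope for negative inputs keeps the gradient from being zero.
    return x if x > 0 else slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up feature map for illustration.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

pooled = max_pool_2x2(feature_map)
average = global_average_pool(feature_map)
probabilities = softmax([2.0, 1.0, 0.1])

print(pooled)
print(average)
print(probabilities)
```

With only two classes and one score fixed at 0, `softmax` gives the same probability as `sigmoid`, which is why softmax is described as the multi-class generalization of the sigmoid.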
Optimizers
  • In deep learning, we use optimization algorithms to train the neural network by minimizing the cost function J.
J(W, b) = (1/m) * sum over i = 1..m of L(y’(i), y(i))
  • The value of the cost function is the mean of the loss L between the predicted value y’ and the actual value y over the m training examples. The value y’ is obtained during the forward propagation step and depends on the weights W and biases b of the network. With the help of optimization algorithms, we minimize the cost function J by updating the trainable parameters W and b.
Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
  • Adam is different from classical stochastic gradient descent. Stochastic gradient descent maintains a single learning rate (termed alpha) for all weight updates, and the learning rate does not change during training. Adam instead maintains a learning rate for each network weight (parameter) and adapts it separately as learning unfolds.
  • RMSProp adapts the parameter learning rates based on the average of the second moments of the gradients (the uncentered variance). Adam uses this too and also makes use of the average first moment (the mean). Specifically, the algorithm calculates an exponential moving average of the gradient and of the squared gradient, and the parameters beta1 and beta2 control the decay rates of these moving averages.
  • The moving averages are initialized at zero and beta1 and beta2 are close to 1.0 (recommended), which biases the moment estimates towards zero. This bias is overcome by first calculating the biased estimates and then calculating bias-corrected estimates.
  • If you're a beginner, start with vanilla gradient descent and stochastic gradient descent, then move on to momentum-based and Nesterov gradient descent, and eventually to RMSProp, Adam and Nadam.
A commonly used learning rate is 1e-4, which equals 1*10^-4.
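
The Adam update described above fits in a short function. The sketch below applies it to a single parameter on a toy cost J(w) = (w - 3)^2; the toy cost, the starting point and the learning rate of 0.1 are assumptions for illustration (chosen so the example converges quickly), while beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly quoted defaults.

```python
import math


def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Minimize a one-parameter cost with Adam, given its gradient function."""
    m = 0.0  # moving average of the gradient (first moment)
    v = 0.0  # moving average of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Both averages start at zero, so correct their bias towards zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The step size is scaled per parameter by the second-moment estimate.
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy cost J(w) = (w - 3)^2 with gradient dJ/dw = 2 * (w - 3); minimum at w = 3.
w_final = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
print(w_final)
```

In a real network the same update runs independently for every weight and bias, which is what gives each parameter its own effective learning rate.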

Loss Functions

  • The cross-entropy error function is often used for classification problems when outputs are interpreted as probabilities of membership in an indicated class.
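
As a small illustration of how cross-entropy rewards confident, correct probabilities, here is a sketch for a single example with a one-hot label. The three-class setup and the probability values are made up for the example; the `eps` clamp is an assumption added to avoid taking log(0).

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Hypothetical three-class example: the true class is the second one.
y_true = [0, 1, 0]
confident = [0.05, 0.90, 0.05]  # most probability on the correct class
unsure = [0.40, 0.30, 0.30]     # probability spread across classes

loss_confident = categorical_cross_entropy(y_true, confident)
loss_unsure = categorical_cross_entropy(y_true, unsure)

print(loss_confident)
print(loss_unsure)
```

Only the probability assigned to the true class contributes to the loss, so the loss shrinks towards 0 as that probability approaches 1.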

Flatten()

  • Between the convolutional layers and the fully connected layers there is a 'Flatten' layer. Flattening transforms the multi-dimensional feature maps into a single vector that can be fed into a fully connected neural network classifier.

Dense Layers

  • A stack of dense layers forms a multi-layer perceptron, which can be used for classification as well as regression. At the end of the CNN, we add some dense layers that classify based on the features learned by the convolutional layers.
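
Flattening and a dense layer can be pictured with a tiny forward pass. The sketch below is plain Python rather than the Keras `Flatten` and `Dense` layers, and the two 2x2 feature maps, the weights and the biases are hypothetical numbers; in a trained network the weights would be learned.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights, biases):
    """Each output unit is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]


# Two hypothetical 2x2 feature maps coming out of the last pooling layer.
feature_maps = [
    [[1, 2], [3, 4]],
    [[0, 1], [1, 0]],
]

vector = flatten(feature_maps)

# Hypothetical weights for a dense layer with two output units.
weights = [[0.5] * 8, [-0.5] * 8]
biases = [0.0, 1.0]

scores = dense(vector, weights, biases)
print(vector)
print(scores)
```

The scores from the last dense layer are what a softmax would then turn into class probabilities.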

Regularization - Dropout, to prevent overfitting.

  • Dropout randomly turns off (sets to zero) inputs to the next layer during training. Regular Dropout is applied to each element or cell within the feature maps. (After the max-pooling layer)
  • SpatialDropout - This drops out entire feature maps from the convolutional layer, which are then not used during pooling. (Before the max-pooling layer)
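
The difference between the two kinds of dropout is easiest to see side by side. This is a minimal sketch of training-time behaviour in plain Python, assuming the usual "inverted dropout" scaling in which surviving values are divided by (1 - rate); the function names, the rate of 0.5 and the fixed random seed are choices made for the illustration.

```python
import random


def dropout(values, rate, rng):
    """Zero each value independently with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def spatial_dropout(feature_maps, rate, rng):
    """Zero whole feature maps with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    result = []
    for fmap in feature_maps:
        if rng.random() < keep:
            result.append([[v / keep for v in row] for row in fmap])
        else:
            result.append([[0.0 for _ in row] for row in fmap])
    return result


rng = random.Random(0)  # fixed seed so the run is repeatable

dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
zero_count = sum(1 for v in dropped if v == 0.0)

maps = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(8)]
dropped_maps = spatial_dropout(maps, rate=0.5, rng=rng)

print("cells set to zero:", zero_count, "of", len(dropped))
print("feature maps kept:", sum(1 for m in dropped_maps if m[0][0] != 0.0))
```

At prediction time dropout is switched off; the rescaling during training keeps the expected activation the same in both modes.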

Callbacks - Monitor the training of the deep neural network, fix bugs quickly and build better models.

  • Early stopping - Prevents overtraining of our model by terminating the training process when the monitored metric stops improving, which reduces overfitting.
  • Model Checkpoint - This callback saves the model in H5/HDF5 format repeatedly, after every epoch or only after epochs that improve the monitored metric. We may monitor validation accuracy or loss.
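
The logic behind these two callbacks can be sketched without any framework. The function below is a simplified stand-in, not the Keras `EarlyStopping` or `ModelCheckpoint` code: it walks over a made-up list of per-epoch validation losses, remembers the best epoch (where a checkpoint would be saved) and stops after `patience` epochs without improvement. The function name and the loss values are hypothetical.

```python
def run_with_early_stopping(val_losses, patience=2):
    """Return (epoch at which training stops, epoch of the best checkpoint)."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: this is where a checkpoint would be written.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Made-up validation losses: improving at first, then getting worse.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.64, 0.70]

stopped_at, best_epoch = run_with_early_stopping(val_losses, patience=2)
print("stopped at epoch", stopped_at, "best checkpoint from epoch", best_epoch)
```

Combining the two ideas means training can run for a generous number of epochs while the saved model still comes from the epoch with the best validation metric.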

Result

result graph

Bias-Variance Trade-off

  • Bias - The algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data. (underfitting)
How far the algorithm's predictions are, on average, from the true values.
  • Variance - The algorithm's tendency to learn random things irrespective of the real signal by fitting highly flexible models that follow the error/noise in the data too closely. (overfitting)
How sensitive the algorithm is to the chosen input data.
  • We need a trade-off between bias and variance. Also, read Simple & Complex Model for more details.

After successfully classifying the waste images, I moved on to explore another case study to actually count the different trash bags.

Github TrashNet

Color Segmentation

  • RGB to HSV: Every image is stored digitally as a matrix of Red, Green and Blue values, each in the range 0-255. We convert the image into the Hue Saturation Value (HSV) color space because specifying a clear boundary for a color in RGB is difficult.
  • Masking: For each individual color to count, I created a mask, so that only that color in the image is highlighted.
  • Contours: After that, I converted the image to grayscale, applied Canny edge detection and some morphological transformations, and created contours around the regions using the convex hull.
  • Counting: Finally, I counted the number of contours of a particular color in the image, thus marking the boundary of each trash bag.
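
The pipeline above can be mimicked on a toy image with only the Python standard library. This is a simplified sketch, not the code from the repository: it uses `colorsys` for the RGB to HSV conversion and swaps the Canny/contour step for a plain flood fill that counts connected regions in the mask. The pixel colors, hue ranges and saturation/value thresholds are assumptions for illustration.

```python
import colorsys


def hue_mask(image, hue_min, hue_max, min_sat=0.5, min_val=0.3):
    """Mark pixels whose hue (in degrees) lies in [hue_min, hue_max]."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            in_range = hue_min <= h * 360 <= hue_max
            mask_row.append(in_range and s >= min_sat and v >= min_val)
        mask.append(mask_row)
    return mask


def count_regions(mask):
    """Count 4-connected regions of True pixels using flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            regions += 1
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                if (y, x) in seen or not (0 <= y < rows and 0 <= x < cols):
                    continue
                if not mask[y][x]:
                    continue
                seen.add((y, x))
                stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return regions


# Toy image (illustrative): G = green bag, R = red bag, K = dark background.
G, R, K = (30, 200, 40), (200, 30, 30), (20, 20, 20)
image = [
    [G, G, K, K, K],
    [G, G, K, R, R],
    [K, K, K, R, R],
    [K, G, K, K, K],
]

green_bags = count_regions(hue_mask(image, 90, 150))
red_bags = count_regions(hue_mask(image, 0, 20))
print("green bags:", green_bags, "red bags:", red_bags)
```

Working in HSV means a color is selected by a single hue interval, while the saturation and value thresholds keep dark or washed-out background pixels out of the mask.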

Github ColorSegmentation

Industrial UseCase

trashbin


IoT powered Trash bins


trashbin2


Ref. https://www.youtube.com/watch?v=YonjK2emAFk

Detecting / Counting trashbags in street

Other CNN Applications

  • Classify anything
HotDog/Not a HotDog to PavBhaji/Not a PavBhaji

My favorite television series is HBO's Silicon Valley, and I was inspired by Jian-Yang's HotDog / Not a HotDog classifier. Similarly, I was given the task of binary classification of PavBhaji versus Not_PavBhaji.

pavBhaji
My apologies for making you feel ravenous during a technical article.
  • "Transfer Learning: Inductive transfer focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, the knowledge gained while learning to recognize HotDog could apply when trying to recognize PavBhaji." - Wiki
  • Therefore, you may use well-known architectures such as ResNet, Inception, VGG16 and MobileNet that have been pretrained on vast datasets like ImageNet.

Kaggle: There were some issues with the binary classification functions, so everything in the notebook is done as multi-class classification.

Next Steps

Mask-RCNN

RCNN
  • Mask R-CNN can help count the different trash bags more accurately using a dataset. Further, it can assist in segmenting dirty and clean areas, so that citizens can report the location of dirty streets from a mobile app and help keep the city clean.
  • Detecting multiple waste objects in a single image using object detection algorithms such as YOLO.

Human Activity Recognition

Pretrained model on Kinect Dataset.

All the CNNs above use 2D kernels/filters. With the help of a 3D-CNN, we can recognize actions, such as people throwing waste in a restricted area or on private property.

Gif - Pretrained model on Kinect Dataset.
