DKM: Differentiable K-Means Clustering Layer for Neural Network Compression
DKM casts K-means clustering as an attention problem, which enables joint optimisation of the DNN parameters and the clustering centroids. Unlike prior approaches that rely on additional regularisers and parameters, DKM-based compression keeps the original loss function and model architecture fixed.
DNNs, or deep neural networks, have shown extraordinary, even superhuman, capability on many cognitive tasks. An uncompressed, fully-trained DNN is ordinarily used for inference on the server side, but the user experience can be enhanced by running inference on the device itself.
On-device inference reduces latency and keeps the user's data on the device. However, many such platforms are battery-powered and therefore power-constrained, so the DNN needs to be power-efficient. These tight power constraints limit the available computing budget and also demand a small storage overhead.
What are the solutions for making a DNN efficient enough for on-device inference?
There are multiple approaches to this problem. One method that has been shown to deliver a high DNN compression ratio is weight clustering: the weights are grouped into a small number of clusters, typically with the popular K-means algorithm, so that all weights in a cluster share a single value. After clustering, the model only needs to store a per-weight index, in 2 bits, 4 bits, and so on depending on the number of clusters, plus a small lookup table that maps each integer index to its centroid value, instead of storing full floating-point weights. A sketch of this idea follows.
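To make this concrete, here is a minimal sketch, not the DKM implementation, of compressing a single weight tensor with ordinary K-means: the weights are clustered into a 16-entry codebook, and only 4-bit indices plus the small lookup table need to be stored. It assumes NumPy and scikit-learn are available; the function names are illustrative.

```python
# Minimal sketch: compress one weight tensor by clustering its values into a
# small codebook with ordinary K-means, then storing per-weight indices plus a
# lookup table of centroids. Not the DKM method itself.
import numpy as np
from sklearn.cluster import KMeans

def compress_layer(weights: np.ndarray, n_clusters: int = 16):
    """Cluster the flattened weights; return (indices, codebook)."""
    flat = weights.reshape(-1, 1).astype(np.float32)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    indices = km.labels_.astype(np.uint8)   # 4-bit codes suffice for 16 clusters
    codebook = km.cluster_centers_.ravel()  # one float per cluster
    return indices, codebook

def decompress_layer(indices, codebook, shape):
    """Rebuild an approximate weight tensor from indices + lookup table."""
    return codebook[indices].reshape(shape)

if __name__ == "__main__":
    w = np.random.randn(256, 128).astype(np.float32)
    idx, book = compress_layer(w, n_clusters=16)
    w_hat = decompress_layer(idx, book, w.shape)
    # 32-bit floats -> 4-bit indices plus a tiny 16-entry table.
    bits_before = w.size * 32
    bits_after = w.size * 4 + book.size * 32
    print(f"approx. compression ratio: {bits_before / bits_after:.1f}x")
```

With 16 clusters the indices are 4 bits each, so the weight storage shrinks by roughly 8x relative to 32-bit floats, at the cost of replacing each weight with its nearest centroid.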
How can a compact DNN architecture be combined with K-means clustering?
Ideally, a compact DNN architecture combined with weight-clustering would provide the best solution for efficient on-device inference. In practice, however, existing model compression approaches cannot compress such models very far.
When the DNN is already a small architecture such as MobileNet, we can presume there is little redundancy left in the model itself, so conventional compression yields only modest gains. This limitation comes from the way weight-clustering is done with a conventional K-means algorithm: neither the weight-cluster assignments nor the weight updates are fully optimised by training on the target task. The fundamental difficulty in using K-means clustering for weight-sharing is that both the weights (the observations, in clustering terms) and the centroids are free to move during training, which makes the problem much harder than ordinary K-means clustering over fixed observations.
Differentiable K-means clustering addresses this by enabling train-time weight-clustering for model compression. Casting the cluster assignment as attention lets K-means clustering serve as a generic, differentiable DNN layer, and the resulting compression shows state-of-the-art results on both computer vision and natural language processing (NLP) tasks. That is how K-means clustering is used for neural network compression; a minimal sketch of the idea follows.
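Below is a minimal, hedged sketch of the core idea behind a differentiable K-means layer, written in PyTorch; it is not the paper's implementation. Each weight is softly assigned to every centroid via a softmax over negative distances (an attention map), the weights are rebuilt as attention-weighted centroids, and gradients then flow to both the weights and the centroids. The class name `DifferentiableKMeans` and the temperature `tau` are assumptions for illustration.

```python
# Sketch of an attention-based (DKM-style) differentiable K-means layer.
# Soft cluster assignment via softmax over negative distances keeps the whole
# operation differentiable, so weights and centroids train jointly on the task.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableKMeans(nn.Module):
    def __init__(self, n_clusters: int = 16, tau: float = 1e-2):
        super().__init__()
        self.n_clusters = n_clusters
        self.tau = tau  # softmax temperature: smaller -> closer to a hard assignment

    def forward(self, weights: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
        flat = weights.reshape(-1, 1)                        # (N, 1) observations
        dist = torch.cdist(flat, centroids.reshape(-1, 1))   # (N, K) distances |w_i - c_k|
        attn = F.softmax(-dist / self.tau, dim=-1)           # soft cluster assignment
        soft_w = attn @ centroids.reshape(-1, 1)             # attention-weighted centroids
        return soft_w.reshape(weights.shape)

if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(64, 32, requires_grad=True)
    c = torch.randn(16, requires_grad=True)     # centroids are trainable too
    dkm = DifferentiableKMeans(n_clusters=16)
    w_clustered = dkm(w, c)
    loss = (w_clustered ** 2).mean()            # stand-in for the original task loss
    loss.backward()                             # gradients reach both w and c
    print(w.grad.shape, c.grad.shape)
```

In a real training setup the clustered weights would replace the original ones inside the layers being compressed, and the temperature would typically be kept small (or annealed) so the soft assignment approaches the hard assignment used at inference time.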
In this regard, E2E Networks has some exciting solutions for you. In particular, E2E Auto Scale and the E2E Linux Smart Dedicated 3rd Generation solutions can support K-means-based compression of your DNN workloads and help optimise their performance.