The Real Impact of Pruning Neural Networks

Neural network pruning has (at least) two impacts:

First, a pruned model is necessarily a subset of its un-pruned parent. This means that the pruned model has a strict subset of the weights of its parent and therefore less expressive capacity. As an analogy, recall the function f(x) = mx + b for simple linear regression. If we prune one of the parameters, for example the intercept, we are left with g(x) = mx. Anything expressible by g(x) is equally expressible by f(x) (by setting b = 0), but there are things expressible by f(x) that cannot be expressed by g(x). While a pruned model must be less expressive than its parent, it is not necessarily less accurate. In our analogy, for example, if the data we are modeling naturally has an intercept of 0, then both models are equally accurate. In practice, it is hard (maybe impossible) to know how accurate a pruned model can (or will) be until it is pruned and tested.
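As a quick, concrete illustration of this analogy (not taken from the original article), the Python sketch below fits both the full model f(x) = mx + b and the "pruned" model g(x) = mx to synthetic data whose true intercept is 0. In this case both models achieve essentially the same error, which is exactly the point: pruning a parameter the data never needed costs nothing.

```python
# Toy illustration (synthetic data, not from the article): fit y = m*x + b and
# the "pruned" y = m*x to data whose true intercept is 0, and confirm that
# pruning the intercept costs no accuracy in this case.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)   # true intercept is 0

# Full model f(x) = m*x + b (ordinary least squares via polyfit).
m_full, b_full = np.polyfit(x, y, deg=1)

# "Pruned" model g(x) = m*x (closed-form least squares without an intercept).
m_pruned = np.dot(x, y) / np.dot(x, x)

mse_full = np.mean((y - (m_full * x + b_full)) ** 2)
mse_pruned = np.mean((y - m_pruned * x) ** 2)
print(f"full:   m={m_full:.3f}, b={b_full:.3f}, MSE={mse_full:.4f}")
print(f"pruned: m={m_pruned:.3f},              MSE={mse_pruned:.4f}")
```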

Second, since a pruned model contains a strict subset of the weights of its un-pruned parent, computing the pruned network's output necessarily requires less computation. As a result, the inference speed of a pruned model is at least as fast as (and usually faster than) its parent. Many subtle factors determine how much speed-up (if any) is actually observed, including: how much of the network was pruned (the more that is pruned, the larger the likely speed-up), how well the network's computation is parallelized (pruning can affect parallelization efficiency positively or negatively), and I/O speeds (if loading data takes longer than the computation itself, doing less computation will not produce a speed-up because it does not affect the I/O).
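To make the computation argument concrete, here is a back-of-the-envelope sketch counting the multiply-accumulates (MACs) of two stacked 3x3 convolutions before and after removing a fraction of the first layer's kernels. The layer sizes and the 25% pruning fraction are illustrative assumptions, not the ResNet18 figures from this article, and fewer operations do not automatically translate into a proportional wall-clock speed-up, for the reasons listed above.

```python
# Illustrative arithmetic: removing output kernels from layer 1 also shrinks the
# input channels of layer 2, so the savings compound across layers.
def conv_macs(c_in, c_out, k, h_out, w_out):
    """MACs for a k x k convolution producing a c_out x h_out x w_out output."""
    return c_out * c_in * k * k * h_out * w_out

h = w = 56                                   # feature-map size (assumed)
c1_in, c1_out = 64, 64                       # first conv layer (assumed)
c2_out = 128                                 # second conv layer's kernel count (assumed)

full = conv_macs(c1_in, c1_out, 3, h, w) + conv_macs(c1_out, c2_out, 3, h, w)

kept = int(c1_out * 0.75)                    # remove 25% of layer-1 kernels
pruned = conv_macs(c1_in, kept, 3, h, w) + conv_macs(kept, c2_out, 3, h, w)

print(f"full:   {full:,} MACs")
print(f"pruned: {pruned:,} MACs ({100 * (1 - pruned / full):.1f}% fewer)")
```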

In the following sections, we demonstrate the real effects of training and iteratively pruning a ResNet18 model on a dataset containing images of vehicles, where the task is to identify the predominant color of the vehicle.

Minimal Loss of Model Accuracy

First we examine the impacts on the accuracy of our model as we train and iteratively prune.

Figure 6 shows the validation accuracy over time as we perform this iterative procedure. The full model is updated through 40,000 iterations (batches of data), achieving an accuracy of just over 95% on the validation dataset. We then prune approximately 5% of all the convolutional kernels in the model and continue training. After the initial pruning, little or no accuracy is lost and the model continues to learn. We continue to prune approximately 5% of the model at a time (removing approximately 5% of the convolutional kernels) and immediately fine-tune the model for 8,000 batches. Note that, after the early pruning operations, any lost accuracy is recovered well before the 8,000 fine-tuning iterations are complete, indicating that we could have achieved similar results with less fine-tuning. By the final few pruning iterations, the initial drop in accuracy becomes more pronounced, although in every case the model recovers to at least 95% accuracy.

Figure 6: The validation accuracy of a ResNet18 model as it is trained. The first 40,000 batches are training a full model. Then, pruning iterations begin. Each pruning iteration consists of removing approximately 5% of the convolutional kernels from the existing ResNet18 model and then fine-tuning (re-training) for 8000 additional batches.
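Below is a minimal sketch of how such an iterative prune-and-fine-tune loop could look in PyTorch; it is not the exact code used for these experiments. It assumes torchvision's resnet18, a stand-in data loader, and torch.nn.utils.prune, which zeroes filters via masks rather than physically removing them. Physically deleting kernels and shrinking the subsequent layers (as was done here to reduce model size and inference time) requires rebuilding the affected layers or a dedicated structured-pruning library.

```python
# Sketch of iterative structured pruning + fine-tuning (illustrative only).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Stand-in dataset: random "images" and color labels; replace with the real
# vehicle-color dataset.
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 224, 224), torch.randint(0, 10, (256,))),
    batch_size=32, shuffle=True)

model = resnet18(num_classes=10)          # 10 classes is a placeholder count
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def fine_tune(num_batches):
    """Train for up to num_batches mini-batches (8,000 in the article)."""
    model.train()
    seen = 0
    while seen < num_batches:
        for images, labels in train_loader:
            if seen >= num_batches:
                break
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            seen += 1

for pruning_round in range(3):            # the article used many more rounds
    # Mask out ~5% of the remaining kernels in every convolutional layer,
    # ranked by the L2 norm of each filter (structured pruning along dim 0).
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.ln_structured(module, name="weight", amount=0.05, n=2, dim=0)
    fine_tune(num_batches=8)              # tiny count to keep the sketch cheap
```

Note that passing a fractional amount to ln_structured on each round prunes that fraction of the kernels that currently remain, which matches the "5% of the current model at each step" schedule described above.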

Reduction in Model Size

In Figure 6 we estimated the percentage of convolutional kernels removed from the model. Note that this estimate is rough for two reasons: first, we can only remove an integer number of kernels (so we almost never remove precisely 5% of the model), and second, we are actually attempting to remove 5% of the current model at each step, so as the model shrinks, 5% of the current model becomes a decreasing fraction of the original model. Removing entire kernels is not the only impact on the model, however. As previously discussed, when we remove some kernels, the remaining kernels in subsequent layers shrink (to accept inputs with fewer channels). Figure 7 illustrates the relative impact on model size as the pruning occurs. Model size is measured by the memory (disk space) required to store the model's learned parameters. Since the exact model size is less important, we show the size of the pruned models relative to the full model (the ratio of pruned model size to full model size). This demonstrates that we can remove almost 80% (79.1%) of the model's learned parameters without negatively impacting the predictive performance.

Figure 7: The relative model size as the model is initially trained (full model) then iteratively pruned. Note, the size referred to here is the relative size of the learned weights and is directly proportional to the number of learned weights in the pruned model. Specifically, the final model has 20.9% the number of learned parameters as the full model, or, in other words, almost 80% of the learned weights were removed (pruned) from the full model.
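One straightforward way to compute this kind of relative size is sketched below, under the assumption that pruned kernels have been physically removed (masked-out weights would still occupy space on disk). This is illustrative code, not necessarily the exact measurement used for Figure 7.

```python
# Sketch: measure model size by parameter count and by saved-weights size.
import os
import tempfile
import torch

def num_parameters(model):
    """Total number of learned parameters in the model."""
    return sum(p.numel() for p in model.parameters())

def size_on_disk_bytes(model):
    """Bytes needed to store the model's learned weights (its state_dict)."""
    path = os.path.join(tempfile.gettempdir(), "size_check.pt")
    torch.save(model.state_dict(), path)
    size = os.path.getsize(path)
    os.remove(path)
    return size

# Usage (assuming `full_model` and `pruned_model` are defined elsewhere):
# ratio = num_parameters(pruned_model) / num_parameters(full_model)
# print(f"pruned model keeps {100 * ratio:.1f}% of the full model's parameters")
```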

Improvement in Inference Time

While the direct impacts of a smaller model size may not be obvious (e.g., a smaller memory footprint requiring less power), pruning a model also has a positive effect on the speed at which the model can make predictions. Figure 8 illustrates how much additional throughput the pruned model provides: our model, which has been pruned by almost 80% with no loss in accuracy, can make almost 65% more predictions per second than the full model.

Figure 8: As the model shrinks, inference time improves and more images can be processed in a fixed amount of time. With nearly 80% of the model pruned (and no noticeable loss in validation accuracy) we observe almost a 65% increase in the number of images that can be processed per second.
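A simple way to measure this kind of throughput is sketched below. This is a generic timing harness, not the benchmark code behind Figure 8; the batch size, input resolution, and device handling are assumptions.

```python
# Sketch: measure inference throughput in images per second.
import time
import torch

def images_per_second(model, batch_size=32, image_size=224, num_batches=50,
                      device="cuda" if torch.cuda.is_available() else "cpu"):
    model = model.to(device).eval()
    batch = torch.randn(batch_size, 3, image_size, image_size, device=device)
    with torch.no_grad():
        for _ in range(5):                  # warm-up iterations
            model(batch)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(num_batches):
            model(batch)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return num_batches * batch_size / elapsed

# Usage (assuming `full_model` and `pruned_model` are defined elsewhere):
# speedup = images_per_second(pruned_model) / images_per_second(full_model)
```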

Conclusion

Pruned models are (almost always) better, in about every way that actually matters, than their full, over-parameterized counterparts. Pruned models are less likely to overfit the data; pruned models (may) maintain all or most of the accuracy of the full model; and pruned models require less memory, use less energy, and run inference faster.

About The Author

Travis Johnston is a Senior Data Scientist at Striveworks. He holds a PhD in Mathematics from the University of South Carolina. Before joining Striveworks at the beginning of 2021, he was a Postdoctoral Researcher at the University of Delaware and a Staff Research Scientist at Oak Ridge National Lab. He is the author or co-author of many scientific publications in mathematics, machine learning, and high-performance computing, and he has mentored many undergraduate and graduate students as well as several postdocs.

________________________________________________________________________

This is a snippet of an article that first appeared on the Striveworks blog at https://www.striveworks.com/blog/pruning-resnets-for-fun-and-profit.