Stochastic Rounding
When it comes to digital signals, neural networks (NNs) have the same taste as our ears.
Rounding a number is a very common operation in DSP. In ML it serves one especially useful purpose: reducing parameter precision, which in turn increases computation throughput. Various rounding techniques are used for this, and a popular one is Stochastic Rounding (SR). When SR is applied while lowering parameter precision, an NN model's performance can be largely maintained: roughly the same performance for much less computation, great!
The concept of SR is an interesting one. Most noticeably, an individual rounding result is no longer deterministic but probabilistic. Take rounding 0.3 to the nearest integer as an example: the conventional round-to-nearest (RTN) method yields 0, 100% of the time. In contrast, SR yields 0 with 70% probability and 1 with 30% probability. Note that the probability of rounding up equals the fractional part, so the result matches the input on average (0.7 × 0 + 0.3 × 1 = 0.3).
“Interesting, yet it makes good sense” was my feeling when I first came across it. As you can extrapolate, with SR an input of 0.5 yields 0 with 50% probability and 1 with 50% probability. The following two plots illustrate RTN's and (a sample of) SR's output over the input range [0, 1.0].
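The rounding rule described so far can also be written as a short Python sketch. This is only an illustration, not a reference implementation: the function names `round_to_nearest` and `stochastic_round` are made up for this example, and it assumes rounding to integers with the round-up probability equal to the fractional part of the input.

```python
import math
import random


def round_to_nearest(x: float) -> int:
    """Conventional RTN: pick the closer integer (ties go up in this sketch)."""
    return math.floor(x + 0.5)


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part of x."""
    lower = math.floor(x)
    frac = x - lower  # fractional part, in [0, 1)
    # rng.random() is uniform on [0, 1), so the test is true with probability frac
    return lower + (1 if rng.random() < frac else 0)


rng = random.Random(0)  # seeded only so the run is repeatable
trials = 100_000
ones = sum(stochastic_round(0.3, rng) for _ in range(trials))

print("RTN(0.3):", round_to_nearest(0.3))
print("share of 1s under SR(0.3):", ones / trials)
```

Over many trials the share of 1s should settle near 0.3, while RTN returns 0 every single time.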
On the SR plot, inputs from 0 to 0.3 mostly yield 0 with a small chance of jumping to 1, and vice versa for inputs above 0.6. In the middle range, 0 and 1 are roughly equally likely. In fact, there is an even simpler mode of SR: don't even look at the fractional part, just discard it and round up or down at random :-o
Yes, even a purely random choice can be considered rounding.
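A minimal sketch of this simpler mode, under the assumption that "randomly" means a fair coin flip between the two neighbouring integers (the name `coin_flip_round` is hypothetical):

```python
import math
import random


def coin_flip_round(x: float, rng: random.Random) -> int:
    """Discard the fractional part, then round up or down with equal probability."""
    lower = math.floor(x)
    # The fractional part is never inspected: up and down are equally likely
    return lower + (1 if rng.random() < 0.5 else 0)


rng = random.Random(0)  # seeded only so the run is repeatable
samples = [coin_flip_round(0.3, rng) for _ in range(100_000)]
print("share of 1s for input 0.3:", sum(samples) / len(samples))
```

One consequence worth keeping in mind: because the fraction is ignored, the average output sits about halfway between the two neighbours for every input, so this mode only matches the input on average when the fractional part is 0.5.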
Finally, let's try SR on a real signal and see what it looks like when reducing number precision from float to int8. Again, it is shown side by side with RTN for comparison.
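For readers who want to reproduce a comparison like this, here is a rough Python sketch. It is not the code behind the plots: the test signal (one period of a sine), the step size `scale`, and the helper name `quantize_int8` are all assumptions chosen for illustration.

```python
import math
import random


def quantize_int8(signal, scale, rng=None):
    """Map floats to int8 codes: SR if rng is given, otherwise RTN."""
    codes = []
    for x in signal:
        v = x / scale  # value measured in quantization steps
        lower = math.floor(v)
        if rng is None:
            q = math.floor(v + 0.5)  # round to nearest
        else:
            # round up with probability equal to the fractional part
            q = lower + (1 if rng.random() < v - lower else 0)
        codes.append(max(-128, min(127, q)))  # clamp to the int8 range
    return codes


# Assumed test signal: one period of a sine wave with amplitude 1.0
signal = [math.sin(2 * math.pi * n / 64) for n in range(64)]
scale = 0.25  # assumed step size, deliberately coarse so the steps are visible

rtn_codes = quantize_int8(signal, scale)
sr_codes = quantize_int8(signal, scale, random.Random(0))

for x, r, s in list(zip(signal, rtn_codes, sr_codes))[:8]:
    print(f"{x:+.3f}  RTN={r:+d}  SR={s:+d}")
```

RTN produces a clean staircase, while the SR codes flicker between the two neighbouring levels, more often landing on whichever level is closer.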
Those with a music engineering background might already recognize something familiar: SR is just like dithering, a technique used to make a recording more pleasant to our ears when it is converted to a lower resolution.
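The resemblance can be made concrete with a small Python sketch. It assumes the simplest form of dither, uniform noise one quantization step wide added before rounding; real audio dithering often uses other noise shapes, and the name `dithered_round` is hypothetical.

```python
import math
import random


def dithered_round(x: float, rng: random.Random) -> int:
    """Add uniform noise one step wide, then quantize.

    floor(x + u) with u uniform on [0, 1) is the same as adding noise from
    [-0.5, 0.5) and then rounding to nearest.
    """
    return math.floor(x + rng.random())


rng = random.Random(0)  # seeded only so the run is repeatable
samples = [dithered_round(0.3, rng) for _ in range(100_000)]
print("share of 1s for input 0.3:", sum(samples) / len(samples))
```

The output lands on 1 exactly when the noise exceeds one minus the fractional part, which happens with probability equal to the fractional part. Under these assumptions, that is the same distribution SR produces.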
So, when it comes to digital signals, do NNs have the same taste as our ears?