PruneBERT: A highly efficient version of BERT with up to 97% of the original parameters pruned
The authors propose movement pruning, a deterministic first-order weight pruning method that is more adaptive to fine-tuning of pre-trained models. It leads to significant improvements in high-sparsity regimes.
Magnitude pruning can be seen as using zeroth-order information (the absolute value) of the running model. Movement pruning instead derives importance from first-order information: rather than selecting weights that are far from zero, it retains connections that are moving away from zero during training.
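To make the contrast concrete, here is a minimal pure-Python sketch on four toy weights. It is an illustration under stated assumptions, not the authors' implementation: the weights, gradients, learning rate, and helper names (`magnitude_scores`, `top_k_mask`) are all made up, and the movement score is simply accumulated as the sum of -gradient × weight over a few SGD steps, whereas the paper trains its importance scores jointly with the weights.

```python
def magnitude_scores(weights):
    """Zeroth-order importance: the absolute value of each weight."""
    return [abs(w) for w in weights]


def top_k_mask(scores, keep):
    """Return a 0/1 mask that keeps the `keep` highest-scoring entries."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:keep])
    return [1 if i in kept else 0 for i in range(len(scores))]


# Toy setup (hypothetical values chosen only for illustration).
# Weights 0 and 2 are large but shrinking toward zero;
# weights 1 and 3 are small but growing away from zero.
weights = [1.0, 0.1, -0.8, -0.1]
grads = [0.5, -0.5, -0.5, 0.5]  # held constant to keep the example tiny
learning_rate = 0.1

# First-order importance: accumulate -grad * weight across training steps.
# The term is positive when the SGD update pushes a weight away from zero.
movement = [0.0] * len(weights)
for _ in range(3):
    for i, (w, g) in enumerate(zip(weights, grads)):
        movement[i] += -g * w
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]

magnitude_mask = top_k_mask(magnitude_scores(weights), keep=2)
movement_mask = top_k_mask(movement, keep=2)
print(magnitude_mask)  # [1, 0, 1, 0]
print(movement_mask)   # [0, 1, 0, 1]
```

At 50% sparsity the two criteria keep opposite halves here: magnitude pruning keeps the two largest weights even though they are decaying, while the movement score keeps the two small weights that fine-tuning is pushing away from zero.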
- Paper https://lnkd.in/egcA38H
- GitHub https://lnkd.in/ewRsbVD