PruneBERT: A highly efficient version of BERT that saves up to 97% of the original parameters

The authors propose movement pruning, a deterministic first-order weight pruning method that is more adaptive to pre-trained model fine-tuning. It leads to significant improvements in high-sparsity regimes.

Magnitude pruning can be seen as using zeroth-order information (the absolute value) of the running model. Movement pruning instead derives importance from first-order information: rather than selecting weights that are far from zero, it retains connections that are moving away from zero during training.
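
To make the contrast concrete, here is a minimal pure-Python sketch on a toy four-weight "model". It is not the authors' implementation: the function names, the toy weights, the constant gradients, and the learning rate are all made up for illustration. It also assumes the movement score can be approximated by accumulating minus gradient times weight over training steps, which is positive when an update pushes a weight away from zero.

```python
def topk_mask(scores, keep):
    """Return a 0/1 mask that keeps the `keep` highest-scoring entries."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(len(scores))]


def magnitude_scores(weights):
    """Zeroth-order importance: the absolute value of each weight."""
    return [abs(w) for w in weights]


def movement_scores(weight_history, grad_history):
    """First-order importance (assumed form): accumulate -(gradient * weight).

    A positive contribution means the gradient step moves the weight
    away from zero; a negative one means it moves toward zero.
    """
    scores = [0.0] * len(weight_history[0])
    for weights, grads in zip(weight_history, grad_history):
        for i, (w, g) in enumerate(zip(weights, grads)):
            scores[i] -= g * w
    return scores


# Hypothetical toy run: two large weights shrinking toward zero,
# two small weights growing away from zero.
weights = [0.9, 0.1, -0.8, -0.2]
grads = [0.5, -0.5, -0.5, 0.5]   # kept constant for simplicity
learning_rate = 0.1

weight_history, grad_history = [], []
for _ in range(3):
    weight_history.append(list(weights))
    grad_history.append(list(grads))
    # Plain SGD update: w <- w - lr * g
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]

magnitude_mask = topk_mask(magnitude_scores(weights), keep=2)
movement_mask = topk_mask(movement_scores(weight_history, grad_history), keep=2)

print("magnitude keeps:", magnitude_mask)  # [1, 0, 1, 0]
print("movement keeps: ", movement_mask)   # [0, 1, 0, 1]
```

In this toy run, magnitude pruning keeps the two weights that are still large even though fine-tuning is shrinking them, while movement pruning keeps the two small weights that are growing.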
