Is softmax cross-entropy the best loss for categorical outcomes in deep learning?
I have been working with deep learning and TensorFlow for a while. Surprisingly, the majority of textbooks and lectures use softmax cross-entropy as the loss function for categorical outcomes, as if it were the default and the best choice. My mathematical intuition is that the nonlinearity of the log interferes with finding the optimal cost by biasing the loss toward extreme values, which can lead to oscillation during optimization. So I set out to design a new loss function based on cosine similarity with L2 normalization. This function treats each outcome dimension equally and does not bias toward extremes the way cross-entropy does.
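Written out, the loss I have in mind is simply the negative cosine similarity between the L2-normalized logits and the one-hot label, compared with the usual softmax cross-entropy:

$$
\mathcal{L}_{\cos} = -\sum_{n}\sum_{k} \frac{z_{nk}}{\lVert z_n + \epsilon \rVert_2}\, y_{nk},
\qquad
\mathcal{L}_{\mathrm{CE}} = -\sum_{n}\sum_{k} y_{nk}\, \log \frac{e^{z_{nk}}}{\sum_{j} e^{z_{nj}}},
$$

where $z_n$ are the logits for example $n$, $y_n$ is its one-hot label, and $\epsilon = 10^{-4}$ is added to each logit only to keep the norm away from zero (see the code below).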
Here are the results on the MNIST dataset with the usual CNN found in many deep learning textbooks. All I did was replace softmax cross-entropy with the loss function I wrote. Each loss function was tested five times, and the average of the five runs is reported as the final result:
1) tf.nn.softmax_cross_entropy_with_logits: 0.98408 (error rate: 0.01592)
2) tf.nn.softmax_cross_entropy_with_logits_v2: 0.98622 (error rate: 0.01378)
3) tf.nn.sigmoid_cross_entropy_with_logits: 0.98276 (error rate: 0.01724)
4) My loss function based on cosine similarity: 0.99178 (error rate: 0.00822)
Here is my loss function code; the entire Python script is available on request:
import tensorflow as tf

def costFunction(logits, labels):
    # Per-example squared L2 norm of the logits; the small 0.0001 offset keeps the norm away from zero.
    inter = tf.reshape(tf.reduce_sum(tf.square(tf.add(logits, 0.0001)), 1), [tf.shape(logits)[0], 1])
    # Scale each logit vector to (approximately) unit length.
    s_out = tf.div(logits, tf.sqrt(inter))
    # Negative cosine similarity with the one-hot labels, summed over the batch.
    cost = tf.negative(tf.reduce_sum(tf.multiply(s_out, labels)))
    return cost
cost = costFunction(logits=pred, labels=y)
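Two design notes on the code: the 0.0001 offset only keeps the denominator away from zero when all logits vanish, and because the logits are L2-normalized, each example contributes a value in roughly [-1, 1] to the cost. The loss is therefore bounded, whereas log-loss grows without bound for a confidently wrong prediction, which is exactly the bias toward extremes I wanted to avoid.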
From these results, my function reduces the error rate by almost half compared with the cross-entropy functions in the TensorFlow package. I also tried a simple one-layer NN on MNIST instead of the CNN, and my function again gave the best result. I welcome and encourage other researchers to test this function on other datasets.
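For anyone who wants to reproduce the one-layer comparison quickly, here is a minimal sketch of what I mean (TensorFlow 1.x with the tutorial MNIST loader; the initialization, optimizer, and batch settings are just my choices, not part of the claim):

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# One fully connected layer producing raw logits (no softmax on the output).
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
b = tf.Variable(tf.zeros([10]))
pred = tf.matmul(x, W) + b

cost = costFunction(logits=pred, labels=y)  # the cosine-similarity loss defined above
train_step = tf.train.AdamOptimizer(0.001).minimize(cost)

correct = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(2000):
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_x, y: batch_y})
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

Swapping the cost line for tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=pred)) gives the cross-entropy baseline to compare against.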
If I have not made any fatal mistake, these results suggest that simple loss functions better than softmax cross-entropy really do exist. I am curious why so many textbooks and lectures treat softmax cross-entropy as the default for categorical outcomes without offering any alternatives. Perhaps the authors have been intimidated by the name of Shannon. Maybe, in this huge tide of AI, most people are busy chasing it rather than thinking about it, let alone improving it.