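The larger the variance of the input x, the larger the variance of exp(x), so the vector produced by softmax is pushed closer to 0 or 1 (nearly one-hot). With y = softmax(x) and δ_ij the Kronecker delta, the gradient of softmax is

$$\frac{\partial y_i}{\partial x_j} = y_i(\delta_{ij} - y_j)$$

When y is nearly one-hot, every entry of this Jacobian is close to 0. For the dominant entry y_k ≈ 1, the diagonal term y_k(1 − y_k) ≈ 0, and every other term contains a factor y_i ≈ 0. The gradient therefore tends toward 0.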
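A minimal NumPy sketch of the effect described above. It assumes a hypothetical 8-dimensional logit vector built with `np.linspace`, and the helper names `softmax` and `softmax_jacobian` are illustrative. The code scales the same logits up, which raises their variance, and prints how the largest softmax output and the largest Jacobian entry change.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability; the result is unchanged.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax_jacobian(x: np.ndarray) -> np.ndarray:
    # J[i, j] = dy_i / dx_j = y_i * (delta_ij - y_j), i.e. diag(y) - y y^T
    y = softmax(x)
    return np.diag(y) - np.outer(y, y)

# Hypothetical logits; multiplying by `scale` multiplies the variance by scale**2.
base = np.linspace(-1.0, 1.0, 8)

results = []
for scale in (1.0, 10.0, 100.0):
    x = scale * base
    y = softmax(x)
    J = softmax_jacobian(x)
    results.append((x.std(), y.max(), np.abs(J).max()))
    print(f"std={x.std():7.2f}  max(y)={y.max():.6f}  max|J|={np.abs(J).max():.2e}")
```

As the standard deviation grows, max(y) approaches 1 and every Jacobian entry shrinks toward 0. This is the saturation that makes the gradient vanish.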
Shrinking the variance of x limits softmax's tendency to polarize toward 0 and 1, which mitigates the vanishing gradient.
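A sketch of the remedy, assuming the variance is reduced by dividing x by a constant. The factor `c` and the logit values are hypothetical and chosen only for illustration. Dividing by c shrinks the variance by a factor of c², and the Frobenius norm of the softmax Jacobian is compared before and after.

```python
import numpy as np

def softmax_jacobian(x: np.ndarray) -> np.ndarray:
    # Jacobian of softmax: diag(y) - y y^T, where y = softmax(x)
    z = np.exp(x - x.max())
    y = z / z.sum()
    return np.diag(y) - np.outer(y, y)

# Hypothetical high-variance logits.
x = np.linspace(-20.0, 20.0, 8)
c = 10.0  # hypothetical scale factor: Var(x / c) = Var(x) / c**2
x_scaled = x / c

norm_before = np.linalg.norm(softmax_jacobian(x))       # Frobenius norm
norm_after = np.linalg.norm(softmax_jacobian(x_scaled))

print("var(x)         =", x.var())
print("var(x / c)     =", x_scaled.var())
print("||J(x)||_F     =", norm_before)
print("||J(x / c)||_F =", norm_after)
```

With the smaller variance, the softmax output is no longer close to one-hot, so the Jacobian, and with it the gradient that flows back through softmax, is much larger.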