Understand Cosine Similarity Softmax: A Beginner Guide – Machine Learning Tutorial

By | September 28, 2020

Cosine Similarity Softmax is an improvement of traditional softmax function. In this tutorial, we will intorduce it for machine learning beginners.

Softmax function

Softmax function is widely used in text classification. As to text classification problem, softmax function can be difined as:

softmax function for text classification

where K is the number of classification.

From the the equation of softmax function: P(y = i|X), we can find:

The value of P is determined by the dot product of vector wi and X.

Cosine Similarity Softmax

Suppose the lenght of W and X is 1, b = 0

Y = W•X



cosine similarity softmax can be:

Cosine Similarity Softmax equation 1

if we only suppose b=  0, we also can define P(y=i|X),

where cosθi = wi•X/||wi||*||X||

However, cosθ∈[-1,1], in order to improve the performance of cosine similarity softmax, we can update it to:

Cosine Similarity Softmax equation 2

S is a hyper parameter, you can set the value by your own situation.

S can be 2, 4, 6 or 32, 64


Leave a Reply

Your email address will not be published. Required fields are marked *