
Understand Cosine Similarity Softmax: A Beginner's Guide – Machine Learning Tutorial

Cosine similarity softmax is an improvement on the traditional softmax function. In this tutorial, we will introduce it for machine learning beginners.

Softmax Function

The softmax function is widely used in text classification. For a text classification problem, the softmax function can be defined as:

P(y = i|X) = exp(wi•X + bi) / Σk=1..K exp(wk•X + bk)

where K is the number of classes.

From the equation of the softmax function P(y = i|X), we can find:

The value of P is determined by the dot product of the vector wi and X.
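To make this concrete, here is a minimal NumPy sketch of the standard softmax classifier described above. The class count K, the dimension of X, and the random values of W, b, and X are placeholders chosen for illustration, not values from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 3, 4                    # K classes, D-dimensional input (illustrative)
W = rng.normal(size=(K, D))    # one weight vector wi per class
b = rng.normal(size=K)         # one bias bi per class
X = rng.normal(size=D)         # the input vector

logits = W @ X + b             # wi . X + bi for every class i
P = np.exp(logits) / np.exp(logits).sum()   # softmax: P(y = i|X)

print(P)        # one probability per class, all positive, summing to 1
```

Note that P depends on W and X only through the dot products W @ X, which is the observation the cosine variant builds on.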

Cosine Similarity Softmax

Suppose the lengths of W and X are 1 (||W|| = ||X|| = 1) and b = 0. Then:

Y = W•X
  = ||W||*||X||*cosθ
  = cosθ

In this case, the cosine similarity softmax can be:

P(y = i|X) = exp(cosθi) / Σk=1..K exp(cosθk)

If we only suppose b = 0, without requiring W and X to have unit length, we can also define P(y = i|X) as:

P(y = i|X) = exp(cosθi) / Σk=1..K exp(cosθk)

where cosθi = wi•X / (||wi||*||X||)
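The definition above can be sketched in a few lines of NumPy: normalize the dot products by the vector norms to get cosθi for every class, then apply softmax. The random W and X are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))    # one weight vector per class (illustrative)
X = rng.normal(size=4)         # the input vector

# cos(theta_i) = wi . X / (||wi|| * ||X||), computed for all classes at once
cos_theta = (W @ X) / (np.linalg.norm(W, axis=1) * np.linalg.norm(X))

# cosine similarity softmax: P(y = i|X) = exp(cos(theta_i)) / sum_k exp(cos(theta_k))
P = np.exp(cos_theta) / np.exp(cos_theta).sum()

print(cos_theta)   # each value lies in [-1, 1]
print(P)
```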

However, cosθ ∈ [-1, 1], so the logits fall in a narrow range. To improve the performance of cosine similarity softmax, we can update it to:

P(y = i|X) = exp(S*cosθi) / Σk=1..K exp(S*cosθk)

S is a hyperparameter; you can set its value based on your own situation.

Common choices for S are 2, 4, 6, 32, or 64.
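The effect of the scale S can be sketched as follows. The helper `cosine_softmax` and the random W and X are assumptions for illustration; the max-subtraction is a standard numerical-stability trick, not part of the formula itself.

```python
import numpy as np

def cosine_softmax(W, X, S=1.0):
    """P(y = i|X) = exp(S*cos(theta_i)) / sum_k exp(S*cos(theta_k))."""
    cos_theta = (W @ X) / (np.linalg.norm(W, axis=1) * np.linalg.norm(X))
    scaled = S * cos_theta
    scaled -= scaled.max()        # subtract max for numerical stability
    e = np.exp(scaled)
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))      # illustrative weights
X = rng.normal(size=4)           # illustrative input

P_small = cosine_softmax(W, X, S=1)
P_large = cosine_softmax(W, X, S=32)
print(P_small)
print(P_large)   # larger S sharpens the distribution toward the top class
```

A larger S stretches the [-1, 1] range of cosθ before the exponential, making the output distribution more peaked, which is why scales like 32 or 64 are common in practice.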