The cosine similarity softmax is a softmax function with a scale hyperparameter. In this tutorial, we will discuss how this hyperparameter affects the output probabilities.
Cosine similarity softmax is defined as:
P(y = i | X) = exp(S * cos θ_i) / Σ_j exp(S * cos θ_j)

Here cos θ_j is the cosine similarity between the input feature X and the weight vector of class j, and S is the hyperparameter.
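As a quick illustration, here is a minimal NumPy sketch of this formula. The function name, the example feature X, the weight matrix W, and the scale value S = 4.0 are assumptions made for this example, not fixed by the definition above.

import numpy as np

def cosine_similarity_softmax(X, W, S=4.0):
    # cosine similarity between feature X (shape (d,)) and each class weight row of W (shape (num_classes, d))
    cos = (W @ X) / (np.linalg.norm(W, axis=1) * np.linalg.norm(X) + 1e-12)
    logits = S * cos                # scale the cosine similarities by S
    logits = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(logits)
    return e / e.sum()              # P(y = i | X) for every class i

# example usage with made-up data
X = np.array([0.3, -1.2, 0.7])
W = np.random.RandomState(0).randn(5, 3)
print(cosine_similarity_softmax(X, W, S=4.0))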
The effect of S on the softmax function
The hyperparameter S can take any value with S > 0.
If S = 1, P(y=i|X) reduces to the traditional softmax applied to the cosine similarities.
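This can be checked numerically. In the sketch below, the cosine similarity values are made up for the example; the scaled formula with S = 1 and a plain softmax of the same values print the same probabilities.

import numpy as np

cos = np.array([0.7, 0.2, -0.3])   # hypothetical cosine similarities for three classes

def scaled_softmax(z, S):
    e = np.exp(S * z)
    return e / e.sum()

print(scaled_softmax(cos, S=1.0))        # cosine similarity softmax with S = 1
print(np.exp(cos) / np.exp(cos).sum())   # traditional softmax of the same cosines
# both lines print roughly [0.506 0.307 0.186]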
We will use a simplified softmax function with two fixed scores, 0.5 and 0.2, to discuss the effect of the scale (written as x below).
Consider the probability of the larger score, 0.5:

softmax(0.5, x) = exp(0.5x) / (exp(0.5x) + exp(0.2x))

The graph of softmax(0.5, x) increases monotonically as x grows. In this equation, x plays the role of the hyperparameter S.
From this equation, we can find the following (a numeric check is given after the three cases):
If x = 1:
softmax(0.5, x = 1) > softmax(0.2, x = 1)
If 0 < x < 1:
softmax(0.5, x = 1) > softmax(0.5, x)
This means softmax(0.5, x) decreases and softmax(0.2, x) increases compared with x = 1.
The lower softmax value becomes larger.
If x > 1:
softmax(0.5, x = 1) < softmax(0.5, x)
This means softmax(0.5, x) increases and softmax(0.2, x) decreases compared with x = 1.
The larger softmax value becomes even larger.
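Here is a minimal sketch that evaluates the two-score example above at a few values of x; the chosen values x = 0.5, 1 and 2 are only for illustration.

import numpy as np

def softmax_pair(x):
    # two-score softmax over the fixed scores 0.5 and 0.2, scaled by x
    e = np.exp(x * np.array([0.5, 0.2]))
    return e / e.sum()

for x in [0.5, 1.0, 2.0]:
    p_large, p_small = softmax_pair(x)
    print(f"x = {x}: softmax(0.5, x) = {p_large:.3f}, softmax(0.2, x) = {p_small:.3f}")
# x = 0.5: softmax(0.5, x) ≈ 0.537, smaller than at x = 1 (the distribution flattens)
# x = 1.0: softmax(0.5, x) ≈ 0.574
# x = 2.0: softmax(0.5, x) ≈ 0.646, larger than at x = 1 (the distribution sharpens)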
Now consider the probability of the smaller score, 0.2:

softmax(0.2, x) = exp(0.2x) / (exp(0.5x) + exp(0.2x))

The graph of softmax(0.2, x) decreases monotonically as x grows. From this function, we can also find the following (a numeric check is given after the three cases):
If x = 1:
softmax(0.2, x = 1) < softmax(0.5, x = 1)
If 0 < x < 1:
softmax(0.2, x) > softmax(0.2, x = 1)
This means softmax(0.2, x) increases and softmax(0.5, x) decreases compared with x = 1.
The lower softmax value becomes larger.
If x > 1:
softmax(0.2, x = 1) > softmax(0.2, x)
This means softmax(0.2, x) decreases and softmax(0.5, x) increases compared with x = 1.
The larger softmax value becomes even larger.
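The same two-score sketch, viewed from the side of the smaller score 0.2 (again with illustrative x values):

import numpy as np

def softmax_small(x):
    # probability of the smaller score 0.2 in the two-score example
    e = np.exp(x * np.array([0.5, 0.2]))
    return (e / e.sum())[1]

baseline = softmax_small(1.0)   # ≈ 0.426
print(f"x = 0.5: {softmax_small(0.5):.3f} > {baseline:.3f}, the small probability grows")
print(f"x = 2.0: {softmax_small(2.0):.3f} < {baseline:.3f}, the small probability shrinks")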
To summarize:
If S > 1, the large probabilities are amplified and the output distribution becomes sharper.
If 0 < S < 1, the small probabilities are boosted and the output distribution becomes flatter.
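Putting this back into the cosine similarity softmax, the sketch below applies the formula from the beginning of this tutorial to a made-up vector of cosine similarities with three different values of S; both the cosine values and the S values are assumptions for illustration only.

import numpy as np

def cosine_softmax(cos, S):
    # softmax over cosine similarities scaled by the hyperparameter S
    e = np.exp(S * cos)
    return e / e.sum()

cos = np.array([0.8, 0.3, -0.1])      # hypothetical cosine similarities
print(cosine_softmax(cos, S=0.5))     # 0 < S < 1: flatter, the small probabilities grow
print(cosine_softmax(cos, S=1.0))     # S = 1: the traditional softmax of the cosines
print(cosine_softmax(cos, S=8.0))     # S > 1: sharper, the largest probability dominates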