An Explanation of Softmax Function with Hyperparameter – Machine Learning Tutorial

September 30, 2020

We have introduced the cosine similarity softmax, which is a softmax function with a hyperparameter (see: Understand Cosine Similarity Softmax: A Beginner Guide). In this tutorial, we will discuss the effect of this hyperparameter.

Cosine similarity softmax is defined as:

P(y = i | X) = exp(S · cosθ_i) / Σ_j exp(S · cosθ_j)


where S is the hyperparameter and θ_j is the angle between the input feature X and the weight vector of class j.

The effect of S on the softmax function

The hyperparameter S must satisfy S > 0.

If S = 1, P(y = i | X) is equal to the traditional cosine similarity softmax.
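
As a concrete reference, here is a minimal NumPy sketch of this formula. It is an illustration, not code from the original post; the argument names x, W and S are assumptions.

```python
import numpy as np

def cosine_similarity_softmax(x, W, S=1.0):
    """P(y = i | X) = exp(S * cos(theta_i)) / sum_j exp(S * cos(theta_j)).

    x: feature vector, shape (d,)
    W: class weight matrix, shape (num_classes, d)
    S: hyperparameter, S > 0; S = 1 gives the traditional form
    """
    # Cosine similarity between x and each class weight vector.
    cos = (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x))
    z = S * cos
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```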

To analyze the effect of S, we will use a simple two-class softmax example with logits 0.5 and 0.2, writing x for the hyperparameter and softmax(z, x) for the probability assigned to logit z.

First, consider the probability assigned to the logit 0.5:

softmax(0.5, x) = exp(0.5x) / (exp(0.5x) + exp(0.2x))

The graph of softmax(0.5, x) as a function of x is:

[Figure: the graph of softmax(0.5, x) with hyperparameter x]

In this equation, x is the hyperparameter.
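
If you want to reproduce this curve, a short matplotlib sketch (ours, not from the original post) is enough:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.0, 10.0, 200)
p = np.exp(0.5 * x) / (np.exp(0.5 * x) + np.exp(0.2 * x))

plt.plot(x, p)
plt.xlabel("x (hyperparameter)")
plt.ylabel("softmax(0.5, x)")
plt.title("softmax(0.5, x) versus x")
plt.show()
```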

From the equation, we can find:

If x = 1:

softmax(0.5, x = 1) > softmax(0.2, x = 1)

If 0 < x < 1:

softmax(0.5, x = 1) > softmax(0.5, x)

This means that softmax(0.5, x) decreases while softmax(0.2, x) increases: the lower value of the softmax becomes larger.

If x > 1:

softmax(0.5, x = 1) < softmax(0.5, x)

This means that softmax(0.5, x) increases while softmax(0.2, x) decreases: the larger value of the softmax becomes even larger.
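
We can check these inequalities numerically with a small Python snippet (the sample points 0.5, 1 and 2 are chosen for illustration):

```python
import numpy as np

def softmax_05(x):
    # softmax(0.5, x) = exp(0.5x) / (exp(0.5x) + exp(0.2x))
    return np.exp(0.5 * x) / (np.exp(0.5 * x) + np.exp(0.2 * x))

for x in (0.5, 1.0, 2.0):
    print(x, round(float(softmax_05(x)), 4))
# 0.5 0.5374   <- 0 < x < 1: smaller than at x = 1
# 1.0 0.5744
# 2.0 0.6457   <- x > 1: larger than at x = 1
```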

However, for the probability assigned to the other logit, 0.2, the behavior is reversed:

softmax(0.2, x) = exp(0.2x) / (exp(0.5x) + exp(0.2x))

Its graph is:

[Figure: the graph of softmax(0.2, x) with hyperparameter x]

From the graph, we can also find:

If x = 1:

softmax(0.2, x = 1) < softmax(0.5, x = 1)

If 0 < x < 1:

softmax(0.2, x) > softmax(0.2, x = 1)

This means that softmax(0.2, x) increases while softmax(0.5, x) decreases: the lower value of the softmax becomes larger.

If x > 1:

softmax(0.2, x = 1) > softmax(0.2, x)

This means that softmax(0.2, x) decreases while softmax(0.5, x) increases: the larger value of the softmax becomes even larger.
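
The same numeric check for softmax(0.2, x) confirms the reversed behavior:

```python
import numpy as np

def softmax_02(x):
    # softmax(0.2, x) = exp(0.2x) / (exp(0.5x) + exp(0.2x))
    return np.exp(0.2 * x) / (np.exp(0.5 * x) + np.exp(0.2 * x))

for x in (0.5, 1.0, 2.0):
    print(x, round(float(softmax_02(x)), 4))
# 0.5 0.4626   <- 0 < x < 1: larger than at x = 1
# 1.0 0.4256
# 2.0 0.3543   <- x > 1: smaller than at x = 1
```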

To summarize:

If S > 1, the big probabilities benefit: the softmax distribution becomes sharper.

If 0 < S < 1, the small probabilities benefit: the softmax distribution becomes smoother.
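
A small sketch on a hypothetical three-class example (the similarity values 0.9, 0.5 and 0.2 are made up for illustration) shows both effects at once:

```python
import numpy as np

def softmax_with_s(z, S):
    # softmax over logits z scaled by hyperparameter S
    e = np.exp(S * np.asarray(z))
    return e / e.sum()

z = [0.9, 0.5, 0.2]            # hypothetical cosine similarities
print(softmax_with_s(z, 0.5))  # ~[0.396 0.324 0.279]: smoother, small probabilities benefit
print(softmax_with_s(z, 1.0))  # ~[0.462 0.309 0.229]: traditional softmax
print(softmax_with_s(z, 5.0))  # ~[0.858 0.116 0.026]: sharper, the big probability benefits
```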
