Flax provides a variety of activation functions that can be used when building a neural network. In this article of the Flax Basics series, we will explore some of the most commonly used ones.

Setup

First, install Flax and import the needed modules.

!pip install -q flax
import numpy as np
import jax
import jax.numpy as jnp
import flax
from flax import linen as nn
import matplotlib.pyplot as plt

Set seed for reproducibility

seed = 123
key = jax.random.PRNGKey(seed)

Activations

Initialize an array to use for x-axis values

x = np.linspace(-10, 10, 100, dtype=np.float32)

ReLU

Rectified Linear Unit (ReLU) is among the most popular activation functions and is usually used in the hidden layers of a neural network. It is defined as follows:

$$ \mathrm{relu}(x) = \max(x, 0) $$

Let's apply the relu function on our test array and visualize the output.

y = np.asarray(nn.relu(x))
plt.plot(x, y)
plt.legend(['relu'], loc='upper left')
plt.show();

PReLU

Parametric Rectified Linear Unit (PReLU) is ReLU with a learnable parameter: the slope used for negative inputs.

To use it for testing, we need the following helper function (because PReLU is an nn.Module class rather than a plain function).

def prelu(alpha=0.01):
    # PReLU is an nn.Module, so it must be initialized and applied rather than called directly.
    prelu = nn.PReLU(param_dtype=jnp.float32, negative_slope_init=alpha)
    def call(x):
        variables = prelu.init(key, x)
        return prelu.apply(variables, x)
    return call

Let's apply the prelu function on our test array using different values for the slope parameter and visualize the outputs.

alphas = [-1.0, -0.1, -0.01, 0.01, 0.1, 1.0]
legends = [str(alpha) for alpha in alphas] 
for alpha in alphas:
    y = prelu(alpha)(x)
    plt.plot(x, y)

plt.legend(legends, loc='lower right')
plt.show();
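
The helper above hides the fact that, unlike the other functions in this article, PReLU stores its slope as a learnable parameter that would be updated during training. A minimal sketch of inspecting it, assuming the parameter is named negative_slope as in current Flax versions:

prelu_module = nn.PReLU(negative_slope_init=0.01)
variables = prelu_module.init(key, x)
# The slope lives in the params collection alongside any other trainable weights.
print(variables['params']['negative_slope'])  # expected: 0.01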

ELU

The Exponential Linear Unit (ELU) activation function, like ReLU, is typically used in hidden layers, but unlike ReLU it can output negative values. It is defined by the following formula:

$$ \mathrm{elu}(x) = \begin{cases} x, & x > 0\\ \alpha \left(\exp(x) - 1\right), & x \le 0 \end{cases} $$

Let's apply the elu function on our test array using different values for the $\alpha$ parameter and visualize the outputs.

alphas = [-1.0, -0.1, -0.01, 0.01, 0.1, 1.0]
legends = [str(alpha) for alpha in alphas] 
for alpha in alphas:
    y = nn.elu(x, alpha)
    plt.plot(x, y)

plt.legend(legends, loc='lower right')
plt.show();

CELU

Continuously-differentiable Exponential Linear Unit (CELU) was proposed in this paper. It is a continuously differentiable version of ELU, defined by the following formula:

$$ \mathrm{celu}(x) = \begin{cases} x, & x > 0\\ \alpha \left(\exp(\frac{x}{\alpha}) - 1\right), & x \le 0 \end{cases} $$

Let's apply the celu function on our test array using different values for the $\alpha$ parameter and visualize the outputs.

alphas = [-1.0, -0.1, -0.01, 0.01, 0.1, 1.0]
legends = [str(alpha) for alpha in alphas] 
for alpha in alphas:
    y = nn.celu(x, alpha)
    plt.plot(x, y)

plt.legend(legends, loc='lower right')
plt.show();
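
From the two formulas, CELU and ELU coincide when $\alpha = 1$ (since $\exp(x/1) = \exp(x)$). A quick check on our test array:

# For alpha = 1, celu(x) and elu(x) should match element-wise.
print(jnp.allclose(nn.celu(x, 1.0), nn.elu(x, 1.0)))  # expected: True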

GELU

The Gaussian Error Linear Unit (GELU) is another variation of ReLU. It was first proposed in this paper and is defined by the following formula:

$$ \mathrm{gelu}(x) = \frac{x}{2} \left(1 + \mathrm{erf} \left( \frac{x}{\sqrt{2}} \right) \right) $$

The calculation of GELU can be approximated with the following formula:

$$ \mathrm{gelu}(x) = \frac{x}{2} \left(1 + \tanh \left( \sqrt{\frac{2}{\pi}} \left(x + 0.044715 x^3 \right) \right) \right) $$

Let's apply the gelu function on our test array and visualize the output.

y = np.asarray(nn.gelu(x))
plt.plot(x, y)
plt.legend(['gelu'], loc='upper left')
plt.show();
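
In Flax, nn.gelu (re-exported from jax.nn.gelu) uses the tanh approximation by default; passing approximate=False computes the exact erf-based form. A quick comparison on our test array, where the two should agree closely but not exactly:

y_exact = nn.gelu(x, approximate=False)   # erf-based definition
y_approx = nn.gelu(x, approximate=True)   # tanh approximation (the default)
print(jnp.max(jnp.abs(y_exact - y_approx)))  # small, but not exactly zero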

GLU

Gated Linear Unit (GLU) is mostly used in gated CNNs for natural language processing applications. It is based on the sigmoid function and is defined by the following formula:

$$ \mathrm{glu}(a, b) = a \otimes \sigma(b) $$

where the input is split in half along the last axis to obtain $a$ and $b$. Let's apply the glu function on our test array and visualize the output. Because of the split, the output has half as many elements as the input, so we plot y on its own.

y = np.asarray(nn.glu(x))
plt.plot(y)
plt.legend(['glu'], loc='upper left')
plt.show();
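
To make the formula concrete, we can reproduce nn.glu by hand: split the input in half along the last axis to get $a$ and $b$, then compute $a \otimes \sigma(b)$. A small sketch:

# Manual GLU: split the 100-element array into two halves of 50 elements each.
a, b = jnp.split(x, 2, axis=-1)
manual = a * nn.sigmoid(b)
print(jnp.allclose(manual, nn.glu(x)))  # expected: True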

Sigmoid

The sigmoid activation function is mainly used as the last activation of a binary classifier because it outputs a value between 0 and 1 that can be interpreted as a probability. It is defined as follows:

$$ \mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} $$

Let's apply the sigmoid function on our test array and visualize the output.

y = np.asarray(nn.sigmoid(x))
plt.plot(x, y)
plt.legend(['sigmoid'], loc='upper left')
plt.show();
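
As a sketch of that typical use case, here is a hypothetical minimal binary classifier whose last layer feeds into a sigmoid, so the output can be read as a probability (the layer sizes and input shape are arbitrary):

class BinaryClassifier(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(16)(x))
        # A single output unit squashed to (0, 1) by the sigmoid.
        return nn.sigmoid(nn.Dense(1)(x))

model = BinaryClassifier()
dummy = jnp.ones((4, 8))            # batch of 4 examples, 8 features
params = model.init(key, dummy)
probs = model.apply(params, dummy)  # shape (4, 1), values in (0, 1)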

Log sigmoid

Log sigmoid, as the name suggests, applies the log function to the output of a sigmoid. It is defined by the following formula:

$$ \mathrm{log\_sigmoid}(x) = \log(\mathrm{sigmoid}(x)) = -\log(1 + e^{-x}) $$

Let's apply the log_sigmoid function on our test array and visualize the output.

y = np.asarray(nn.log_sigmoid(x))
plt.plot(x, y)
plt.legend(['log sigmoid'], loc='upper left')
plt.show();
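
As a quick sanity check, log_sigmoid should match taking the log of the sigmoid directly on our test array (in practice the fused version is also the numerically safer choice for large negative inputs):

# log_sigmoid(x) vs. log(sigmoid(x)) on the test array.
print(jnp.allclose(nn.log_sigmoid(x), jnp.log(nn.sigmoid(x)), atol=1e-6))  # expected: True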

Softmax

The softmax is usually used as the last activation layer of a multi-class classifier, because its output is a probability distribution over the different classes. It is defined by the following formula:

$$ \mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)} $$

Let's apply the softmax function on our test array and visualize the output.

y = np.asarray(nn.softmax(x))
plt.plot(x, y)
plt.legend(['softmax'], loc='upper left')
plt.show();

Note: because our test array goes from -10 to 10, the output of softmax reaches its maximum at the highest value of x (i.e. 10).
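
We can also verify the probability-distribution property directly: the softmax outputs are all positive and sum to 1.

probs = nn.softmax(jnp.array([1.0, 2.0, 3.0]))
print(probs)        # approximately [0.09, 0.24, 0.67]
print(probs.sum())  # expected: 1.0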

Log Softmax

Log Softmax, as the name suggests, applies the log function to the output of a Softmax. It is defined by the following formula:

$$ \mathrm{log\_softmax}(x)_i = \log \left( \frac{\exp(x_i)}{\sum_j \exp(x_j)} \right) $$

Let's apply the log_softmax function on our test array and visualize the output.

y = np.asarray(nn.log_softmax(x))
plt.plot(x, y)
plt.legend(['log softmax'], loc='upper left')
plt.ylim((-25, 10))
plt.show();

Soft sign

The Soft sign activation outputs values between -1 and 1. It is defined by the following formula:

$$ \mathrm{soft\_sign}(x) = \frac{x}{|x| + 1} $$

Let's apply the softsign function on our test array and visualize the output.

y = nn.soft_sign(x)
plt.plot(x, y)
plt.legend(['soft sign'], loc='upper left')
plt.show();

Softplus

The Softplus activation is a smooth version of ReLU. It is defined by the following formula:

$$ \mathrm{softplus}(x) = \log(1 + e^x) $$

Let's apply the softplus function on our test array and visualize the output.

y = nn.softplus(x)
plt.plot(x, y)
plt.legend(['soft plus'], loc='upper left')
plt.show();
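
Since softplus is a smooth approximation of ReLU, the two should nearly coincide away from zero. A quick check on our test array:

# The gap between softplus and relu is largest near x = 0 and shrinks quickly as |x| grows.
gap = jnp.abs(nn.softplus(x) - nn.relu(x))
print(gap.max())                  # close to log(2), reached near x = 0
print(gap[jnp.abs(x) > 5].max())  # tiny far from zero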

Swish (SiLU)

Sigmoid Linear Unit (SiLU), also known as Swish, was first proposed in this paper. SiLU is based on the sigmoid function and is defined by the following formula:

$$ \mathrm{silu}(x) = x \cdot \mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}} $$

Let's apply the swish function on our test array and visualize the output.

y = nn.swish(x)
plt.plot(x, y)
plt.legend(['swish'], loc='upper left')
plt.show();
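
In Flax, nn.swish is an alias for nn.silu, and we can confirm the formula above directly:

# swish/silu is just x * sigmoid(x).
print(jnp.allclose(nn.swish(x), x * nn.sigmoid(x)))  # expected: True
print(jnp.allclose(nn.swish(x), nn.silu(x)))         # expected: True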

Custom activation

If none of the available activation functions works for you, Flax makes it easy to define custom ones.

For example, let's implement the leaky ReLU activation function, which is defined by the following formula:

$$ \mathrm{leakyrelu}(x) = \begin{cases} x, & x > 0\\ \alpha \cdot x, & x \le 0 \end{cases} $$

class LeakyReLU(nn.Module):
    alpha : float = 0.1

    def __call__(self, x):
        return jnp.where(x > 0, x, self.alpha * x)

Let's apply the Leaky ReLU activation function that we just defined on our test array and visualize the output.

y = LeakyReLU()(x)
plt.plot(x, y)
plt.legend(['leaky relu'], loc='upper left')
plt.show();
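
Flax also ships a functional leaky ReLU (nn.leaky_relu), which we can use to sanity-check our custom module; note that we pass a matching negative slope:

# Compare our module (alpha=0.1) with the built-in functional version.
print(jnp.allclose(LeakyReLU(alpha=0.1)(x), nn.leaky_relu(x, negative_slope=0.1)))  # expected: True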

That's all folks

I hope you enjoyed this article. Feel free to leave a comment or reach out on Twitter @bachiirc.