Swish (Silu) Activation Function in TensorFlow: An Introduction – TensorFlow Tutorial

By | March 22, 2022

In this tutorial, we will show how to implement the swish (SiLU) activation function in TensorFlow, along with a few points you should be aware of.

Swish (Silu) activation function

This function is defined as:

\(f(x) = x \cdot \mathrm{sigmoid}(\beta x)\)
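To get a feel for the role of β in this definition, here is a minimal pure-Python sketch (no TensorFlow required). The helper names `sigmoid` and `swish` are illustrative and not part of any library. It evaluates the function for β = 0 (where it reduces to x / 2), β = 1 (the SiLU case), and a large β (where it approaches ReLU).

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid for a single float."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def swish(x, beta=1.0):
    """f(x) = x * sigmoid(beta * x), following the definition above."""
    return x * sigmoid(beta * x)

# Compare three values of beta on a few sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, swish(x, beta=0.0), swish(x, beta=1.0), swish(x, beta=50.0))
```

With β = 0 the sigmoid is always 0.5, so the output is simply x / 2. With a large β the sigmoid behaves almost like a step function, so negative inputs are pushed toward 0 and positive inputs pass through nearly unchanged.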

Its graph looks like this:

[Figure: graph of the swish (SiLU) activation function]

Its first derivative looks like this:

[Figure: first derivative of the swish function]
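The first derivative can also be written out explicitly. The following is a short derivation using the product rule, where \(\sigma\) is shorthand (introduced here for brevity) for the sigmoid function and \(\sigma'(z) = \sigma(z)(1 - \sigma(z))\):

\[
\begin{aligned}
f'(x) &= \sigma(\beta x) + x \cdot \beta \, \sigma(\beta x)\bigl(1 - \sigma(\beta x)\bigr) \\
      &= \beta f(x) + \sigma(\beta x)\bigl(1 - \beta f(x)\bigr)
\end{aligned}
\]

For \(\beta = 1\) this reduces to \(f'(x) = f(x) + \sigma(x)\bigl(1 - f(x)\bigr)\).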

How to implement swish in TensorFlow?

If you are using TensorFlow 2.x, you can use tf.nn.silu() (available in recent 2.x releases). If you are using TensorFlow 1.x, you can use tf.nn.swish(), which also remains available in 2.x.

Note that tf.nn.swish() in TensorFlow 1.x only computes x * sigmoid(x), that is, the case β = 1.

However, we can also build the swish function ourselves, with β as a parameter:

def silu(x, beta=1.0):
    # x * sigmoid(beta * x); beta = 1.0 gives the same result as tf.nn.swish()
    return x * tf.nn.sigmoid(beta * x)

We will use an example to evaluate it.

import tensorflow as tf  # TensorFlow 1.x (graph mode)
import numpy as np

# Random 5 x 10 input with small values
g = tf.Variable(tf.truncated_normal([5, 10], stddev=0.1), name="S")

# Built-in swish: x * sigmoid(x)
g1 = tf.nn.swish(g)

# Custom swish with an adjustable beta
def silu(x, beta=1.0):
    return x * tf.nn.sigmoid(beta * x)

g2 = silu(g)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    # Evaluate both ops on the same variable value
    a = sess.run(g1)
    b = sess.run(g2)
    print(a)
    print(b)

Running this code, we get:

[[-0.078   0.0335  0.0191  0.0617 -0.0464 -0.0279  0.0534  0.003   0.0769
   0.0233]
 [ 0.1009 -0.031   0.0492  0.025   0.0719  0.0536  0.0074  0.0611  0.0198
  -0.0523]
 [-0.0166 -0.029  -0.0513 -0.0181 -0.0227 -0.0599 -0.0526  0.0456  0.0662
   0.0024]
 [-0.0127  0.0219  0.0654 -0.0276  0.0015  0.0697 -0.0611  0.0101 -0.0409
  -0.0018]
 [-0.0873 -0.043   0.0551  0.0507 -0.0393 -0.0582  0.0226  0.0496 -0.0262
  -0.0333]]
[[-0.078   0.0335  0.0191  0.0617 -0.0464 -0.0279  0.0534  0.003   0.0769
   0.0233]
 [ 0.1009 -0.031   0.0492  0.025   0.0719  0.0536  0.0074  0.0611  0.0198
  -0.0523]
 [-0.0166 -0.029  -0.0513 -0.0181 -0.0227 -0.0599 -0.0526  0.0456  0.0662
   0.0024]
 [-0.0127  0.0219  0.0654 -0.0276  0.0015  0.0697 -0.0611  0.0101 -0.0409
  -0.0018]
 [-0.0873 -0.043   0.0551  0.0507 -0.0393 -0.0582  0.0226  0.0496 -0.0262
  -0.0333]]

The two outputs are identical, so g1 = g2: with its default parameter value of 1, our silu() matches tf.nn.swish().
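The same comparison can be sketched for TensorFlow 2.x, where eager execution removes the need for a session and variable initializers. This is a minimal sketch that assumes a TensorFlow 2.x installation. It uses tf.nn.swish() because that name exists across versions, and it checks equality numerically with np.allclose() instead of comparing printed matrices by eye.

```python
import numpy as np
import tensorflow as tf  # assumption: TensorFlow 2.x with eager execution

def silu(x, beta=1.0):
    # Custom swish: x * sigmoid(beta * x)
    return x * tf.nn.sigmoid(beta * x)

# Same kind of input as in the example above: small truncated-normal values
g = tf.random.truncated_normal([5, 10], stddev=0.1)

builtin = tf.nn.swish(g).numpy()  # built-in x * sigmoid(x)
custom = silu(g).numpy()          # custom version with beta = 1.0

# True when both results agree within a small float32 tolerance
same = np.allclose(builtin, custom, atol=1e-6)
print(same)
```

Passing a different beta, for example silu(g, beta=2.0), gives a result that no longer matches the built-in function, which is the reason for writing the custom version.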