Why Add Bias Regularization in a Deep Learning Model – Keras Tutorial

August 19, 2021

Regularization can help prevent overfitting. We usually add L2 regularization to the weights of a deep learning model. Here is a tutorial:

Multi-layer Neural Network Implements L2 Regularization in TensorFlow – TensorFlow Tutorial

However, can we also add L1 or L2 regularization to the bias?

We usually do not apply regularization to bias terms. Here are two explanations:

Explanation 1:

Usually weight decay is not applied to the bias terms

http://deeplearning.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
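In Keras, this convention corresponds to setting a regularizer on the kernel (weights) only. The snippet below is a minimal sketch, assuming TensorFlow 2.x is installed; the layer size, input shape and the factor 1e-4 are illustrative values, not taken from the quoted tutorial.

```python
import tensorflow as tf

# L2 penalty on the kernel (weights) only.
# bias_regularizer is left at its default (None), so the bias is not penalized.
dense = tf.keras.layers.Dense(
    units=4,
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),  # illustrative factor
)

# Call the layer once so that its variables are created.
_ = dense(tf.ones((1, 3)))

# Keras registers one penalty term per regularized variable.
num_penalties = len(dense.losses)
```

Here only the kernel contributes a penalty term to `dense.losses`; the bias is trained without any decay.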

Explanation 2:

Overfitting usually requires the output of the model to be sensitive to small changes in the input data (i.e. to exactly interpolate the target values, you tend to need a lot of curvature in the fitted function). The bias parameters don’t contribute to the curvature of the model, so there is usually little point in regularising them as well.

https://stats.stackexchange.com/questions/153605/no-regularisation-term-for-bias-unit-in-neural-network
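One way to see this argument is to look at a single unit. As a simplified sketch, assume $$f(x) = \sigma(w^\top x + b)$$, where $$\sigma$$ is an activation with a bounded derivative (for example sigmoid, tanh or ReLU). The symbols $$w$$, $$b$$ and $$\sigma$$ are notation introduced here for illustration. Then:

$$\nabla_x f(x) = \sigma'(w^\top x + b)\, w$$

$$\|\nabla_x f(x)\| \le \sup_z |\sigma'(z)| \cdot \|w\|$$

The bound on how sensitive the output is to the input depends only on $$w$$. The bias $$b$$ shifts where the activation changes, but it does not change how steep the unit can be, which is why penalizing $$w$$ is what limits this sensitivity.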

Bias regularization in Keras

We usually do not apply L1/L2 regularization to the bias, but Keras does support bias regularization.

Look at this layer: tf.keras.layers.Conv2D().

It accepts a bias_regularizer argument, so we can apply regularization to the bias. Why?
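The following is a minimal sketch of passing a bias regularizer to tf.keras.layers.Conv2D(), assuming TensorFlow 2.x is installed. The filter count, kernel size, input shape and the factor 1e-4 are illustrative values only.

```python
import tensorflow as tf

conv = tf.keras.layers.Conv2D(
    filters=8,
    kernel_size=3,
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),  # penalty on the weights
    bias_regularizer=tf.keras.regularizers.l2(1e-4),    # penalty on the bias
)

# Call the layer once on a dummy batch (1 image, 16x16, 3 channels)
# so that the kernel and bias variables are created.
_ = conv(tf.ones((1, 16, 16, 3)))

# One penalty term per regularized variable: kernel and bias.
num_penalties = len(conv.losses)
```

When the layer is part of a model trained with `fit()`, Keras adds these penalty terms to the training loss. Because the bias is initialized to zeros by default, its penalty starts at zero and only grows as the bias moves away from zero.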

By adding bias regularization to the final loss function, we push the biases toward zero, so the model's output function passes through the origin (zero point) or has an intercept closer to it.
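The effect on the intercept can be checked on a toy problem. The sketch below is not a neural network: it fits a line y = w*x + b by least squares in pure Python, with an optional L2 penalty on b only. The function name `fit_line`, the data and the penalty strength 10.0 are all hypothetical choices made for illustration.

```python
def fit_line(xs, ys, bias_l2=0.0):
    """Minimize mean((y - w*x - b)**2) + bias_l2 * b**2 in closed form."""
    n = len(xs)
    mx = sum(xs) / n                                  # mean of x
    my = sum(ys) / n                                  # mean of y
    sxx = sum(x * x for x in xs) / n                  # mean of x^2
    sxy = sum(x * y for x, y in zip(xs, ys)) / n      # mean of x*y
    # Setting both partial derivatives to zero gives the 2x2 linear system
    #   w*sxx + b*mx            = sxy
    #   w*mx  + b*(1 + bias_l2) = my
    # which is solved here with Cramer's rule.
    det = sxx * (1.0 + bias_l2) - mx * mx
    w = (sxy * (1.0 + bias_l2) - mx * my) / det
    b = (sxx * my - mx * sxy) / det
    return w, b


# Data generated from y = 2x + 3, so the unpenalized intercept is 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 3.0 for x in xs]

w_plain, b_plain = fit_line(xs, ys)              # no penalty on the bias
w_reg, b_reg = fit_line(xs, ys, bias_l2=10.0)    # strong penalty on the bias

print(w_plain, b_plain)
print(w_reg, b_reg)
```

With the penalty, the fitted intercept moves from 3 to roughly 0.1, and the slope changes to compensate. The fitted line is forced to pass close to the origin even though the data does not.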

When should we use bias regularization?

For example, in face recognition, we often use the AM-Softmax loss function. To compute $$\cos\theta$$, we assume the output passes through the origin. We can use bias regularization in this situation.
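The role of the origin can be seen from the dot-product identity. As a sketch, write $$x$$ for the feature vector, $$W_j$$ for the weight vector of class $$j$$, $$b_j$$ for its bias and $$\theta_j$$ for the angle between $$W_j$$ and $$x$$ (notation introduced here for illustration). A linear output layer then computes:

$$W_j^\top x + b_j = \|W_j\|\,\|x\|\cos\theta_j + b_j$$

$$\cos\theta_j = \frac{W_j^\top x}{\|W_j\|\,\|x\|}$$

The output is determined by the angle alone only when $$b_j = 0$$, so a bias that is zero or close to zero keeps the output consistent with the $$\cos\theta_j$$ used by the loss.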