Regularization helps prevent overfitting; we usually add L2 regularization to the weights of a deep learning model. Here is the tutorial:
However, can we also add L1 or L2 regularization to the bias terms?
We usually do not apply regularization to bias terms. Here are some explanations:
Weight decay is usually not applied to the bias terms.
Overfitting usually requires the output of the model to be sensitive to small changes in the input data (i.e. to exactly interpolate the target values, you tend to need a lot of curvature in the fitted function). The bias parameters don’t contribute to the curvature of the model, so there is usually little point in regularising them as well.
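To make the usual practice concrete, here is a toy sketch (with made-up numbers, not from any real model) of how the L2 penalty is computed over the weights only, leaving the bias free to shift the output without being pulled toward zero:

```python
import numpy as np

# Hypothetical parameters of a tiny linear model.
w = np.array([0.5, -1.2, 2.0])  # weights
b = np.array([3.0])             # bias
lam = 0.01                      # regularization strength (lambda)

data_loss = 0.42                      # pretend data-fit loss
l2_penalty = lam * np.sum(w ** 2)     # sum over weights only; bias excluded
total_loss = data_loss + l2_penalty

print(l2_penalty)  # only the weights contribute to the penalty
```

Because `b` does not appear in `l2_penalty`, gradient descent never shrinks the bias on account of regularization; it is fitted purely by the data term.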
Bias regularization in Keras
Although we usually do not apply L1/L2 regularization to the bias, Keras does provide bias regularization.
Look at this function: tf.keras.layers.Conv2D().
We can see that Keras allows us to apply a regularizer to the bias. Why?
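A minimal sketch of this Keras option (hypothetical filter count, kernel size, and penalty strength): `Conv2D` accepts a `kernel_regularizer` for the weights and a separate `bias_regularizer` for the bias, and each registered regularizer contributes one loss term that Keras adds to the final training loss.

```python
import numpy as np
import tensorflow as tf

# Hypothetical layer configuration: regularize the kernel (weights)
# and the bias independently.
layer = tf.keras.layers.Conv2D(
    filters=4,
    kernel_size=3,
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),  # penalty on weights
    bias_regularizer=tf.keras.regularizers.l2(1e-4),    # penalty on bias
)

# Calling the layer once builds it and records its regularization losses.
x = np.random.rand(1, 8, 8, 3).astype("float32")
_ = layer(x)
print(len(layer.losses))  # one loss term for the kernel, one for the bias
```

Dropping `bias_regularizer` leaves a single loss term for the kernel, which is the common default.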
By adding a bias regularization term to the final loss function, we push the bias toward zero, which makes the model's output function pass through (or have an intercept closer to) the origin (zero point).
When should we use bias regularization?
For example, in face recognition we often use the AM-Softmax loss function. In order to compute \(\cos\theta\), we assume the output passes through the origin. We can use bias regularization in this situation.
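The face-recognition case can be sketched as follows (with hypothetical feature and class-weight vectors, not from a real network). Cosine-based losses such as AM-Softmax treat the logit as a pure dot product \(w \cdot x\), so that after normalization it equals \(\cos\theta\); a nonzero bias would break this identity, which is why we either drop the bias or regularize it toward zero:

```python
import numpy as np

# Hypothetical vectors: a feature from the backbone and one class weight.
x = np.array([1.0, 2.0, 2.0])   # feature vector, ||x|| = 3
w = np.array([0.0, 3.0, 4.0])   # class weight vector, ||w|| = 5

# Without a bias, the normalized logit is exactly cos(theta).
cos_theta = np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x))
print(cos_theta)

# With a bias b, (w.x + b) / (||w|| ||x||) is no longer cos(theta),
# so the geometric interpretation used by AM-Softmax is lost.
```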