The channel-wise squeeze-and-excitation module (SE module) has achieved great success in both the computer vision and speech processing fields. In this tutorial, we will introduce it for beginners.
The SE module was proposed in the paper Squeeze-and-Excitation Networks. Its structure looks like this:
The standard SE module uses two fully connected layers to learn the importance of each channel: it first compresses the channel-wise global average vector to a lower dimension, then expands it back to the original channel dimension to obtain per-channel weights.
Here are the details:
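To make the squeeze-compress-expand-scale pipeline concrete, here is a minimal NumPy sketch of a channel-wise SE module. The function and parameter names (`se_module`, `w1`, `b1`, etc.) are illustrative, not from any library; a real implementation would use a framework such as PyTorch, and a reduction ratio of 4 is chosen here just for the toy example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_module(x, w1, b1, w2, b2):
    """Channel-wise squeeze-and-excitation on a (batch, channels, time) array.

    Squeeze:    global average pool over time -> (batch, channels)
    Excitation: FC (compress) -> ReLU -> FC (expand) -> sigmoid
    Scale:      multiply each channel of x by its learned weight
    """
    # Squeeze: average over the time axis
    z = x.mean(axis=2)                   # (batch, channels)
    # Excitation: bottleneck MLP producing weights in (0, 1)
    h = np.maximum(z @ w1 + b1, 0.0)     # (batch, channels // r)
    s = sigmoid(h @ w2 + b2)             # (batch, channels)
    # Scale: broadcast the channel weights over the time axis
    return x * s[:, :, None]

# Toy usage: 2 examples, 8 channels, 16 time steps, reduction ratio r = 4
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((2, C, 16))
w1 = rng.standard_normal((C, C // r)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C // r, C)) * 0.1
b2 = np.zeros(C)
y = se_module(x, w1, b1, w2, b2)
assert y.shape == x.shape
```

Note that the output keeps the input shape: the module only rescales channels, so it can be dropped into an existing network between any two layers.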
Effect of SE module
The SE module uses a sigmoid function to compute an attention score for each channel. We can also apply the same mechanism along the time or frequency axis to learn which time steps or frequency bins are important.
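As a sketch of that idea, the variant below squeezes over the channel axis instead, producing one weight per time step. The name `temporal_se` and the parameters are illustrative; note this naive version assumes a fixed sequence length, since the fully connected layers operate on the time dimension.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def temporal_se(x, w1, b1, w2, b2):
    """SE applied along the time axis of a (batch, channels, time) array:
    squeeze over channels, then rescale each time step."""
    # Squeeze: average over the channel axis
    z = x.mean(axis=1)                   # (batch, time)
    # Excitation: bottleneck MLP, one weight per time step
    h = np.maximum(z @ w1 + b1, 0.0)     # (batch, time // r)
    s = sigmoid(h @ w2 + b2)             # (batch, time) in (0, 1)
    # Scale: broadcast the time-step weights over the channel axis
    return x * s[:, None, :]

# Toy usage: fixed length T = 16, reduction ratio r = 4
rng = np.random.default_rng(1)
T, r = 16, 4
x = rng.standard_normal((2, 8, T))
w1 = rng.standard_normal((T, T // r)) * 0.1
b1 = np.zeros(T // r)
w2 = rng.standard_normal((T // r, T)) * 0.1
b2 = np.zeros(T)
y = temporal_se(x, w1, b1, w2, b2)
assert y.shape == x.shape
```

A frequency-wise variant is the same construction applied to the frequency axis of a spectrogram-like input.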