# Understand Gated Self-Attention for Beginners – Deep Learning Tutorial

March 24, 2021

Gated Self-Attention is an improvement on the standard self-attention mechanism. In this tutorial, we will explain it for deep learning beginners.

## Gated self-attention

Gated self-attention contains two parts: a gate and self-attention.

The gate is computed with a sigmoid function, for example:

$$g_t = \mathrm{sigmoid}(W[h_t, s_t])$$

where $$[h_t, s_t]$$ denotes the concatenation of $$h_t$$ and $$s_t$$.

Using this gate, we can fuse $$h_t$$ and $$s_t$$ as follows:

$$u_t = g_t \cdot h_t + (1-g_t) \cdot s_t$$

Here, $$h_t$$ or $$s_t$$ (or both) can be the output of a self-attention layer.
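Below is a minimal PyTorch sketch of this gated fusion. The class name `GatedFusion`, the hidden size, and the tensor shapes are illustrative assumptions, not part of the original tutorial:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse two feature vectors h_t and s_t with a sigmoid gate (illustrative sketch)."""
    def __init__(self, hidden_size):
        super().__init__()
        # W maps the concatenation [h_t, s_t] to a gate of size hidden_size
        self.W = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, h_t, s_t):
        # g_t = sigmoid(W[h_t, s_t])
        g_t = torch.sigmoid(self.W(torch.cat([h_t, s_t], dim=-1)))
        # u_t = g_t * h_t + (1 - g_t) * s_t
        return g_t * h_t + (1.0 - g_t) * s_t

# usage: fuse two feature tensors of shape (batch, seq_len, hidden)
fusion = GatedFusion(hidden_size=128)
h_t = torch.randn(4, 10, 128)
s_t = torch.randn(4, 10, 128)
u_t = fusion(h_t, s_t)
print(u_t.shape)  # torch.Size([4, 10, 128])
```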

Moreover, you can also concatenate $$h_t$$ and $$s_t$$ to get $$u_t$$, as shown in the snippet below.
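For instance, reusing the tensors from the sketch above (assuming both inputs share the same hidden size):

```python
# alternative to gating: concatenate the two features
# note the fused vector is twice as wide (hidden doubles from 128 to 256)
u_t = torch.cat([h_t, s_t], dim=-1)
print(u_t.shape)  # torch.Size([4, 10, 256])
```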

For more on this design choice, see: Feature Fusion: Pointwise Addition Or Concatenate Vectors? – Deep Learning Tutorial

Meanwhile, if there are more than two features to fuse, you can use self-attention to compute a weight for each feature.
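Here is one minimal sketch of that idea, using a learned scoring layer to produce an attention weight per feature; the class name `AttentionFusion` and all shapes are assumptions for illustration, and a full self-attention over the features would also work:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Fuse N feature vectors with learned attention weights (illustrative sketch)."""
    def __init__(self, hidden_size):
        super().__init__()
        # scoring layer: one scalar score per feature vector
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, features):
        # features: (batch, N, hidden) — N feature vectors per example
        scores = self.score(features)          # (batch, N, 1)
        weights = F.softmax(scores, dim=1)     # normalized weight per feature
        return (weights * features).sum(dim=1) # weighted sum: (batch, hidden)

fusion = AttentionFusion(hidden_size=128)
feats = torch.randn(4, 3, 128)  # three features to fuse
u = fusion(feats)
print(u.shape)  # torch.Size([4, 128])
```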

## When to use gated self-attention?

If you plan to fuse two features, you can use a gate function to apply different weights to them.

For a worked example, see: Understand Gated End-to-End Memory Networks – Deep Learning Tutorial