# KL Divergence: Forward or Reverse – DKL(p||q) or DKL(q||p) – Which Selected?

By | May 29, 2019

The KL Divergence could be computed as follows:

where P(X) is the true distribution we want to approximate, Q(X) is the approximate distribution.

However, there are two kinds of KL Divergences: Forward and Reverse.

KL Divergence Forward: DKL(p(x)||q(x))

KL Divergence Reverse: DKL(q(x)||p(x))

KL Divergence can be used as a loss function to minimize in deep learning model. However, which KL Divergence shoud be selected in our application?

The difference between DKL(p(x)||q(x)) and DKL(q(x)||p(x)).

As to DKL(p(x)||q(x)), we must know:

when p(x) = 0, DKL(p(x)||q(x))= 0; q(x) = 0, DKL(q(x)||p(x)) =∞

Let’s see some visual examples.

There are two probability distributions  and

As to KL Divergence Forward, we use Q(x) to approximate P(x).  The right part of P(x) > 0

At this part, to make the kl value is minimum, the final Q(x) may be:

As to KL Divergence Reverse, we use P(x) to approximate Q(x).  The right part of Q(x) =0

So the value of P(x) in right part will not be changed, Kl value in this area is o no matter what value of P(x) .

To make KL value is minimum, the P(x) will be like