In this post, we look at Han’s inequality for relative entropy (KL divergence). Han’s inequality is used to derive the tensorization of entropy, from which inequalities such as the binary and Gaussian log-Sobolev inequalities follow.

Han’s Inequality

Han’s Inequality states [ H(X_{1}, \ldots, X_{n}) \leq \frac{1}{n-1} \Sigma_{i=1}^{n} H(X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}), \label{eq:1}\tag{1} ] where $H(X)$ is the Shannon entropy, $H(X) = -\Sigma_{x} p(x) \log p(x)$. Note that $X_{1}, \ldots, X_{n}$ do not need to be independent.
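As a quick sanity check (an illustration added here, not part of the statement itself), the following Python snippet draws a random, generally dependent joint distribution of three binary variables and verifies Eq. ($\ref{eq:1}$) numerically. The helper `entropy`, the alphabet sizes, and the random seed are arbitrary choices for this sketch.

```python
# Numerical sanity check of Eq. (1) for a random (hence generally dependent)
# joint distribution of three binary variables.
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy (in nats) of a probability array, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Random joint distribution over {0,1}^3 (no independence assumed).
joint = rng.random((2, 2, 2))
joint /= joint.sum()

n = joint.ndim
lhs = entropy(joint.ravel())
# Leave-one-out marginals: sum out the i-th coordinate.
rhs = sum(entropy(joint.sum(axis=i).ravel()) for i in range(n)) / (n - 1)
print(lhs, rhs, lhs <= rhs + 1e-12)   # Han's inequality: lhs <= rhs
```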

Proof

We use two basic properties of the conditional entropy: [ H(X,Y) = H(X|Y) + H(Y) ] and [ H(X|Y) \leq H(X), ] i.e., conditioning cannot increase entropy; the same holds when both sides are further conditioned on other variables, $H(X|Y,Z) \leq H(X|Y)$. From the first one, we obtain the chain rule of entropy: [ H(X_{1}, \ldots, X_{n}) = H(X_{1}) + H(X_{2}|X_{1}) + H(X_{3}| X_{1},X_{2}) + \cdots + H(X_{n}| X_{1}, \ldots, X_{n-1}). ]
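For concreteness, here is a small numerical illustration (again added here, not from the original argument) of the two properties for a randomly drawn joint distribution of three variables; the array shapes and the NumPy-based helper are arbitrary choices for the sketch.

```python
# Check of H(X, Y) = H(X | Y) + H(Y) and H(X | Y, Z) <= H(X | Y)
# for a random joint distribution of (X, Y, Z).
import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    """Shannon entropy (in nats) of a probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Random joint distribution of (X, Y, Z) on a 3 x 2 x 2 alphabet.
pxyz = rng.random((3, 2, 2))
pxyz /= pxyz.sum()
pxy = pxyz.sum(axis=2)    # marginal of (X, Y)
py = pxy.sum(axis=0)      # marginal of Y
pyz = pxyz.sum(axis=0)    # marginal of (Y, Z)

# Conditional entropies computed directly from their definitions.
H_x_given_y = -np.sum(pxy * (np.log(pxy) - np.log(py)[None, :]))
H_x_given_yz = -np.sum(pxyz * (np.log(pxyz) - np.log(pyz)[None, :, :]))

print(np.isclose(entropy(pxy.ravel()), H_x_given_y + entropy(py)))  # chain rule
print(H_x_given_yz <= H_x_given_y + 1e-12)   # more conditioning, less entropy
```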

Using the first property yields [ H(X_{1}, \ldots, X_{n}) = H(X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}) + H(X_{i} | X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}), ] and applying the second property to the second term on the RHS tells us that [ H(X_{i} | X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}) \leq H(X_{i} | X_{1}, \ldots, X_{i-1}). ] Summing the $n$ resulting inequalities over $i$, we have [ n H(X_{1}, \ldots, X_{n}) \leq \Sigma_{i=1}^{n} H(X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}) + H(X_{1}) + H(X_{2}|X_{1}) + H(X_{3}| X_{1},X_{2}) + \cdots + H(X_{n}| X_{1}, \ldots, X_{n-1}). ] Applying the chain rule of entropy, [ n H(X_{1}, \ldots, X_{n}) \leq \Sigma_{i=1}^{n} H(X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}) + H(X_{1}, \ldots, X_{n}), ] and thus it becomes the statement, [ (n-1) H(X_{1}, \ldots, X_{n}) \leq \Sigma_{i=1}^{n} H(X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}). ] The source of the inequality is the relationship $H(X|Y) \leq H(X)$: conditioning decreases entropy.

Han’s inequality for KL divergence

Symbols with superscript $i$, such as $x^{i}$ and $P^{i}$, denote $(n-1)$-dimensional vectors or distributions with the $i$-th element excluded (i.e., marginalized out). Let $Q$ and $P$ be distributions on $\mathcal{X}^{n}$, where $Q$ is arbitrary and $P = \prod_{i=1}^{n} P_{i}$ is a product distribution. Han’s inequality for KL divergence states [ D(Q \parallel P) \geq \frac{1}{n-1} \Sigma_{i=1}^{n} D(Q^{i} \parallel P^{i}) \label{eq:2}\tag{2} ] or equivalently, [ D(Q \parallel P) \leq \Sigma_{i=1}^{n} (D(Q \parallel P) - D(Q^{i} \parallel P^{i})). \label{eq:3}\tag{3} ]
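Here is a minimal numerical sketch of Eq. ($\ref{eq:2}$), added for illustration: it assumes, as in the proof below, that $P$ is a product distribution while $Q$ is arbitrary; the alphabet size, the random seed, and the helper `kl` are choices made for this sketch only.

```python
# Numerical illustration of Eq. (2): D(Q || P) >= (1/(n-1)) * sum_i D(Q^i || P^i)
# for a product distribution P and an arbitrary (dependent) Q.
import numpy as np

rng = np.random.default_rng(2)

def kl(q, p):
    """KL divergence D(q || p) in nats for strictly positive arrays."""
    return np.sum(q * (np.log(q) - np.log(p)))

n, k = 3, 2                       # three binary coordinates
# Product distribution P = P_1 x P_2 x P_3.
factors = [rng.dirichlet(np.ones(k)) for _ in range(n)]
P = factors[0][:, None, None] * factors[1][None, :, None] * factors[2][None, None, :]
# Arbitrary distribution Q on the same space.
Q = rng.random((k, k, k))
Q /= Q.sum()

lhs = kl(Q, P)
# Marginalizing out coordinate i gives Q^i and P^i.
rhs = sum(kl(Q.sum(axis=i), P.sum(axis=i)) for i in range(n)) / (n - 1)
print(lhs, rhs, lhs >= rhs - 1e-12)
```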

Proof

Denoting $\Sigma_{x \in \mathcal{X}^{n}}$ by $\Sigma$ and $- \Sigma Q \log Q$ by $H_{Q}(X)$, we have [ D(Q \parallel P) = \color{green}{- \Sigma Q \log P} - H_{Q}(X) ] and [ D(Q^{i} \parallel P^{i}) = \color{blue}{- \Sigma_{x^{i} \in \mathcal{X}^{n-1}} Q^{i} \log P^{i}} - H_{Q^{i}}(X^{i}), ] where $H_{Q^{i}}(X^{i}) = - \Sigma_{x^{i} \in \mathcal{X}^{n-1}} Q^{i} \log Q^{i}$. Since Han’s inequality (Eq.$\,$($\ref{eq:1}$)) tells us [ - H_{Q}(X) \geq - \frac{1}{n-1} \Sigma_{i=1}^{n} H_{Q^{i}}(X^{i}), ] it suffices to show [ \color{green}{- \Sigma Q \log P} \geq \frac{1}{n-1} \Sigma_{i=1}^{n} (\color{blue}{- \Sigma_{x^{i} \in \mathcal{X}^{n-1}} Q^{i} \log P^{i}}) ] for Eq.$\,$($\ref{eq:2}$) to hold. Indeed, it turns out that equality holds in the above, as we will see. Using the product structure of $P$, i.e., $P = P^{i}P_{i}$ and $P=\prod_{i=1}^{n}P_{i}$, we have [ \color{green}{- \Sigma Q \log P} = - \Sigma Q(\log P^{i} + \log P_{i}) = - \frac{1}{n} \Sigma_{i=1}^{n} \Sigma Q(\log P^{i} + \log P_{i}). ] Since $ \Sigma_{i=1}^{n} \log P_{i} = \log \prod_{i=1}^{n}P_{i} = \log P, $ we have [ \color{green}{- \Sigma Q \log P} = - \frac{1}{n} \Sigma_{i=1}^{n} \Sigma Q \log P^{i} - \frac{1}{n} \Sigma Q \log P. ] Rearranging, we have [ \color{green}{- \Sigma Q \log P} = - \frac{1}{n-1}\Sigma_{i=1}^{n} \Sigma Q \log P^{i}. ] Since $\log P^{i}$ does not depend on $x_{i}$, summing $Q$ over $x_{i}$ leaves the marginal $Q^{i}$, so $\Sigma Q \log P^{i} = \Sigma_{x^{i} \in \mathcal{X}^{n-1}} Q^{i} \log P^{i}$, and we obtain [ \color{green}{- \Sigma Q \log P} = \frac{1}{n-1} \Sigma_{i=1}^{n} (\color{blue}{- \Sigma_{x^{i} \in \mathcal{X}^{n-1}} Q^{i} \log P^{i}}). ]
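As a last sanity check of the key step above (again an illustrative sketch added here, with the same arbitrary setup as the previous snippet), the code below verifies numerically that the cross-entropy term satisfies the equality $\color{green}{- \Sigma Q \log P} = \frac{1}{n-1} \Sigma_{i=1}^{n} (\color{blue}{- \Sigma_{x^{i}} Q^{i} \log P^{i}})$ for a random product $P$ and an arbitrary $Q$.

```python
# Check of the equality just derived: for a product P and arbitrary Q,
#   -sum_x Q log P = (1/(n-1)) * sum_i ( -sum_{x^i} Q^i log P^i ).
import numpy as np

rng = np.random.default_rng(3)

n, k = 3, 2
factors = [rng.dirichlet(np.ones(k)) for _ in range(n)]
P = factors[0][:, None, None] * factors[1][None, :, None] * factors[2][None, None, :]
Q = rng.random((k, k, k))
Q /= Q.sum()

lhs = -np.sum(Q * np.log(P))   # -sum Q log P (green term)
rhs = sum(-np.sum(Q.sum(axis=i) * np.log(P.sum(axis=i))) for i in range(n)) / (n - 1)
print(np.isclose(lhs, rhs))    # exact equality (up to floating point), not just >=
```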

We will derive entropy tensorization using Eq.$\,$($\ref{eq:3}$) in the next post.