Hypercontractivity is a notion that connects to the Gaussian logarithmic Sobolev inequality and the data processing inequality. In this post we will see hypercontractivity, beginning by describing contractivity as preparation. This post is based on a lecture by Himanshu Tyagi.

Lp Norm

For $1 \le p \le \infty$ and any function $f: \mathcal{X} \to \mathbb{R}$, the $L_p$ norm of $f$ is
$$\|f\|_p = \left( \int_{\mathcal{X}} |f(x)|^p \, \mathrm{d}P_{\mathsf{x}}(x) \right)^{\frac{1}{p}} = \mathbb{E}\left[ |f(\mathsf{x})|^p \right]^{\frac{1}{p}}.$$
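On a finite alphabet the definition above reduces to a weighted sum. A minimal sketch (the alphabet size and the distribution below are made up for illustration):

```python
import numpy as np

def lp_norm(f, px, p):
    """L_p norm of f w.r.t. the distribution px: E[|f(x)|^p]^(1/p)."""
    return float(np.sum(px * np.abs(f) ** p) ** (1.0 / p))

px = np.array([0.2, 0.5, 0.3])   # a made-up distribution on a 3-point alphabet
f  = np.array([1.0, -2.0, 0.5])  # values of f on that alphabet

print(lp_norm(f, px, 1))  # E[|f|]
print(lp_norm(f, px, 2))  # E[|f|^2]^(1/2)
```

On a probability space, `lp_norm` is non-decreasing in `p`, a fact used later to compare hypercontractivity with plain contractivity.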

Markov Kernel

Consider two random variables $\mathsf{x}$ and $\mathsf{y}$ with joint distribution $P_{\mathsf{x}\mathsf{y}}$. Letting $f: \mathcal{X} \to \mathbb{R}$, we define $g: \mathcal{Y} \to \mathbb{R}$ by $g(y) := \mathbb{E}[f(\mathsf{x}) \,|\, \mathsf{y} = y]$, where $y$ is a realization of $\mathsf{y}$. The channel is denoted by $W$. $W$ is the operator that maps $f$ to $g$, written $g = Wf$, and it corresponds to a Markov kernel. $W$ can be thought of as a transition matrix with possibly uncountably many rows and columns. The following figure illustrates how $W$ works.

Markov Kernel

It would be ideal if we could observe $f(\mathsf{x})$ directly, but what we can observe is a signal passing through some noisy channel. The noise in the channel is determined by $P_{\mathsf{x}|\mathsf{y}}$ and depends on $y$. Thus, $g(y) = Wf(y) = \mathbb{E}_{\mathsf{x} \sim P_{\mathsf{x}|\mathsf{y}=y}}[f(\mathsf{x})]$.
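On finite alphabets, $W$ is just the column-stochastic matrix $P_{\mathsf{x}|\mathsf{y}}$, and $g = Wf$ is a matrix-vector product. A minimal sketch with a made-up joint distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint distribution P_xy on a 3-point x-alphabet and a 4-point y-alphabet.
Pxy = rng.random((3, 4))
Pxy /= Pxy.sum()
Py = Pxy.sum(axis=0)            # marginal of y

Px_given_y = Pxy / Py           # column y holds P(x | y)
f = np.array([1.0, -2.0, 0.5])  # arbitrary f on the x-alphabet

g = Px_given_y.T @ f            # g(y) = Wf(y) = E[f(x) | y = y]
print(g)
```

Each $g(y)$ is a convex combination of the values of $f$, which is the geometric reason behind the contraction property shown next.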

Contraction for Lp Norm

In the above setting, $W$ is a contraction for the $L_p$ norm; that is, for $1 \le p \le \infty$,
$$\|Wf\|_p \le \|f\|_p. \tag{1}$$
The proof begins with $\|Wf\|_p^p = \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}\left[ \left| \mathbb{E}_{\mathsf{x} \sim P_{\mathsf{x}|\mathsf{y}=y}}[f(\mathsf{x}) \,|\, \mathsf{y} = y] \right|^p \right]$. By Jensen's inequality (since $|\cdot|^p$ is convex for $p \ge 1$), the RHS is bounded by $\mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}\left[ \mathbb{E}_{\mathsf{x} \sim P_{\mathsf{x}|\mathsf{y}=y}}[|f(\mathsf{x})|^p \,|\, \mathsf{y} = y] \right]$. Because the inner conditional expectation is absorbed by the outer expectation (the tower property), the above becomes $\mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[|f(\mathsf{x})|^p] = \|f\|_p^p$.
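The inequality (1) can be sanity-checked numerically on a finite-alphabet channel; the joint distribution and the test functions below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

Pxy = rng.random((3, 4)); Pxy /= Pxy.sum()   # made-up joint distribution
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)    # marginals of x and y

def lp_norm(vals, dist, p):
    return float(np.sum(dist * np.abs(vals) ** p) ** (1.0 / p))

for _ in range(100):                         # random test functions f
    f = rng.standard_normal(3)
    g = (Pxy / Py).T @ f                     # g = Wf
    for p in (1.0, 1.5, 2.0, 4.0):
        # contraction: ||Wf||_p (w.r.t. P_y) <= ||f||_p (w.r.t. P_x)
        assert lp_norm(g, Py, p) <= lp_norm(f, Px, p) + 1e-12
print("contraction holds on all sampled f")
```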

Hypercontractivity

While the contraction in Eq. (1) relates the $p$-th norm to the $p$-th norm, hypercontractivity allows us to connect the $p$-th norm to the $q$-th norm where $q < p$. A joint distribution $P_{\mathsf{x}\mathsf{y}}$ of $\mathsf{x}$ and $\mathsf{y}$ is $(p,q)$-hypercontractive for $1 < q \le p < \infty$ if
$$\|Wf\|_p \le \|f\|_q \quad \forall f \in L_q(\mathsf{x}). \tag{2}$$
Since the $L_p$ norm is a non-decreasing function of $p$ on a probability space, Eq. (2) is stronger than Eq. (1) when $q < p$.
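As a concrete example (not from the lecture): for the doubly symmetric binary source $\mathsf{x}, \mathsf{y} \in \{-1,+1\}$ with uniform marginals and correlation $\rho$, the Bonami-Beckner inequality states that Eq. (2) holds exactly when $\rho \le \sqrt{(q-1)/(p-1)}$. A numerical sanity check under that condition:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4.0, 2.0
rho = np.sqrt((q - 1) / (p - 1))          # boundary of the Bonami-Beckner condition

# x, y uniform on {-1, +1} with P(x = y) = (1 + rho) / 2, so Wf(y) = E[f(x) | y].
def Wf(f, rho):
    a = (1 + rho) / 2
    return np.array([a * f[0] + (1 - a) * f[1],   # y = +1
                     (1 - a) * f[0] + a * f[1]])  # y = -1

norm = lambda v, r: float(np.mean(np.abs(v) ** r) ** (1.0 / r))  # uniform marginals

for _ in range(1000):
    f = rng.standard_normal(2)
    assert norm(Wf(f, rho), p) <= norm(f, q) + 1e-12   # Eq. (2)
print("hypercontractivity verified on all sampled f")
```

Increasing `rho` above the threshold makes the assertion fail for suitable `f`, which is what "exactly when" means here.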

Two-function hypercontractivity

Eq. (2) holds if and only if
$$\mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[f(\mathsf{x}) g(\mathsf{y})] \le \|f\|_q \|g\|_{p'} \quad \forall f \in L_q(\mathsf{x}),\; g \in L_{p'}(\mathsf{y}), \tag{3}$$
where $p'$ is the Hölder conjugate of $p$, i.e., $p' = \frac{p}{p-1}$.
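Eq. (3) can likewise be checked numerically. Continuing the doubly symmetric binary source example (a made-up instance, with $\rho$ chosen so that the source is $(p,q)$-hypercontractive, so Eq. (3) should hold for every $f$ and $g$):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4.0, 2.0
pc = p / (p - 1)                           # Hölder conjugate p'
rho = np.sqrt((q - 1) / (p - 1))           # (p, q)-hypercontractive correlation

a = (1 + rho) / 2                          # P(x = y)
# Joint distribution of (x, y) on {+1, -1}^2 with uniform marginals.
Pxy = np.array([[a, 1 - a], [1 - a, a]]) / 2

norm = lambda v, r: float(np.mean(np.abs(v) ** r) ** (1.0 / r))

for _ in range(1000):
    f, g = rng.standard_normal(2), rng.standard_normal(2)
    lhs = float(np.einsum("xy,x,y->", Pxy, f, g))   # E[f(x) g(y)]
    assert lhs <= norm(f, q) * norm(g, pc) + 1e-12  # Eq. (3)
print("two-function hypercontractivity verified on all sampled f, g")
```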

From Eq.(2) to Eq.(3)

We first see the proof of the direction from Eq. (2) to Eq. (3). Since $\mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[f(\mathsf{x}) g(\mathsf{y})] = \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[Wf(\mathsf{y}) g(\mathsf{y})]$ by the tower property, applying Hölder's inequality bounds the RHS by $\|Wf\|_p \|g\|_{p'}$. Since we are assuming Eq. (2), this is further bounded by $\|f\|_q \|g\|_{p'}$, which shows that Eq. (3) is true.
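The first step, $\mathbb{E}[f(\mathsf{x}) g(\mathsf{y})] = \mathbb{E}[Wf(\mathsf{y}) g(\mathsf{y})]$, is the tower property of expectation. A quick numerical check on a made-up finite joint distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
Pxy = rng.random((3, 4)); Pxy /= Pxy.sum()   # made-up joint distribution
Py = Pxy.sum(axis=0)                         # marginal of y

f, g = rng.standard_normal(3), rng.standard_normal(4)
Wf = (Pxy / Py).T @ f                        # Wf(y) = E[f(x) | y]

lhs = float(np.einsum("xy,x,y->", Pxy, f, g))  # E[f(x) g(y)]
rhs = float(np.sum(Py * Wf * g))               # E[Wf(y) g(y)]
print(lhs, rhs)
```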

From Eq.(3) to Eq.(2)

Suppose Eq. (3) holds for any $g$ and consider $f \in L_p(\mathsf{x})$. Then,
$$\mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[Wf(\mathsf{y})^p] = \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[Wf(\mathsf{y}) \, Wf(\mathsf{y})^{p-1}] = \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}\left[ \mathbb{E}_{\mathsf{x} \sim P_{\mathsf{x}|\mathsf{y}=y}}[f(\mathsf{x})] \, Wf(\mathsf{y})^{p-1} \right].$$
By the tower property, similar to what we saw above, the above becomes
$$= \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[f(\mathsf{x}) \, Wf(\mathsf{y})^{p-1}]. \tag{4}$$
Defining $g(\mathsf{y}) := Wf(\mathsf{y})^{p-1}$, we first show $g \in L_{p'}(\mathsf{y})$. Since
$$\|g\|_{p'}^{p'} = \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}\left[ \left( Wf(\mathsf{y})^{p-1} \right)^{p'} \right] = \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}\left[ Wf(\mathsf{y})^{(p-1)\frac{p}{p-1}} \right] = \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[Wf(\mathsf{y})^p],$$
applying the contractivity of Eq. (1) yields $\mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[Wf(\mathsf{y})^p] \le \mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[f(\mathsf{x})^p] < \infty$, where the last inequality comes from the assumption $f \in L_p(\mathsf{x})$, showing $g \in L_{p'}(\mathsf{y})$.
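Both facts used here, the exponent identity $(p-1)p' = p$ and the contraction bound $\mathbb{E}[|Wf|^p] \le \mathbb{E}[|f|^p]$, can be checked numerically (a made-up finite example; absolute values are used so the powers are well defined for sign-changing $f$):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3.0
pc = p / (p - 1)                               # Hölder conjugate p'

Pxy = rng.random((3, 4)); Pxy /= Pxy.sum()     # made-up joint distribution
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)      # marginals

f = rng.standard_normal(3)
Wf = (Pxy / Py).T @ f                          # Wf(y) = E[f(x) | y]
g = np.abs(Wf) ** (p - 1)                      # g(y) := |Wf(y)|^{p-1}

g_norm_pc = float(np.sum(Py * g ** pc))        # ||g||_{p'}^{p'}
EWfp = float(np.sum(Py * np.abs(Wf) ** p))     # E[|Wf|^p]
Efp  = float(np.sum(Px * np.abs(f) ** p))      # E[|f|^p]
assert abs(g_norm_pc - EWfp) < 1e-12           # (p-1) p' = p
assert EWfp <= Efp + 1e-12                     # contraction, so g is in L_{p'}
```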

Since $Wf(\cdot)^{p-1} \in L_{p'}(\mathsf{y})$, we can apply the assumption of Eq. (3) to Eq. (4), yielding
$$\mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[f(\mathsf{x}) \, Wf(\mathsf{y})^{p-1}] \le \|f\|_q \|Wf^{p-1}\|_{p'}.$$
The LHS is simply $\mathbb{E}_{\mathsf{x},\mathsf{y} \sim P_{\mathsf{x}\mathsf{y}}}[Wf(\mathsf{y})^p] = \|Wf\|_p^p$. Regarding the RHS, we know that
$$\|Wf^{p-1}\|_{p'} = \left( \mathbb{E}\left[ |Wf^{p-1}|^{p'} \right] \right)^{\frac{1}{p'}} = \left( \mathbb{E}\left[ |Wf|^{(p-1)p'} \right] \right)^{\frac{1}{p'}} = \left( \mathbb{E}\left[ |Wf|^p \right] \right)^{\frac{1}{p'}} = \left( \mathbb{E}\left[ |Wf|^p \right] \right)^{\frac{1}{p} \cdot \frac{p}{p'}} = \|Wf\|_p^{\frac{p}{p'}}.$$
Putting them together, we obtain $\|Wf\|_p^p \le \|f\|_q \|Wf\|_p^{\frac{p}{p'}}$. This is equivalent to $\|Wf\|_p^{p - \frac{p}{p'}} \le \|f\|_q$, and since $p - \frac{p}{p'} = p - p\frac{p-1}{p} = 1$, it becomes $\|Wf\|_p \le \|f\|_q$. The proof of the direction from Eq. (3) to Eq. (2) is done.
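The exponent bookkeeping at the end, $(p-1)p' = p$ and $p - \frac{p}{p'} = 1$, is exact; a quick check with rational arithmetic:

```python
from fractions import Fraction

for p in (Fraction(3, 2), Fraction(2), Fraction(7, 3), Fraction(10)):
    pc = p / (p - 1)             # Hölder conjugate p'
    assert (p - 1) * pc == p     # used in ||Wf^{p-1}||_{p'} = ||Wf||_p^{p/p'}
    assert p - p / pc == 1       # gives ||Wf||_p <= ||f||_q
print("exponent identities hold")
```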

In this post, we saw the 1-dimensional case. We will see the extension to the n-dimensional case in the next post.