-
Tensorization of Hypercontractivity
Hypercontractivity in $1$-dimension was described in the previous post. Hypercontractivity extends to the $n$-dimensional setting, a process called $\textit{tensorization}$. Defining $s_{p}(x,y) := \inf \{\frac{q}{p}: 1 \lt q \leq p,\ P_{xy} \text{ is } (p,q)\text{-hypercontractive}\}$, the tensorization says [ s_{p}(x^{n},y^{n}) = \max_{1 \leq i \leq n} \{ s_{p}(x_{i},y_{i}) \} ]...
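Here $(p,q)$-hypercontractivity of $P_{xy}$ refers to the standard notion, which I assume matches the definition in the previous post: for $1 \lt q \leq p$, the pair is $(p,q)$-hypercontractive if conditioning on $x$ contracts the $q$-norm into the $p$-norm, [ \left \lVert \mathop{\mathbb{E}} \left \lbrack g(y) \mid x \right \rbrack \right \rVert_{p} \leq \left \lVert g(y) \right \rVert_{q} \quad \text{for all bounded } g. ]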
-
Hypercontractivity of Markov Kernel
Hypercontractivity is a notion connected to the Gaussian logarithmic Sobolev inequality and the data processing inequality. In this post we will see hypercontractivity, beginning by describing contractivity as preparation. This post is based on the lecture by Himanshu Tyagi.
-
Robustness of Tukey Median
The Tukey median introduced in the previous post robustly estimates the true mean, i.e., the mean of the distribution before it is contaminated by outliers or sample corruption. Specifically, letting $S \subset \mathbb{R}^{d}$ be an $\epsilon$-corrupted set of samples of size $n$ from $\mathcal{N}(\mu, \text{I}_{d})$, where $\epsilon \lt \frac{1}{6}$, there exists...
-
Introduction of Tukey Median
In 1-dimension, the median is more robust against the presence of outliers than the mean. A generalization of the 1-dimensional median to high dimensions is known as the Tukey median; intuitively, it is a point that is a median in every projection. We will see the robustness of the Tukey median in...
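As a rough illustration of the "median in every projection" picture, here is a hypothetical sketch (the function `approx_tukey_depth` and the use of random directions are my own choices, not code from the post) that approximates the Tukey depth of a point by scanning random one-dimensional projections:

```python
import numpy as np

def approx_tukey_depth(point, samples, n_projections=1000, seed=0):
    """Approximate the Tukey (halfspace) depth of `point` within `samples`.

    The exact depth minimizes, over every direction u, the fraction of samples
    on the smaller side of the hyperplane through `point` orthogonal to u.
    Minimizing only over randomly drawn directions gives an upper bound on the
    true depth that tightens as n_projections grows.
    """
    rng = np.random.default_rng(seed)
    _, d = samples.shape
    depth = 1.0
    for _ in range(n_projections):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        proj = samples @ u                    # 1-D projection of every sample
        frac = np.mean(proj >= point @ u)     # mass on one side of the point
        depth = min(depth, frac, 1.0 - frac)  # keep the worst 1-D split seen
    return depth

# A central point has depth near 1/2 in every projection; for an outlier some
# projection leaves almost no mass on one side, so its depth is near 0.
X = np.random.default_rng(1).standard_normal((2000, 5))
print(approx_tukey_depth(np.zeros(5), X))      # roughly 0.5
print(approx_tukey_depth(np.full(5, 5.0), X))  # roughly 0.0
```

The Tukey median is then (approximately) a point maximizing this depth.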
-
Gibbs Variational Formula and Variational Formula for Divergence
Besides the entropy method, the transportation method is another way to control the logarithm of the moment generating function. In this post we introduce the Gibbs variational formula, which is used to derive the transportation lemma.
-
Entropy Tensorization
Entropy tensorization, or the sub-additivity property of entropy, is one of the essential components in deriving the binary log-Sobolev inequality and the Gaussian log-Sobolev inequality. Entropy tensorization is derived from the relationship between entropy and relative entropy through the tilting measure we saw in this post and Han’s inequality for...
-
Han's Inequality for Relative Entropy
In this post, we see Han’s inequality for relative entropy (KL divergence). Han’s inequality is used to derive entropy tensorization, and inequalities such as the binary and Gaussian log-Sobolev inequalities are derived using entropy tensorization.
-
Herbst’s Argument and Entropy Method
With the notion of tilting and its connection to relative entropy introduced in this post, we will see the entropy method in this post. As described in this post, when a random variable $Z$ satisfies [ \psi_{Z - \mathop{\mathbb{E}}\left \lbrack Z \right \rbrack }(\lambda) \leq \frac{...
-
Derivative of Log Moment Generating Function as Tilting Measure
The first and second derivatives of the log moment generating function of $Z$, $\psi(\lambda)$, can be expressed with the exponential tilting by $ \frac {e^{\lambda Z}} {\mathop{\mathbb{E}} \left \lbrack e^{\lambda Z} \right \rbrack } $ introduced in this post, and from this the convexity of $\psi(\lambda)$ follows. Using the fact...
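Concretely, writing $\mathop{\mathbb{E}}_{\lambda}$ and $\text{Var}_{\lambda}$ for the expectation and variance under the tilted measure with density $ \frac {e^{\lambda Z}} {\mathop{\mathbb{E}} \left \lbrack e^{\lambda Z} \right \rbrack } $ (my notation, not necessarily the post's), the two derivatives are [ \psi'(\lambda) = \mathop{\mathbb{E}}_{\lambda} \left \lbrack Z \right \rbrack, \qquad \psi''(\lambda) = \text{Var}_{\lambda} (Z) \geq 0, ] so $\psi(\lambda)$ is convex.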
-
Tilting Measure and Relative Entropy
We saw in this post that if the logarithm of the moment generating function $ \psi_{Z - \mathop{\mathbb{E}}\left \lbrack Z \right \rbrack }(\lambda) = \log \mathop{\mathbb{E}} \exp \left (\lambda (Z - \mathop{\mathbb{E}} \left \lbrack Z \right \rbrack) \right ) $ for a random variable $Z$ satisfies [ \psi_{Z -...
-
Chernoff Bound
What is concentration? At the beginning of Boucheron’s book and the lecture series by Sudeep Kamath, it is explained by the following quotes from Talagrand: A random variable that smoothly depends on the influence of many independent random variables satisfies Chernoff-type bounds and is essentially constant. Letting $f:...
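The Chernoff step itself is the standard argument (stated here as background, not quoted from the post): applying Markov's inequality to $e^{\lambda Z}$ for $\lambda \gt 0$ gives [ \mathop{\mathbb{P}} \left ( Z \geq t \right ) = \mathop{\mathbb{P}} \left ( e^{\lambda Z} \geq e^{\lambda t} \right ) \leq e^{-\lambda t} \mathop{\mathbb{E}} \left \lbrack e^{\lambda Z} \right \rbrack = e^{- \left ( \lambda t - \psi_{Z}(\lambda) \right ) }, ] and optimizing over $\lambda \gt 0$ yields the Chernoff bound.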
-
Application of Gaussian Isoperimetric Inequality to Machine Learning
We show an application of the Gaussian isoperimetric inequality to machine learning. Specifically, we will take up this paper, “Adversarial Examples Are a Natural Consequence of Test Error in Noise”. As its title states, the paper argues that “the existence of adversarial examples follows naturally from the fact that our models...
-
Lipschitz Concentration and Gaussian Isoperimetric Inequality
We show an application of the Gaussian isoperimetric inequality to concentration inequalities for Lipschitz functions. Note that the theorem of the Gaussian isoperimetric inequality and its derivation are described in this post. Suppose $ f: \mathcal{X} \mapsto \mathbb{R}$ is an $L$-Lipschitz function, where $\mathcal{X} \subset \mathbb{R}^n$; that is, for all $x,y \in \mathcal{X}$, [...
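The resulting concentration bound is the standard Gaussian (Tsirelson–Ibragimov–Sudakov) form; I state it here for reference, and the post's exact constants may differ: for $X \sim \mathcal{N}(0, \text{I}_{n})$ and $t \geq 0$, [ \mathop{\mathbb{P}} \left ( f(X) - \mathop{\mathbb{E}} \left \lbrack f(X) \right \rbrack \geq t \right ) \leq e^{- \frac{t^{2}}{2 L^{2}}}. ]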
-
Intuition behind Gaussian isoperimetric inequality
In this post, I present some of the intuition I have about the theorem of the Gaussian isoperimetric inequality.