Wednesday, April 25, 2007

Homework #5 & Talagrand

Solutions are posted.

Regarding Question #2: I referred to this as "Talagrand's Lemma", and that's how I personally think of it, having learned it first as Proposition 2.2 in Michel Talagrand's ingenious paper How much are increasing sets positively correlated?.

I think that, really, it should count as a folklore result. For example, in arithmetic combinatorics I think it's sometimes called Chang's Theorem. It is also the pith of the very technical proof of Bourgain's Theorem on the Fourier tails of boolean functions (not available online, but appearing in a paper of Khot and Naor). Bourgain uses a different proof of the lemma, though he never quite states it explicitly. That different proof relies on hypercontractivity and is extremely short:

Given $f : \{-1,1\}^n \to [-1,1]$, let $L$ be the degree-1 part of $f$ and let $p = E[|f|]$. By Plancherel, $W_1(f) = \langle f, L \rangle$, and by Hölder this is at most $\|f\|_{q'} \|L\|_{q}$ for any $1/q + 1/q' = 1$ with $q \geq 2$. Apply hypercontractivity to $\|L\|_q$, noting that $L$ has degree 1: one gets $\|L\|_q \leq \sqrt{q-1}\, \|L\|_2 = \sqrt{q-1} \sqrt{W_1(f)}$. On the other hand, we have $\|f\|_{q'} = E[|f|^{q'}]^{1/q'} \leq p^{1/q'} = p^{1 - 1/q}$, since $|f| \leq 1$ and $q' \geq 1$. Putting it together, we get $\sqrt{W_1(f)} \leq p^{1-1/q} \sqrt{q - 1}$, which implies $W_1(f) \leq p^2 \cdot q\,(1/p)^{2/q}$. Now we just need to choose $q$ to balance $q$ and $(1/p)^{2/q}$; the optimal choice is $q = \Theta(\log(1/p))$ (assuming, say, $p \leq 1/2$ so that $q \geq 2$ is available), which gives $W_1(f) \leq O(p^2 \log(1/p))$ and completes the proof.
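If you want to see the bound in action, here is a quick brute-force check on a toy example of my own choosing (not part of the homework): the AND function on $n$ bits, viewed as the $0/1$ indicator of the all-ones input. There $p = 2^{-n}$, every Fourier coefficient equals $2^{-n}$, and the degree-1 weight works out to exactly $p^2 \log_2(1/p)$, so the lemma is tight up to the constant.

```python
# Brute-force check of the lemma on the AND function: f is the 0/1 indicator
# of the all-ones input, so p = E[|f|] = 2^{-n}, and W_1(f) should be O(p^2 log(1/p)).
import itertools, math

n = 6
points = list(itertools.product([-1, 1], repeat=n))
f = {x: 1.0 if all(xi == 1 for xi in x) else 0.0 for x in points}

p = sum(f[x] for x in points) / 2**n                   # p = E[|f|]
W1 = 0.0
for i in range(n):                                     # degree-1 Fourier weight
    fhat_i = sum(f[x] * x[i] for x in points) / 2**n   # \hat{f}({i}) = E[f(x) x_i]
    W1 += fhat_i**2

print(f"p = {p:.5f}, W_1(f) = {W1:.7f}, p^2*log2(1/p) = {p**2 * math.log2(1/p):.7f}")
```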

Tuesday, April 24, 2007

Randomized DT complexity of monotone graph properties

I posted the lecture notes for today; they contain a bit more information about Yao's Conjecture ($f$ a monotone graph property $\Rightarrow R(f) \geq \Omega(v^2)$).

Edinburgh

Just got back from a workshop on Geometry and Algorithms in Edinburgh. Was there any Fourier analysis? But of course! What would geometry and algorithms be without it?

Unfortunately, though, it's hard to get it into one's talk in 50 minutes, since one has to cover the geometry and algorithms parts. Guy Kindler mentioned some Fourier analysis in connection with UGC-hardness for Max-Cut. Assaf Naor talked a bit about his upcoming magnum opus with Manor Mendel on "Metric Cotype" which will appear in the Annals of Math. That paper is pretty much all Fourier analysis for functions $f : {\mathbb{Z}_m}^n \to \mathbb{R}$, but with $m > 2$. And Robi Krauthgamer came closest to actually doing some Fourier analysis in his slides, while talking about his work on the communication complexity and embeddability of Edit Distance. Unfortunately, his paper (joint with Alexandr Andoni) doesn't seem to be available yet.

I did my best to get there while talking about the parallel repetition/foams stuff I talked about at the ACO Seminar in March, but no dice. I do think the Fourier analysis we used in that result was pretty cute though; basically: (1) Talagrand's Lemma from Homework 5 Problem 2; and (2) the reversed form of the Hypercontractive Inequality, namely:

If $f : \{-1,1\}^n \to \mathbb{R}^{\geq 0}$ and $-\infty < q \leq p \leq 1$, then $\|T_\rho f\|_q \geq \|f\|_p$ so long as $0 \leq \rho \leq \sqrt{(1-p)/(1-q)}$.


You know -- just like the usual inequality, but reversed.
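If you're skeptical, here is a tiny numerical sanity check (purely illustrative; the parameter choices $n = 2$, $p = 1/2$, $q = -1$ are arbitrary ones for which the constraint works out to $\rho \leq \sqrt{(1-p)/(1-q)} = 1/2$):

```python
# A small numerical check of the reversed Hypercontractive Inequality:
# ||T_rho f||_q >= ||f||_p for nonnegative f, here with n = 2, p = 1/2, q = -1,
# so rho is allowed to be as large as sqrt((1-p)/(1-q)) = 1/2.
import itertools, random

random.seed(0)
n, p, q, rho = 2, 0.5, -1.0, 0.5
points = list(itertools.product([-1, 1], repeat=n))
f = {x: random.uniform(0.1, 1.0) for x in points}      # a strictly positive f

def noise_op(f, rho):
    # (T_rho f)(x) = E[f(y)], where each y_i independently equals x_i
    # with probability (1+rho)/2 and -x_i with probability (1-rho)/2
    out = {}
    for x in points:
        total = 0.0
        for y in points:
            prob = 1.0
            for xi, yi in zip(x, y):
                prob *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
            total += prob * f[y]
        out[x] = total
    return out

def norm(g, r):
    # ||g||_r = E[g^r]^{1/r}; this makes sense for any nonzero r, even r < 0,
    # as long as g is strictly positive
    return (sum(g[x] ** r for x in points) / 2**n) ** (1 / r)

Tf = noise_op(f, rho)
print(norm(Tf, q), ">=", norm(f, p), ":", norm(Tf, q) >= norm(f, p))
```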

Friday, April 20, 2007

Homework #5

Hope you enjoyed Avrim on Tuesday and a day off on Thursday.
Please post any questions or comments about Homework #5 here in the comments.

Sunday, April 08, 2007

Homework #4 graded

The average was 35/40, with a high of 38. As usual, pretty much everybody knew what was going on.

The most popular questions seemed to be #1 and #2, even though those were definitely the longest (especially #1!). #5 is definitely the fastest, if you see how to do it.

Homework #4

Solutions are posted on the course web page.

Thursday, April 05, 2007

Geometry of correlated Gaussians

In class we were discussing the geometry of $\rho$-correlated, $n$-dimensional Gaussians; call them $\vec{g}$, $\vec{h}$.

We have that $\| \vec{g} \|^2 = \sum_{i=1}^n g_i^2$. This is the sum of $n$ very well-behaved independent random variables, each with mean $1$ and variance $2$, so its value will be $n \pm O(\sqrt{n})$ with very high probability. (To be more accurate, say $n \pm O(\sqrt{n} \log n)$ with probability at least $1 - n^{-O(1)}$.) Hence $\|\vec{g}\|$ will be $\sqrt{n}(1 + O(1/\sqrt{n}))$ with high probability. Since $\vec{h}$ is also distributed as a standard $n$-dimensional Gaussian, the same is true of $\|\vec{h}\|$.

Now imagine $\vec{g}$ is fixed and we choose $\vec{h}$ to be $\rho$-correlated to $\vec{g}$. Then $\vec{g} \cdot \vec{h} = \sum_{i=1}^n g_i h_i = \sum_{i=1}^n (\rho g_i^2 + \sqrt{1-\rho^2} g_i g_i')$, where the $g_i'$ random variables are independent standard normals. I.e., the dot product is $\rho \|\vec{g}\|^2$ plus an independent one-dimensional normal, $\sqrt{1-\rho^2} \cdot N(0, \sum_i g_i^2)$. Since $\|\vec{g}\|^2 = \sum_i g_i^2 = n \pm O(\sqrt{n})$ with high probability, the dot product is $\rho n \pm O((\rho + \sqrt{1-\rho^2})\sqrt{n})$ with high probability.

So the cosine of the angle between $\vec{g}$ and $\vec{h}$ will be $(\rho n \pm O(\sqrt{n}))/(n \pm O(\sqrt{n})) = \rho \pm O(1/\sqrt{n})$, and hence the angle will be $\arccos \rho \pm O(1/\sqrt{n})$ with high probability (assuming $\rho$ is treated as a constant in $(0,1)$).
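Here's a quick simulation of these claims (my own sanity check, not something from lecture; the choices $n = 10000$ and $\rho = 0.6$ are arbitrary):

```python
# Simulate rho-correlated n-dimensional Gaussians and check that both norms
# are about sqrt(n) and that the angle between them is about arccos(rho).
import numpy as np

n, rho = 10_000, 0.6
rng = np.random.default_rng(0)

g = rng.standard_normal(n)
g_prime = rng.standard_normal(n)                 # independent standard Gaussian
h = rho * g + np.sqrt(1 - rho**2) * g_prime      # h is rho-correlated with g

cos_angle = g @ h / (np.linalg.norm(g) * np.linalg.norm(h))
print("||g|| / sqrt(n) =", np.linalg.norm(g) / np.sqrt(n))
print("||h|| / sqrt(n) =", np.linalg.norm(h) / np.sqrt(n))
print("angle =", np.arccos(cos_angle), " vs arccos(rho) =", np.arccos(rho))
```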

Now this doesn't really yet prove that $\vec{g}$ and $\vec{h}$ are distributed like a random pair on the surface of the $\sqrt{n}$-radius sphere with angle $\arccos \rho$. We just know that their angle will be about $\arccos \rho$ and that they will both be essentially on the sphere. We should really look at the distribution of $\vec{h} - \rho \vec{g}$. But this is just $\sqrt{1-\rho^2}\,\vec{g}'$, a scaled $n$-dimensional Gaussian independent of $\vec{g}$, so its distribution is spherically symmetric no matter how $\vec{g}$ turned out. I guess this more or less justifies the overall claim.
Actually, I believe that $\vec{h} - \rho \vec{g}$ will be close to orthogonal to $\vec{g}$ with high probability. Their dot product will be distributed like $\sqrt{1-\rho^2} \cdot N(0, \sum_i g_i^2)$, which will typically have magnitude on the order of $\sqrt{1-\rho^2}\sqrt{n}$. However, the product of the lengths of the two vectors will be like $\sqrt{1-\rho^2}\, n$. So the cosine of the angle between them will be on the order of $1/\sqrt{n}$, and the angle will be close to $90$ degrees.
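And a matching quick check (again just my own illustration, with fresh samples):

```python
# Check that h - rho*g is nearly orthogonal to g: the cosine of the angle
# between them should be on the order of 1/sqrt(n).
import numpy as np

n, rho = 10_000, 0.6
rng = np.random.default_rng(1)
g = rng.standard_normal(n)
h = rho * g + np.sqrt(1 - rho**2) * rng.standard_normal(n)

resid = h - rho * g
cosine = resid @ g / (np.linalg.norm(resid) * np.linalg.norm(g))
print("cos(angle) =", cosine, " vs 1/sqrt(n) =", 1 / np.sqrt(n))
print("angle in degrees =", np.degrees(np.arccos(cosine)))
```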

Wednesday, April 04, 2007

Approximating not-nice functions by nice functions

In the last class I asserted that it was possible to approximate certain not-nice functions like $1_{t < t_0}$ and $|t|$ by $B$-nice functions. For example, I asserted that for all $t_0 \in \mathbb{R}$ and $0 < \lambda < 1/2$ there exists a function $\Delta_{t_0,\lambda} : \mathbb{R} \to \mathbb{R}$ which is $O(1/\lambda^4)$-nice and approximates the $t_0$-step-function in the following sense: $\Delta_{t_0, \lambda}(t) = 1$ for $t < t_0 - \lambda$; $\Delta_{t_0, \lambda}(t) = 0$ for $t > t_0 + \lambda$; and, $0 \leq \Delta_{t_0, \lambda}(t) \leq 1$ for $|t - t_0| \leq \lambda$.

There was a question in class as to whether this could really be possible. Specifically, if $\Delta_{t_0, \lambda}$ is $0$ for all $t > t_0 + \lambda$, and it's smooth, then it has all derivatives equal to $0$ for all $t > t_0 + \lambda$. Shouldn't this make it $0$ everywhere?

Well, not so. It's true that for any $t > t_0 + \lambda$, all derivatives are $0$. But Taylor's theorem does not force such a function to be $0$ everywhere. Remember that Taylor's theorem (for smooth functions) just says that for any $r \in \mathbb{N}$, $f(x + \epsilon) = f(x) + \epsilon f'(x) + \frac{\epsilon^2}{2!} f''(x) + \cdots + \frac{\epsilon^{r-1}}{(r-1)!} f^{(r-1)}(x) + \frac{\epsilon^r}{r!} f^{(r)}(y)$ for some $y \in [x, x + \epsilon]$. The remainder term need not go to $0$ as $r \to \infty$; put differently, a smooth function need not be analytic, so it need not agree with its Taylor series.

For a concrete example, consider the function $\phi(t)$ which is $\exp(-1/(1-t^2))$ for $|t| < 1$, and is $0$ for $|t| \geq 1$. It's easy to see that $\phi(t)$ is smooth on $(-1,1)$, and it's not hard to check that its derivatives, of all orders, at $\pm 1$ (when approached from inside) are $0$; hence $\phi$ is in fact smooth on all of $\mathbb{R}$.
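If you don't feel like checking the one-sided derivatives by hand, here is a short symbolic spot-check of the first few of them (just an illustration; it uses sympy and only looks at orders $0$, $1$, $2$):

```python
# Symbolically verify that the first few derivatives of the bump function
# phi(t) = exp(-1/(1 - t^2)) tend to 0 as t -> 1 from the inside.
import sympy as sp

t = sp.symbols('t')
phi = sp.exp(-1 / (1 - t**2))
for k in range(3):
    print(k, sp.limit(sp.diff(phi, t, k), t, 1, dir='-'))   # should all print 0
```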

In fact, functions like $\phi$ (known as bump functions or mollifiers) are what one uses to create functions like $\Delta_{t_0, \lambda}$ -- essentially, by taking $\Delta_{t_0} * \phi_\lambda$, where $\Delta_{t_0}$ denotes the actual discontinuous step function, $*$ denotes convolution, and $\phi_\lambda$ denotes a compressed version of $\phi$, viz., $\phi_\lambda(t) = \phi(t/\lambda)/\lambda$ (rescaled so that $\int \phi_\lambda = 1$).
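To make this concrete, here is a rough numerical version of the construction (my own illustration; the grid and the choices $t_0 = 0$, $\lambda = 1/4$ are arbitrary). It builds $\phi_\lambda$, convolves it with the sharp step, and checks that the result is $1$ below $t_0 - \lambda$ and $0$ above $t_0 + \lambda$, up to discretization error:

```python
# Numerically build Delta_{t0,lambda} by convolving the discontinuous step
# 1_{t < t0} with the compressed, normalized bump function phi_lambda.
import numpy as np

t0, lam = 0.0, 0.25
ts = np.linspace(-2, 2, 4001)                 # symmetric grid, so convolution aligns
dt = ts[1] - ts[0]

def phi(t):
    # bump function: exp(-1/(1 - t^2)) on (-1, 1), zero outside
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

phi_lam = phi(ts / lam) / lam
phi_lam /= phi_lam.sum() * dt                 # rescale so the integral is 1

step = (ts < t0).astype(float)                # the discontinuous step function
delta = np.convolve(step, phi_lam, mode="same") * dt

interior = np.abs(ts) < 1.5                   # stay away from the grid's edges
below = interior & (ts < t0 - lam - 0.01)
above = interior & (ts > t0 + lam + 0.01)
print("min below t0 - lam:", delta[below].min())   # should be (about) 1
print("max above t0 + lam:", delta[above].max())   # should be (about) 0
```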