Well, the course is over -- thanks to everyone for coming and making it a fun affair; I really appreciate it. Don't forget to fill in the course evaluation forms.
Have a good summer.
Friday, May 04, 2007
Thursday, May 03, 2007
Open Problems
The final set of lecture notes is posted -- they contain open problems. Some are definitely harder than others; some are definitely more interesting than others. If you have any questions about them, feel free to comment. Also, if you are interested in working on any of them, let me know! If you are looking for an easyish problem, one you can mull over idly with pen and paper while riding the bus, I might suggest the problem of extending the decision tree/influences inequality to real-valued functions.
I probably missed half a dozen better open problems out there. I may add to this list as they occur to me. To paraphrase Mitch Hedberg, these are the open problems I could think of today.
Homework #5 graded
Homework 5 is graded. The average was 90%, with two people getting 39/40.
Most problems were done pretty well; interestingly, only one person attempted #7, even though it has the shortest solution after #3. A number of people did #2, but no one got part (b) exactly right. The issue there is that you have a real-valued (not integer-valued) random variable and you know that it has exponentially small tails; you want to show that the contribution to its $L_1$-norm from large values is exponentially small. There is a battle here between the contribution of $|u|$ and the probability $O(e^{-u^2/2})$; of course the exponentially small quantity wins, but one still has to integrate, as shown in the solutions.
Also, nobody got #6(c) quite correct. The Tribes function is not exactly balanced; its empty Fourier coefficient is just known to be $\pm O(\log n / n)$. Thus $\mathbb{N}\mathbb{S}_{1/2 - \gamma}(T)$ will always be at most $1/2 - \Omega(\log^2 n / n^2)$, which is why the condition $\gamma \geq 1/n$ is necessary.
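If you'd like to see that bias concretely, here's a numeric sketch of mine (not from the homework): for each width $w$ it picks the number of tribes making the function as balanced as possible, and compares the resulting bias to $\log n / n$.

```python
import math

# Exact bias |E[f]| of Tribes with tribe width w and s tribes (+-1 convention).
def bias(w, s):
    p_true = 1 - (1 - 2.0**-w) ** s
    return abs(1 - 2 * p_true)

for w in range(3, 13):
    # choose the number of tribes making the function closest to balanced
    s = min(range(1, 2**(w + 1)), key=lambda s: bias(w, s))
    n = w * s
    print(n, bias(w, s), math.log(n) / n)   # bias is on the order of log(n)/n
```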
I'm also pleased to report that the overall homework average was 83%. Good job, y'all.
Tuesday, May 01, 2007
Szemeredi's Regularity Lemma, full version
The lecture notes for today are up; they include a full proof of the Szemeredi Regularity Lemma for $\mathbb{F}_2^n$ and also the associated result about testing "triangle"-freeness of subsets of $\mathbb{F}_2^n$.
Wednesday, April 25, 2007
Homework #5 & Talagrand
Solutions are posted.
Regarding Question #2: I referred to this as "Talagrand's Lemma", and that's how I personally think of it, having learned it first as Proposition 2.2 in Michel Talagrand's ingenious paper How much are increasing sets positively correlated?.
I think that, really, it should count as a folklore result. For example, in arithmetic combinatorics I think it's sometimes called Chang's Theorem. It is also the pith of the very technical proof of Bourgain's Theorem on the Fourier tails of boolean functions (not available online, but appearing in a paper of Khot and Naor). Bourgain uses a different argument and doesn't quite derive the lemma explicitly. The alternative proof relies on hypercontractivity and is extremely short:
Given $f : \{-1,1\}^n \to \mathbb{R}$ with $|f| \leq 1$, let $L$ be the degree-1 part of $f$. By Plancherel, $W_1(f) = \langle f, L \rangle$, and by Hölder, $\langle f, L \rangle \leq \|f\|_{q'} \|L\|_{q}$ for any $1/q + 1/q' = 1$. Use hypercontractivity on $\|L\|_q$, noting that $L$ has degree 1: one gets $\|L\|_q \leq \sqrt{q-1} \|L\|_2 = \sqrt{q-1} \sqrt{W_1(f)}$. On the other hand, $\|f\|_{q'} = E[|f|^{q'}]^{1/q'} \leq p^{1/q'} = p^{1 - 1/q}$ where $p = E[|f|]$, since $|f| \leq 1$ and $q' \geq 1$. Putting it together, we get $\sqrt{W_1(f)} \leq p^{1-1/q} \sqrt{q - 1}$, which implies $W_1(f) \leq p^2 \cdot q(1/p)^{2/q}$. Now we just need to choose $q$ to balance $q$ against $(1/p)^{2/q}$; the optimum choice is $q = \Theta(\log(1/p))$, which gives $W_1(f) \leq O(p^2 \log(1/p))$ and completes the proof.
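(For a sanity check, here's a quick brute-force computation -- a toy Python script of mine, not part of the solutions -- on an example where the bound $W_1(f) \leq O(p^2 \log(1/p))$ is nearly tight: the $0/1$ indicator of a single point, which has $W_1(f) = np^2$.)

```python
import itertools, math

# f = 0/1 indicator of the all-ones point, so p = E[|f|] = 2^-n.
n = 10
pts = list(itertools.product([-1, 1], repeat=n))
f = {x: 1.0 if all(b == 1 for b in x) else 0.0 for x in pts}

p = sum(abs(f[x]) for x in pts) / 2**n
W1 = sum((sum(f[x] * x[i] for x in pts) / 2**n) ** 2 for i in range(n))
print(W1, p**2 * math.log(1/p))   # W1 = n*p^2, within a constant of p^2*ln(1/p)
```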
Tuesday, April 24, 2007
Randomized DT complexity of monotone graph properties
I posted the lecture notes for today; they contain a bit more information about Yao's Conjecture ($f$ a monotone graph property $\Rightarrow R(f) \geq \Omega(v^2)$).
Edinburgh
Just got back from a workshop on Geometry and Algorithms in Edinburgh. Was there any Fourier analysis? But of course! What would geometry and algorithms be without it?
Unfortunately, though, it's hard to get it into one's talk in 50 minutes, since one has to cover the geometry and algorithms parts. Guy Kindler mentioned some Fourier analysis in connection with UGC-hardness for Max-Cut. Assaf Naor talked a bit about his upcoming magnum opus with Manor Mendel on "Metric Cotype" which will appear in the Annals of Math. That paper is pretty much all Fourier analysis for functions $f : {\mathbb{Z}_m}^n \to \mathbb{R}$, but with $m > 2$. And Robi Krauthgamer came closest to actually doing some Fourier analysis in his slides, while talking about his work on the communication complexity and embeddability of Edit Distance. Unfortunately, his paper (joint with Alexandr Andoni) doesn't seem to be available yet.
I did my best to get there while talking about the parallel repetition/foams stuff I talked about at the ACO Seminar in March, but no dice. I do think the Fourier analysis we used in that result was pretty cute though; basically: (1) Talagrand's Lemma from Homework 5 Problem 2; and (2) the reversed form of the Hypercontractive Inequality, namely:
If $f : \{-1,1\}^n \to \mathbb{R}^{\geq 0}$ and $-\infty < q \leq p \leq 1$, then $\|T_\rho f\|_q \geq \|f\|_p$ so long as $0 \leq \rho \leq \sqrt{(1-p)/(1-q)}$.
You know -- just like the usual inequality, but reversed.
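If you'd like to see the reversed inequality in action, here's a brute-force sketch (mine, and the choices $n = 3$, $p = 1/2$, $q = -2$ are purely illustrative; note that negative norms make sense here because $f$ is strictly positive):

```python
import itertools, math, random

def norm(vals, p):
    # ||f||_p = E[f^p]^(1/p); fine for p < 0 too, since f > 0
    return (sum(v**p for v in vals) / len(vals)) ** (1.0 / p)

n = 3
pts = list(itertools.product([-1, 1], repeat=n))
random.seed(0)
f = {x: random.uniform(0.1, 1.0) for x in pts}   # a random positive function

p, q = 0.5, -2.0                                 # -infty < q <= p <= 1
rho = math.sqrt((1 - p) / (1 - q))
# (T_rho f)(x) = E[f(y)] over y rho-correlated with x, computed exactly:
# Pr[y | x] = prod_i (1 + rho*x_i*y_i)/2
Tf = [sum(f[y] * math.prod((1 + rho*xi*yi) / 2 for xi, yi in zip(x, y))
          for y in pts) for x in pts]
print(norm(Tf, q), ">=", norm(list(f.values()), p))
```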
Friday, April 20, 2007
Homework #5
Hope you enjoyed Avrim on Tuesday and a day off on Thursday.
Please post any questions or comments about Homework #5 here in the comments.
Sunday, April 08, 2007
Homework #4 graded
The average was 35/40, with a high of 38. As usual, pretty much everybody knew what was going on.
The most popular questions seemed to be #1 and #2, even though those were definitely the longest (especially #1!). #5 is by far the fastest, if you see how to do it.
Thursday, April 05, 2007
Geometry of correlated Gaussians
In class we were discussing the geometry of $\rho$-correlated, $n$-dimensional Gaussians; call them $\vec{g}$, $\vec{h}$.
We have that $\| \vec{g} \|^2 = \sum_{i=1}^n g_i^2$. This is the sum of $n$ very well-behaved independent random variables with mean $1$ and variance $2$, so its value will be $n \pm O(\sqrt{n})$ with very high probability. (To be more accurate, say $n \pm O(\sqrt{n} \log n)$ with probability at least $1 - n^{-O(1)}$.) Hence $\|\vec{g}\|$ will be $\sqrt{n}(1 + O(1/\sqrt{n}))$ with high probability. Since $\vec{h}$ is also distributed as a standard $n$-dimensional Gaussian, the same is true of $\vec{h}$.
Now imagine $\vec{g}$ is fixed and we choose $\vec{h}$ to be $\rho$-correlated to $\vec{g}$. Then $\vec{g} \cdot \vec{h} = \sum_{i=1}^n g_i h_i = \sum_{i=1}^n (\rho g_i^2 + \sqrt{1-\rho^2} g_i g_i')$, where the $g_i'$ random variables are independent standard normals. I.e., the dot product is $\rho \|\vec{g}\|^2$ plus an independent one-dimensional normal, $\sqrt{1-\rho^2} \cdot N(0, \sum_i g_i^2)$. Since $\|\vec{g}\|^2 = \sum_i g_i^2 = n \pm O(\sqrt{n})$ with high probability, the dot product is $\rho n \pm O((\rho + \sqrt{1-\rho^2})\sqrt{n})$ with high probability.
So the cosine of the angle between $\vec{g}$ and $\vec{h}$ will be $(\rho n \pm O(\sqrt{n}))/(n \pm O(\sqrt{n})) = \rho \pm O(1/\sqrt{n})$, and hence the angle will be $\arccos \rho \pm O(1/\sqrt{n})$ with high probability (assuming $\rho$ is treated as a constant in $(0,1)$).
Now this doesn't really yet prove that $\vec{g}$ and $\vec{h}$ are distributed like a random pair on the surface of the $\sqrt{n}$-radius sphere with angle $\arccos \rho$. We just know that their angle will be $\arccos \rho$ and they will both be on the sphere. We should really look at the distribution of $\vec{h} - \rho \vec{g}$. But this is just a scaled $n$-dimensional Gaussian, so its distribution is spherically symmetric. I guess this more or less justifies the overall claim.
Actually, I believe that $\vec{h} - \rho \vec{g}$ will be close to orthogonal to $\rho \vec{g}$ with high probability. The dot product between the two will be distributed like $\sqrt{1-\rho^2} \cdot N(0, \sum_i g_i^2)$, which will probably be on the order of $\sqrt{1-\rho^2}\sqrt{n}$. However the product of the lengths of the two vectors will be like $\rho \sqrt{1-\rho^2} n$. So the cosine of the angle between them will be like $1/(\rho \sqrt{n})$, and the angle will be close to $90$ degrees.
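All of the above is easy to watch happen in simulation; here's a sketch (my own toy code; $n = 10000$ and $\rho = 1/2$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 10000, 0.5
g = rng.standard_normal(n)
h = rho * g + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # rho-correlated pair
cos = g @ h / (np.linalg.norm(g) * np.linalg.norm(h))
print(np.linalg.norm(g) / np.sqrt(n))   # ~ 1:  ||g|| is about sqrt(n)
print(cos)                              # ~ rho
print(np.degrees(np.arccos(cos)))       # ~ arccos(1/2) = 60 degrees
```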
Wednesday, April 04, 2007
Approximating not-nice functions by nice functions
In the last class I asserted that it was possible to approximate certain not-nice functions like $1_{t < t_0}$ and $|t|$ by $B$-nice functions. For example, I asserted that for all $t_0 \in \mathbb{R}$ and $0 < \lambda < 1/2$ there exists a function $\Delta_{t_0,\lambda} : \mathbb{R} \to \mathbb{R}$ which is $O(1/\lambda^4)$-nice and approximates the $t_0$-step-function in the following sense: $\Delta_{t_0, \lambda}(t) = 1$ for $t < t_0 - \lambda$; $\Delta_{t_0, \lambda}(t) = 0$ for $t > t_0 + \lambda$; and, $0 \leq \Delta_{t_0, \lambda}(t) \leq 1$ for $|t - t_0| \leq \lambda$.
There was a question in class as to whether this could really be possible. Specifically, if $\Delta_{t_0, \lambda}$ is $0$ for all $t > t_0 + \lambda$, and it's smooth, then it has all derivatives equal to $0$ for all $t > t_0 + \lambda$. Shouldn't this make it $0$ everywhere?
Well, not so. It's true that for any $t > t_0 + \lambda$, all derivatives are $0$. But Taylor's theorem does not force such a function to be $0$ everywhere. Remember that Taylor (for smooth functions) just says that for any $r \in \mathbb{N}$, $f(x + \epsilon) = f(x) + \epsilon f'(x) + \frac{\epsilon^2}{2!} f''(x) + \cdots + \frac{\epsilon^{r-1}}{(r-1)!} f^{(r-1)}(x) + \frac{\epsilon^r}{r!} f^{(r)}(y)$ for some $y \in [x, x + \epsilon]$.
For a concrete example, consider the function $\phi(t)$ which is $\exp(-1/(1-t^2))$ for $|t| < 1$, and is $0$ for $|t| \geq 1$. It's easy to see that $\phi(t)$ is smooth on $(-1,1)$ and it's not hard to check that its derivatives, of all orders, at $\pm 1$ (when approached from inside) are $0$.
In fact, functions like $\phi$ (known as bump functions or mollifiers) are what one uses to create functions like $\Delta_{t_0, \lambda}$ -- essentially, by taking $\Delta_{t_0} * \phi_\lambda$, where $\Delta_{t_0}$ denotes the actual discontinuous step function, $*$ denotes convolution, and $\phi_\lambda$ denotes a compressed version of $\phi$, viz., $\phi(t/\lambda)/\lambda$.
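Here's a small numerical illustration of the construction (a sketch of mine, with $t_0 = 0$ and $\lambda = 0.2$; I normalize the kernel to integrate to exactly $1$, which the scaling above only does up to the constant $\int \phi$):

```python
import numpy as np

def phi(t):
    # bump function: exp(-1/(1-t^2)) on (-1,1), zero outside; smooth everywhere
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside]**2))
    return out

t0, lam = 0.0, 0.2
t = np.linspace(-2, 2, 4001)
dt = t[1] - t[0]
step = (t < t0).astype(float)        # the discontinuous step Delta_{t0}
kernel = phi(t / lam)
kernel /= kernel.sum() * dt          # normalize the mollifier to integrate to 1
smooth = np.convolve(step, kernel, mode="same") * dt

# Away from the grid edges, smooth is 1 for t < t0 - lam, 0 for t > t0 + lam,
# and in [0,1] in between.
inner = (t > -1.5) & (t < 1.5)
print(smooth[inner & (t < t0 - lam)].min(), smooth[inner & (t > t0 + lam)].max())
```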
Tuesday, March 27, 2007
Ben Green and Terry Tao
Seems like Ben Green and (Fields Medalist) Terry Tao are into the analysis of boolean functions more and more these days. Check out Terry's recent blog post on Freiman's theorem in $\mathbb{F}_2^n$, AKA $\{-1,1\}^n$, and Ben's earlier guest post on the same blog on the Polynomial Freiman-Ruzsa conjecture. Their latest two papers (mentioned in Terry's post) have all of our favorites -- the boolean cube, Fourier analysis, monotone functions, shifting, quasirandomness, linear threshold functions.
If only I understood them...
As usual, they look impeccably written, so it's just a matter of putting in the time to read them. If any of you wants to do this and then explain them to me, that would be great :)
For a warmup, I always recommend Ben Green's terrific survey article, Finite Field Models In Additive Combinatorics. The main difference between their world and ours is that we obsess over the low-degree Fourier coefficients, treating them as most notable. On the other hand, they are perfectly egalitarian about $\hat{f}(S)$ across $S$; from their arithmetic point of view, all parity functions are created equal.
Friday, March 23, 2007
How much are increasing sets positively correlated?
There's a paper of Talagrand, one of the masters of Fourier analysis of boolean functions, called "How much are increasing sets positively correlated?" which I've always kind of known about but never really fully read. A few days ago I studied it more closely, and it turns out to be really nice, with proofs that are not too hard (although the notation took me a while to decipher). Maybe I'll show some of it in a future class.
It proves some nice facts, including:
- If $f : \{-1,1\}^n \to \{-1,1\}$ has $Var[f] \leq p$, then $W_1(f) = \sum_{|S| = 1} \hat{f}(S)^2 \leq O(p \log (1/p))$. (This is a fact that I seem to use all the time in my work...)
- If $\mathbb{I}\mathbb{I}(f)$ denotes $\sum_i Inf_i(f)^2$, then $W_2(f) = \sum_{|S| = 2} \hat{f}(S)^2 \leq O(\mathbb{I}\mathbb{I}(f) \log(1/\mathbb{I}\mathbb{I}(f)))$. (This is the main technical lemma/theorem in the paper.)
- There is a monotone function $f : \{-1,1\}^n \to \{-1,1\}$ for which a constant fraction of all $x$ have the property that $f$ has $\Omega(\sqrt{n})$ sensitive coordinates on $x$. This immediately implies that the function has $\mathbb{I}(f) \geq \Omega(\sqrt{n})$, but in a quite different way from $\mathbb{I}(Maj_n) \geq \Omega(\sqrt{n})$. With Majority, most $x$'s have $0$ sensitive coordinates, but an $\Omega(1/\sqrt{n})$ fraction of them have $(n+1)/2$ sensitive coordinates. With this function, the distribution of the number of sensitive coordinates is much simpler -- a constant fraction of $x$'s have $\Omega(\sqrt{n})$. The function is constructed randomly -- a random DNF of width $\sqrt{n}$ and size $(\ln 2) 2^{\sqrt{n}}$.
An important improvement to the second bullet point above came from a notable paper of Benjamini, Kalai, and Schramm: They showed that in fact $W_k(f) \leq O(\mathbb{I}\mathbb{I}(f) \log^{k-1}(1/\mathbb{I}\mathbb{I}(f)))$. I'm trying to use this fact right now for a problem I'm working on...
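Incidentally, the claim in the third bullet about Majority's sensitive coordinates is easy to check by simulation; here's a sketch of mine (parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 101, 200000
x = rng.choice([-1, 1], size=(trials, n))
s = x.sum(axis=1)
# For Maj_n (n odd), x has sensitive coordinates iff the vote is |sum| = 1,
# in which case exactly the (n+1)/2 coordinates on the winning side are sensitive.
pivotal = np.where(np.abs(s) == 1, (n + 1) // 2, 0)
print((pivotal == 0).mean())        # most x's have no sensitive coordinates
print(pivotal.mean(), np.sqrt(n))   # yet the average is on the sqrt(n) scale
```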
Tuesday, March 20, 2007
Homework #3 graded
I graded Homework #3. The mean and median were both 39/50, with a high score of 48/50.
Kudos to Runting Shi and Yi Wu for (implicitly, by giving correct answers) noticing that the weighting function in (6b) is a sum of Kravchuk polynomials, not a single Kravchuk polynomial as I mistakenly wrote in the solutions.
Kudos also to Ryan Williams for improving the upper bound in problem 1 in some cases: He showed $\mathbb{I}(f) \leq (2 - 2\alpha/(1+\alpha)) w$ when $f$ is a DNF of width $w = \alpha n$. Note that this is tight in the case $\alpha = 1$, by the Parity function.
Open problem note: I'm guessing that indeed $\mathbb{I}(f) \leq w$ for any width-$w$ DNF. Might be a nice thing to try to prove, although I couldn't guarantee that it isn't already known.
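If you want to experiment with it, here's a brute-force sketch (mine; the random DNF and all the sizes are arbitrary choices):

```python
import itertools, random

def total_influence(f, n):
    # I(f) = sum_i Pr_x[f(x) != f(x with coordinate i flipped)], by brute force
    pts = list(itertools.product([False, True], repeat=n))
    return sum(
        sum(f(x) != f(x[:i] + (not x[i],) + x[i+1:]) for x in pts) / len(pts)
        for i in range(n))

n, w, m = 12, 3, 20
random.seed(0)
# a random width-w DNF: m terms, each an AND of w random literals
terms = [[(random.randrange(n), random.random() < 0.5) for _ in range(w)]
         for _ in range(m)]
f = lambda x: any(all(x[i] == b for i, b in term) for term in terms)
print(total_influence(f, n), "vs. width", w)   # the guess: I(f) <= w
```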
Monday, March 19, 2007
Homework #4 is out
Hi all. Homework #4 is posted on the course homepage; it's due Tuesday April 3. I'll bring hard copies to class tomorrow. I should also have Homework #3 graded by tomorrow.
Friday, March 16, 2007
History of the Hypercontractivity Theorem
Hope you're having a good spring break. I put up the lecture notes for Lecture 16, on the Hypercontractivity Theorem. I find the history of the theorem quite fascinating, so I've put a slightly briefer version below. If you read French, you may also want to look at Djalil Chafai's version. Gross also has a nice survey about the history, but unfortunately I can't find it right now...
---
The history of the Hypercontractivity Theorem is quite involved, and interesting. To understand the history, it's important to first note that it has a "Gaussian" version. In this setting it applies to functions $f : \mathbb{R}^n \to \mathbb{R}$, where $\mathbb{R}^n$ is thought of as having the Gaussian distribution and the $T_\rho$ operator is an appropriately generalized linear operator (specifically, the "Ornstein-Uhlenbeck operator" from mathematical physics, which we'll encounter later). Gross observed in 1975 that this Gaussian version follows from our boolean version, by a straightforward argument using the Central Limit Theorem.
An early version of the theorem in the Gaussian setting was proved by Edward Nelson of Princeton in the mid-60's [Nel:66]; I think it was the $q = 4$, $p = 2$ case, possibly with a constant bigger than $1$ on the right side. This was in an important paper on quantum field theory. Several works in subsequent years (e.g., Glimm '68, Federbush '69) improved on the result, and the culmination was the complete proof of the theorem in the Gaussian setting, due to Nelson again [Nel:73]. Nelson's two papers won him the Steele Prize. He is an interesting character, having gone on to work on foundations of mathematics, bounded arithmetic, and automatic proof verification; he is now well-known for having invented Internal Set Theory, a partial axiomatization of Nonstandard Analysis.
In 1973, Leonard Gross proved a limiting version of the theorem called a Logarithmic Sobolev Inequality, and deduced Nelson's Hypercontractive theorem from it [Gro:75]. His proof was in the boolean setting, getting the Gaussian setting via the Central Limit Theorem. However, it turned out that the proof of the Hypercontractivity Theorem (in the boolean setting) was not new; it was first proved by Aline Bonami [Bon:70] in 1970. In fact, Bonami had proved the $q = 4$, $p = 2$ case in the boolean setting even earlier [Bon:68].
The history of the theorem can be traced even further back; Bonami's work was informed by that of Walter Rudin, who proved [Rud:60] similar inequalities in the setting of $\mathbb{Z}_n$ rather than $\{-1,1\}^n$ ("one of my favorite papers" -- Rudin). Further, a version of the log-Sobolev inequality in the Euclidean (rather than Gaussian) setting was proved by A. J. Stam [Sta:59] in 1959, in work on Fisher/Shannon information theory -- much closer to the world of computer science!
Finally, the Hypercontractivity Theorem was introduced to the world of theoretical computer science in the groundbreaking work of Kahn, Kalai, and Linial [KKL:88]. Unfortunately, they attributed the theorem to William Beckner [Bec:75], which I'm not sure is exactly an accurate accreditation. Beckner's work was in fact important followup work on the work of Nelson and Gross, making extensions to Euclidean and complex settings.
In the computer science theory literature, the Hypercontractivity Theorem is often called "Beckner's Theorem". Lately there has been a move towards "Bonami-Beckner Theorem", although "Bonami Theorem" would respect the original discoverer and "Bonami-Gross Theorem" might more properly respect the independent discovery. To sidestep the issue, we will simply call it the Hypercontractivity Theorem.
---
Thursday, March 08, 2007
Shouts-out
On the topic of hypercontractivity, I just wanted to give a couple of shouts-out. Back when I learned this stuff, I didn't have any intuition at all about the hypercontractivity theorem, and would ask all and sundry, "Do you have any idea what's really going on here?"
Krzysztof Oleszkiewicz showed me the slick, simple induction proof of $(2, 4, 1/\sqrt{3}^d)$-hypercontractivity for boolean functions with degree at most $d$. I asked him, "Who is this proof due to? You?" He said, "No no no, not me. Folklore, I guess."
Top Analysis-of-Boolean-Functions expert Guy Kindler showed me the nice level-sets, geometrical-picture proof of the Two-Point Inequality.
So -- thanks to both of them!
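(For the curious: the $(2, 4, 1/\sqrt{3}^d)$ statement is also easy to check by brute force. Here's a toy sketch of mine, on a random degree-$2$ multilinear polynomial over $\{-1,1\}^6$ -- all sizes arbitrary:)

```python
import itertools, math, random

n, d = 6, 2
random.seed(0)
pts = list(itertools.product([-1, 1], repeat=n))
# a random multilinear polynomial of degree <= d
monos = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]
coef = {S: random.gauss(0, 1) for S in monos}
f = [sum(c * math.prod(x[i] for i in S) for S, c in coef.items()) for x in pts]

norm2 = (sum(v**2 for v in f) / len(f)) ** (1/2)
norm4 = (sum(v**4 for v in f) / len(f)) ** (1/4)
print(norm4, "<=", math.sqrt(3)**d * norm2)   # (2,4)-hypercontractivity, degree d
```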
Wednesday, March 07, 2007
Thursday's class: a warning
Hi all. I know we're all in the School of Computer Science, but Thursday's class is going to be all math, all the time. We've already proved most (all?) of the hypercontractivity stuff we'll need. But you can't have a class on Fourier Analysis of Boolean Functions without actually doing the full-blown Bonami-Beckner theorem. I mean, just for culture's sake.
So I'm just warning you, it's going to be all hot Minkowski Inequality and Generalized Binomial Theorem action on Thursday. I think I'll scribe myself.
Tuesday, March 06, 2007
Homework 3 Corrections
Hi -- two more corrections for Homework 3. I guess this is what you get when running a course for the first time...
Question 1: Eliminate the part "and show that this is tight". In fact, I strongly believe it's not tight. For extra credit, improve on the constant factor 2.
Question 3a: This one is too hard. I thought I knew an easy way to do it, but I was mistaken. For the story on this problem, see this paper of Amano and Maruoka (but note that the theorem in their paper improving on KKL was already known).
A corrected version of the homework is on the course web page.
--------
Two small additional clarifications: In 4a, you may assume the algorithm knows $\|\hat{f}\|_1$. In 5a, as Aaron pointed out, it would probably be more sensible if I asked you to show the estimate is within $\sqrt{\epsilon/n^d}$ rather than $\epsilon$; however, it doesn't really matter since $\epsilon$ is a parameter.
Monday, March 05, 2007
Homework correction + the Ajtai-Linial Function
First, Aaron correctly pointed out a mistake on the problem set. In 4a, you should also assume the algorithm has query access to $f$. The corrected version of the homework is on the course web page.
Now for the real topic of the post. Recall that KKL proved that if $f : \{-1,1\}^n \to \{-1,1\}$ is any roughly balanced function, there is a coalition $J \subseteq [n]$ of size $O(n / \log n)$ with influence .999 on $f$.
A construction of Ajtai and Linial (The influence of large coalitions, Combinatorica 13, 1993; annoyingly, not available online) shows this is close to optimal.
The [AL] construction is randomized. Recall the Tribes function, which breaks up the input into disjoint blocks ("tribes") of size about $\log n - C \log \log n$ and takes the OR across blocks of the AND of the blocks. Here $C$ is a carefully chosen constant (around $1$) picked so that the overall function is roughly balanced.
Let's say a random nonmonotone Tribes function of width $w$ is given by randomly partitioning $[n]$ into blocks of size $w$, randomly choosing whether to negate each variable (50-50), and then again ANDing within blocks and ORing across blocks.
Choose $w$ so that these functions will be True with probability around $1 - (\ln 2)/n$. (This involves taking $C \sim 2$.)
Finally, the [AL] construction is this: Choose $n$ random nonmonotone Tribes functions with this width, and AND them together.
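As an aside, the width calculation above is easy to do numerically. Here's a sketch of mine (the grid search and $n = 2^{20}$ are arbitrary; note that the "$C \sim 2$" only kicks in slowly as $n$ grows):

```python
import math

# We want each random Tribes function to be True w.p. ~ 1 - (ln 2)/n, i.e.
# Pr[False] = (1 - 2^-w)^(n/w) ~ (ln 2)/n; solve numerically for the width w.
def log_false_prob(n, w):
    return (n / w) * math.log(1 - 2.0**-w)

n = 2**20
target = math.log(math.log(2) / n)
w = min((i / 100 for i in range(200, 3000)),
        key=lambda w: abs(log_false_prob(n, w) - target))
C = (math.log2(n) - w) / math.log2(math.log2(n))
print(w, "i.e., w = log2(n) - C*log2(log2(n)) with C =", round(C, 2))
```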
--------
Here is a sketch of the properties of this (randomly chosen) function:
1. The probability it is True is very close to $1/2$, with very high probability.
2. It's hard for small coalitions to force the function towards True, because to do this they would have to force many of the random Tribes functions towards True, and for each such function ($\sim n$ of them), they have to force one of its Tribes (about $\log n$ variables) towards True. By the random construction, the different Tribes don't overlap much, so the coalition has to be nearly linear in size.
3. It's hard for small coalitions to force the function towards False. To do this, they have to force at least one of the Tribes functions towards False, and this requires fixing at least one variable in each tribe within the function ($\sim n/\log n$ variables).
--------
The actual theorem is that with high probability over the choice of $f$, $f$ is near-balanced and every coalition $J$ of size $o(n/\log^2 n)$ has influence $o(1)$ on $f$.
Oddly enough, the hardest part of the proof is showing that the functions are near-balanced with very high probability.
--------
Some open problems:
1. Close the gap between $n/\log^2 n$ and $n/\log n$. [AL] speculated that the latter is probably the truth, but I'm not so sure...
2. Give an explicit function where all $o(n/\log^2 n)$-size coalitions have $o(1)$ influence. In fact, I believe it's open even to do this for $c \sqrt{n}$-size coalitions for large enough $c$.
Sunday, February 25, 2007
Homework 2 graded
I just finished grading Homework #2. The average was 63.25 out of 70, which is great. Pretty much everybody got pretty much everything correct.
If anything, people were too diligent; the commonest "errors" (not actually errors) seemed to be:
a) rederiving Plancherel ($E[fg] = \sum_S \hat{f}(S) \hat{g}(S)$) over and over; and,
b) reintroducing $D_i f$ without realizing it in later problems and rederiving the facts from Problem 1.
As mentioned in a comment, I'll be doing the solution for the bonus problem in class later.
Thursday, February 22, 2007
Learning low-degree polynomials over GF(2)
From the lecture on learning juntas, I skipped the learning-theoretic proof that when learning $\mathbb{F}_2$-multilinear polynomials of degree at most $e$, any degree-$e$ polynomial consistent with enough data will with high probability be the truth. The very short proof, using a result from Homework 2, is here.
Condorcet on Randomized Voting
The quotation is from his 1788 writing On the form of decisions made by plurality vote:
"But after considering the facts, the average values or the results, we still need to determine their probability."
---
Lauding of the guy aside, apparently a lot of his mathematical writing was riddled with bugs or otherwise unfollowable.
"But after considering the facts, the average values or the results, we still need to determine their probability."
---
Lauding of the guy aside, apparently a lot of his mathematical writing was riddled with bugs or otherwise unfollowable.
The rationality of weakly symmetric functions
Towards the end of class I quickly said the following:
Let $f : \{-1,1\}^n \to \{-1,1\}$ be any social choice function, and suppose we try to use it to rank 3 candidates via 3 pairwise comparisons using $f$. The Rationality of $f$ is the probability (when the voters' preferences are i.i.d. and uniform) that this doesn't result in a cyclic preference.
Then if $f$ is weakly symmetric, the Rationality of $f$ is at most $7/9 + 4/(9\pi) \approx .919$. Here is the proof. Since the Rationality of Majority is $3/4 - (3/(2\pi)) \arcsin(-1/3) \approx .912$, this is looking pretty good for Majority.
In fact, later in this course we will prove the Majority Is Stablest Theorem, which in particular implies that any weakly symmetric function has Rationality at most that of Majority (plus $O(\log \log n / \log n)$).
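For a concrete small case, here's a brute-force computation of the Rationality of $Maj_3$ (three voters, three candidates, i.i.d. uniform rankings) -- a toy sketch of mine:

```python
import itertools
from fractions import Fraction

# Enumerate all profiles of 3 voters, each with a uniform ranking of A, B, C;
# run the 3 pairwise majority elections and check for a Condorcet cycle.
rankings = list(itertools.permutations("ABC"))
rational = 0
for profile in itertools.product(rankings, repeat=3):
    beats = {}
    for a, b in [("A", "B"), ("B", "C"), ("A", "C")]:
        votes = sum(1 if r.index(a) < r.index(b) else -1 for r in profile)
        beats[(a, b)] = votes > 0
    cyclic = (
        (beats[("A", "B")] and beats[("B", "C")] and not beats[("A", "C")])
        or (not beats[("A", "B")] and not beats[("B", "C")] and beats[("A", "C")])
    )
    rational += not cyclic
print(Fraction(rational, 6**3))   # the Rationality of Maj_3
```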
Monday, February 19, 2007
Learning juntas
The junta-learning problem is: You have access to uniform-distribution random examples from a function $f : \{-1,1\}^n \to \{-1,1\}$ that is promised to depend only on some subset of $r$ of the variables. Think of $r$ as $\log n$, or even just a large constant. The task is to determine the $r$ relevant variables in time substantially better than $n^{r}$.
For more on this problem, see Section 2 of this collection of open problems, written by Avrim Blum.
In class tomorrow we will see an algorithm that finds the variables in time roughly $n^{.704 r}$. As stated in Avrim's note, if you can improve this to just $n^{.666 r}$, he'll give you \$ 100. If you can improve it to $n^{O(1)}$ for $r = \log n$, he'll give you \$ 1000!
Fourier analysis. It can be lucrative.
Tuesday, February 13, 2007
What's new in Fourier Analysis?
Insofar as a lot of the stuff we do in class is from the olden days (the '90s, let's say), you might be wondering what's going on in Fourier Analysis of Boolean Functions currently. Here are some highlights from the recent top conferences. I would like to emphasize that there are lots of other good recent papers on the topic!
The first few papers in the list may be particularly readable for people taking the class.
- STOC '06: Gowers Uniformity, Influence of Variables, and PCPs by Alex Samorodnitsky and Luca Trevisan. Topic: Hardness of Approximation and pure Fourier Analysis.
- FOCS 2005: Nonembeddability theorems via Fourier analysis by Subhash Khot and Assaf Naor. Topic: Metric Spaces.
- GAFA journal 2007: Boolean functions with small spectral norm by Ben Green and Tom Sanders. Topic: Classical Analysis. (Mentioned in class today.)
- COLT 2006: Improved lower bounds for learning intersections of halfspaces by Adam Klivans and Sasha Sherstov. Topic: Learning Theory.
- FOCS 2005: On non-approximability for quadratic programs by Sanjeev Arora, Eli Berger, Elad Hazan, Guy Kindler, and Muli Safra. Topic: Hardness of Approximation.
- FOCS 2005: On Delsarte's Linear Programming Bounds for Binary Codes by Michael Navon and Alex Samorodnitsky. Topic: Coding Theory.
- FOCS 2005: Lower bounds for the noisy broadcast problem by Navin Goyal, Guy Kindler, and Mike Saks. Topic: Distributed Networks.
- Complexity 2005: On the hardness of approximating Multicut and Sparsest-Cut by Shuchi Chawla (of CMU), Robi Krauthgamer, Ravi Kumar, Yuval Rabani, and D. Sivakumar. Topic: Hardness of Approximation.
- Improving on a Complexity 2005 paper: Learning symmetric k-juntas in time $n^{o(k)}$ by Mihail Kolountzakis, Evangelos Markakis, and Aranyak Mehta. Topic: Number Theory and Learning Theory.
Tuesday, February 06, 2007
Homework #2
A student asked if we've already covered in lecture everything needed for Homework #2. The answer is yes.
Questions about the homework can go in the comments of this post.
Sampling
In class, I handwaved over the proof that we can efficiently estimate $E[F_{S \subseteq I}(x)^2]$. The proof is very easy:
$E_x[F_{S \subseteq I}(x)^2] = E_x[\hat{f_x}(S)^2] = E_x[(E_w[f_x(w)w_S])^2]$.
Now we use a trick: Pick two copies of $w$ independently:
... $= E_x[E_{w,w'}[f_x(w) w_S f_x(w') w'_S]] = E_{x,w,w'}[f(w,x) w_S f(w',x) w'_S]$.
And this expectation can be efficiently estimated empirically, since we can sample the random variable inside the expectation, and this r.v. is bounded in $[-1,1]$. (I guess officially what I keep using here is "Hoeffding's Bound".)
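Concretely, the estimation routine looks something like this (a sketch, with the standard Hoeffding sample count for $[-1,1]$-bounded variables; the example random variable is made up):

```python
import math, random

# Estimating E[Z] for Z bounded in [-1, 1]: by Hoeffding's bound,
# m = ceil((2/eps^2) * ln(2/delta)) samples give |estimate - E[Z]| <= eps
# with probability >= 1 - delta.
def estimate_mean(sample_z, eps, delta):
    m = math.ceil(2 / eps**2 * math.log(2 / delta))
    return sum(sample_z() for _ in range(m)) / m

random.seed(0)
true_mean = 0.3
sample_z = lambda: 1 if random.random() < (1 + true_mean) / 2 else -1
print(estimate_mean(sample_z, eps=0.01, delta=0.001))   # ~ 0.3
```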
--
Also: at the end of class, when showing how to break the one-way permutation $f$ given a PPT algorithm $A$ for guessing the hard-core predicate, we had that if $A$ is deterministic, we get that for a $2\gamma$ fraction of $x$'s, $\Pr_r[A(f(x), r) = x \cdot r] \geq 1/2 + \gamma$. If $A$ is not deterministic, let $\mathcal{A}$ denote the function which on input $r$ equals $E_{A}[A(f(x), r)]$, where the expectation is over $A$'s randomness. Now $\mathcal{A}$ is a function with range in $[-1,1]$, so we can apply Goldreich-Levin to it...
Except not quite, since in GL we assume we actually have access to $\mathcal{A}$, whereas really, we only have access to a $\{-1,1\}$-valued randomized function whose expectation is $\mathcal{A}$. But I hope you can convince yourself that that's good enough; basically, we just have to throw in an expectation over $A$'s random coins in the derivation in the first part of this post.
Saturday, February 03, 2007
Homework #2 is out
I'll bring hard copies to class on Tuesday, but you can find it on the web page now. It is due Tuesday, February 20 at the beginning of class.
On quasirandomness
I just realized I made a mistake in Lecture 5. I said being (small,small)-quasirandom was equivalent to having small correlation with every small junta. That's not true. If you are quasirandom then you have small correlation with every small junta; this will be on the next homework. However, if you have small correlation with every small junta, you are not necessarily quasirandom.
If you want to think about it, you may be able to convince yourself that the function $f(x) = "x_1 \oplus$ Majority$(x_2, ..., x_n)"$ has $o(1)$ correlation with every function on $o(\sqrt{n})$ bits, and yet it is not even $(.3,.3)$-quasirandom!
So the property of having low correlation with every small junta is weaker than our notion of being quasirandom. As it happens, the legit hardness of approximation for 3Lin result (without using UGC) uses the fact that even functions with this weaker property are rejected by the Håst-Odd test with probability close to $1/2$.
Friday, February 02, 2007
Homework 1 graded
I finished grading Homework #1. The average was 80% (56/70) with a max of 90% and a standard deviation of about 7%. Pretty good.
I put solutions up on the web page. Note that they are not necessarily maximally simple; for example, I think Amitabh had a nicer proof for #1 than I did, and Moritz had a nicer proof for #3b than I did. Also almost everybody did #2 by induction, which is about equally easy.
Most common mistakes were: #1 -- using Pr[A and B] = Pr[A] Pr[B]; #3a -- messing up the signs; #5 -- failing to do if *and* only if.
Re #3a -- I realized that I kind of messed up the signs in the problem statement. Actually, what I wrote was technically correct, but misleading ;) Anyway, it only cost people -1 out of 70.
Re #7b -- The hint I gave on the blog was actually too complicated. I simplified problem #7b at the last minute, and the hint was needed for the original more complicated version.
Thursday, February 01, 2007
Hardness via Unique Games Conjecture
Hey folks -- since today's lecture was pretty technical, feel free to ask for clarifications in the comments.
The main takeaway is that if you want to prove a strong hardness-of-approximation result for your favorite constraint satisfaction problem, all you need to do (assuming UGC) is design a strong "dictatorship vs. quasirandom" function tester, where the tests the tester uses are the same as the constraints you're trying to prove hardness for.
E.g., much later in the term we will give a function tester $T$ which makes $2$ queries $x, y \in \{-1,1\}^n$ and then checks that $f(x) \neq f(y)$. We will show that $T$ accepts all dictators with probability at least $.9001$ and accepts all $(.0001, .0001)$-quasirandom functions with probability at most $.7999$.
As a consequence, we get (assuming UGC) that given a CSP over variables $v_1, ... v_n$ where all the constraints are of the form $v_i \neq v_j$, it is NP-hard to distinguish the case that there is an assignment satisfying at least a $.9$ fraction of the constraints and the case that every assignment satisfies at most a $.8$ fraction. Since CSPs with $\neq$ constraints are the same as the "Max-Cut" problem, this shows that giving a $.8/.9 \approx .8888$ approximation for Max-Cut is NP-hard.
---
As mentioned, on Tuesday we'll move on to the next segment of the course, Learning. Also, homework #2 will come out. Have a good weekend!
Wednesday, January 31, 2007
Håst-Odd as a Dictatorship vs. Quasirandom test
At the end of last class I stated a proposition we'll need, which says that an $(\epsilon^2, \delta)$-quasirandom function passes the Håst-Odd${}_\delta$ test with probability at most $1/2 + \epsilon/2$.
I meant to prove it at the beginning of tomorrow's class but, surprise surprise, it looks like we'll be pressed for time even without it. So it goes, as Vonnegut would say.
Here then, is the analysis. Really, it's pretty easy. It basically proves itself.
Tuesday, January 30, 2007
Boosting the correctness probability in testing
In defining what it means to be a tester $T$ for property $\mathcal{P}$, I wrote: if $f$ satisfies $\mathcal{P}$ then Pr[$T(f)$ accepts] $> 2/3$ and if $f$ is $\epsilon$-far from satisfying $\mathcal{P}$ then Pr[$T(f)$ accepts] $< 1/3$. I then added that the $2/3$ and $1/3$ here is "arbitrary" and could be "boosted".
This phrasing is pretty imprecise, and an anonymous commenter asked what exactly I meant. What's meant is this: Suppose you have a tester $T$ making $q$ queries that satisfies the given definition. Then for any constant $\delta > 0$, there is another tester $T'$ making $O(q)$ queries that satisfies the definition with $1-\delta$ and $\delta$ in place of $2/3$ and $1/3$. So if you don't care about constant factors in the number of queries, then the choice of $2/3$ and $1/3$ is arbitrary. In "testing" we don't usually care about constant factors on the number of queries, although in "local testing" we often do.
The way to get $T'$ from $T$ is to just run it $O(\log(1/\delta))$ many times independently and then accept iff at least half of the runs accepted. Then the Chernoff bound implies the claim.
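In code, the amplification looks like this (a sketch of mine, computing the exact majority-vote error rather than the Chernoff bound):

```python
import math

def majority_error(k, p_correct):
    # Pr[the majority of k independent runs is wrong], each run correct
    # independently with probability p_correct (k odd)
    return sum(math.comb(k, i) * p_correct**i * (1 - p_correct)**(k - i)
               for i in range(0, k // 2 + 1))

delta = 0.01
k = 1
while majority_error(k, 2/3) > delta:
    k += 2            # keep k odd so the majority vote is unambiguous
print(k)              # O(log(1/delta)) repetitions suffice
```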
Details from the end of Lecture 5
Many thanks to Daniel Golovin for providing the Book proof of $|S|(1-\delta)^{|S|-1} \leq 1/\delta$ for each $|S| = 0, 1, ...$, the claim we needed to show that any function has at most $1/\epsilon \delta$ variables with $(1-\delta)$-attenuated influence at least $\epsilon$. His proof:
Fix any $|S|$. If $i \leq |S|$ then certainly $(1-\delta)^{|S| - 1} \leq (1-\delta)^{i - 1}$. Sum this fact over $i = 1...|S|$ to conclude $|S| (1-\delta)^{|S| - 1} \leq \sum_{i=1}^{|S|} (1-\delta)^{i-1}$. But $\sum_{i=1}^{|S|} (1-\delta)^{i-1} \leq \sum_{i = 1}^{\infty} (1-\delta)^{i-1} = 1/\delta$.
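(A one-liner if you'd like to see it numerically -- with the arbitrary choice $\delta = 0.1$:)

```python
delta = 0.1
print(max(s * (1 - delta)**(s - 1) for s in range(0, 500)), "<=", 1 / delta)
```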
---
Re one thing I mentioned at the end of class: The constant functions $1$ and $-1$ pass the Håst-Odd test with probability $1/2$ (for every $\delta$). The reason is, the test picks some $x$, $y$, and $z$, but then also picks random signs $a,b,c \in \{-1,1\}$ and tests "$af(ax) \cdot bf(by) \cdot cf(cz) = 1$", which is equivalent to $f(ax)f(by)f(cz) = abc$. Now if $f$ is constant then so is $f(ax)f(by)f(cz)$, and then $abc$ is a random sign, which agrees with the constant with probability $1/2$.
Note also that the "acceptance constraint" in the Håst-Odd test is either of the form "$v_{i_1} v_{i_2} v_{i_3} = 1$" or "$v_{i_1} v_{i_2} v_{i_3} = -1$". I.e., it's a 3-variable "linear" equation where the right-hand side is either "0" or "1" (if you go back and interpret bits as $\mathbb{F}_2$ rather than $\{-1,1\}$).