18-859S: Analysis of Boolean Functions: 2007

Friday, May 04, 2007

Thanks

Well, the course is over -- thanks to everyone for coming and making it a fun affair; I really appreciate it. Don't forget to fill in the course evaluation forms.

Have a good summer.

Thursday, May 03, 2007

The final set of lecture notes are posted -- they contain open problems. Some are definitely harder than others; some are definitely more interesting than others. If you have any more questions about them, feel free to comment. Also, if you are interested in working on any of them, let me know! If you are looking for an easyish-to-think-about problem, one that you can think about idly with a pen and paper while riding the bus, I might suggest the problem of extending the decision tree/influences inequality to real-valued functions.

I probably missed half a dozen better open problems out there. I may add to this list as they occur to me. To paraphrase Mitch Hedberg, these are the open problems I could think of today.

Homework #5 graded

Homework 5 is graded. The average was 90%, with two people getting 39/40.

Most problems were done pretty well; interestingly, only one person attempted #7, even though it has the shortest solution after #3. A number of people did #2 but no one really got part (b) exactly correctly. The issue there is that you have a real-valued (not integer-valued) random variable and you know that it has exponentially small tails; you want to show that the contribution to its $L_1$-norm from large numbers is exponentially small. There is a battle here, between, the contribution of $|u|$ and the probability $O(e^{-u^2/2})$, which of course the exponentially small quantity is going to win; still, one has to integrate, as shown in the solutions.

Also, nobody got #6(c) quite correct. The Tribes function is not exactly balanced; its empty Fourier coefficient is just known to be $\pm O(\log n / n)$. Thus $\mathbb{N}\mathbb{S}_{1/2 - \gamma}(T)$ will always be at most $1/2 - \Omega(\log^2 n / n^2)$, which is why the condition $\gamma \geq 1/n$ is necessary.

I'm also pleased to report that the overall homework average was 83%. Good job, y'all.

Tuesday, May 01, 2007

Course evaluation

Please take a minute to fill in the course evaluation forms!

Szemeredi's Regularity Lemma, full version

The lecture notes for today are up; they include a full proof of the Szemeredi Regularity Lemma for $\mathbb{F}_2^n$ and also the associated result about testing "triangle"-freeness of subsets of $\mathbb{F}_2^n$.

Wednesday, April 25, 2007

Homework #5 & Talagrand

Solutions are posted.

Regarding Question #2: I referred to this as "Talagrand's Lemma", and that's how I personally think of it, having learned it first as Proposition 2.2 in Michel Talagrand's ingenious paper How much are increasing sets positively correlated?.

I think that, really, it should count as a folklore result. For example, in arithmetic combinatorics I think it's sometimes called Chang's Theorem. It is also the pith of the very technical proof of Bourgain's Theorem on the Fourier tails of boolean functions (not available online, but appearing in a paper of Khot and Naor). Bourgain uses a different proof, but doesn't quite derive it explicitly. The different proof relies on hypercontractivity and is extremely short:

Given $f : \{-1,1\}^n \to \mathbb{R}$, let $L$ be the degree-1 part of $f$. Then by Plancherel, $W_1(f) = \langle f, L \rangle \leq \|f\|_{q'} \|L\|_{q}$ for any $1/q + 1/q' = 1$, by Holder. Use hypercontractivity on $\|L\|_q$, noting that it has degree 1. One gets $\|L\|_q \leq \sqrt{q-1} \|L\|_2 = \sqrt{q-1} \sqrt{W_1(f)}$. On the other hand, we have $\|f\|_{q'} = E[|f|^{q'}]^{1/q'} \leq p^{1/q'} = p^{1 - 1/q}$ where $p = E[|f|]$, since $|f| \leq 1$ and $q' \geq 1$. Putting it together, we get $\sqrt{W_1(f)} \leq p^{1-1/q} \sqrt{q - 1}$, which implies $W_1(f) \leq p^2 \cdot q(1/p)^{2/q}$. Now we just need to choose $q$ to balance $q$ and $(1/p)^{2/q}$; the optimum choice is $q = \Theta(\log(1/p))$, and this completes the proof.

Tuesday, April 24, 2007

Randomized DT complexity of monotone graph properties

I posted the lecture notes for today; they contain a bit more information about Yao's Conjecture ($f$ a monotone graph property $\Rightarrow R(f) \geq \Omega(v^2)$).

Edinburgh

Just got back from a workshop on Geometry and Algorithms in Edinburgh. Was there any Fourier analysis? But of course! What would geometry and algorithms be without it?

Unfortunately, though, it's hard to get it into one's talk in 50 minutes, since one has to cover the geometry and algorithms parts. Guy Kindler mentioned some Fourier analysis in connection with UGC-hardness for Max-Cut. Assaf Naor talked a bit about his upcoming magnum opus with Manor Mendel on "Metric Cotype" which will appear in the Annals of Math. That paper is pretty much all Fourier analysis for functions $f : {\mathbb{Z}_m}^n \to \mathbb{R}$, but with $m > 2$. And Robi Krauthgamer came closest to actually doing some Fourier analysis in his slides, while talking about his work on the communication complexity and embeddability of Edit Distance. Unfortunately, his paper (joint with Alexandr Andoni) doesn't seem to be available yet.

I did my best to get there while talking about the parallel repetition/foams stuff I talked about at the ACO Seminar in March, but no dice. I do think the Fourier analysis we used in that result was pretty cute though; basically: (1) Talagrand's Lemma from Homework 5 Problem 2; and (2) the reversed form of the Hypercontractive Inequality, namely:

If $f : \{-1,1\}^n \to \mathbb{R}^{\geq 0}$ and $-\infty < q \leq p \leq 1$, then $\|T_\rho f\|_q \geq \|f\|_p$ so long as $0 \leq \rho \leq \sqrt{(1-p)/(1-q)}$.

You know -- just like the usual inequality, but reversed.

Friday, April 20, 2007

Homework #5

Hope you enjoyed Avrim on Tuesday and a day off on Thursday.
Please post any questions or comments about Homework #5 here in the comments.

Sunday, April 08, 2007

Homework #4 graded

The average was 35/40, with a high of 38. As usual, pretty much everybody knew what was going on.

The most popular questions seemed to be #1 and #2, even though those were definitely the longest (especially #1!). #5 is definitely the fastest, if you see how to do it.

Homework #4

Solutions are posted on the course web page.

Thursday, April 05, 2007

Geometry of correlated Gaussians

In class we were discussing the geometry of $\rho$-correlated, $n$-dimensional Gaussians; call them $\vec{g}$, $\vec{h}$.

We have that $\| \vec{g} \|^2 = \sum_{i=1}^n g_i^2$. This is the sum of $n$ very well-behaved independent random variables with mean $1$ and variance $2$, so its distribution value will be $n \pm O(\sqrt{n})$ with very high probability. (To be more accurate, say $n \pm O(\sqrt{n} \log n)$ with probability at least $1 - n^{-O(1)}$.) Hence $\|\vec{g}\|$ will be $\sqrt{n}(1 + O(1/\sqrt{n})$ with high probability. Since $\vec{h}$ is also distributed as a standard $n$-dimensional Gaussian, the same is true of $\vec{h}$.

Now imagine $\vec{g}$ is fixed and we choose $\vec{h}$ to be $\rho$-correlated to $\vec{g}$. Then $\vec{g} \cdot \vec{h} = \sum_{i=1}^n g_i h_i = \sum_{i=1}^n (\rho g_i^2 + \sqrt{1-\rho^2} g_i g_i')$, where the $g_i'$ random variables are independent standard normals. I.e., the dot product is $\rho \|\vec{g}\|^2$ plus an independent one-dimensional normal, $\sqrt{1-\rho^2} \cdot N(0, \sum_i g_i^2)$. Since $\|\vec{g}\|^2 = \sum_i g_i^2 = n \pm O(\sqrt{n})$ with high probability, the dot product is $\rho n \pm O((\rho + \sqrt{1-\rho^2})\sqrt{n})$ with high probability.

So the cosine of the angle between $\vec{g}$ and $\vec{h}$ will be $(\rho n \pm O(\sqrt{n}))/(n \pm O(\sqrt{n})) = \rho \pm O(1/\sqrt{n})$, and hence the angle will be $\arccos \rho \pm O(1/\sqrt{n})$ with high probability (assuming $\rho$ is treated as a constant in $(0,1)$).

Now this doesn't really yet prove that $\vec{g}$ and $\vec{h}$ are distributed like a random pair on the surface of the $\sqrt{n}$-radius sphere with angle $\arccos \rho$. We just know that their angle will be $\arccos \rho$ and they will both be on the sphere. We should really look at the distribution of $\vec{h} - \rho \vec{g}$. But this is just a scaled $n$-dimensional Gaussian, so its distribution is spherically symmetric. I guess this more or less justifies the overall claim.
Actually, I believe that $\vec{h} - \rho \vec{g}$ will be close to orthogonal to $\rho \vec{g}$ with high probability. The dot product between the two will be distributed like $\sqrt{1-\rho^2} \cdot N(0, \sum_i g_i^2)$, which will probably be on the order of $\sqrt{1-\rho^2}\sqrt{n}$. However the product of the lengths of the two vectors will be like $\rho \sqrt{1-\rho^2} n$. So the cosine of the angle between them will be like $1/(\rho \sqrt{n})$, and the angle will be close to $90$ degrees.

Wednesday, April 04, 2007

Approximating not-nice functions by nice functions

In the last class I asserted that it was possible to approximate certain not-nice functions like $1_{t < t_0}$ and $|t|$ by $B$-nice functions. For example, I asserted that for all $t_0 \in \mathbb{R}$ and $0 < \lambda < 1/2$ there exists a function $\Delta_{t_0,\lambda} : \mathbb{R} \to \mathbb{R}$ which is $O(1/\lambda^4)$-nice and approximates the $t_0$-step-function in the following sense: $\Delta_{t_0, \lambda}(t) = 1$ for $t < t_0 - \lambda$; $\Delta_{t_0, \lambda}(t) = 0$ for $t > t_0 + \lambda$; and, $0 \leq \Delta_{t_0, \lambda}(t) \leq 1$ for $|t - t_0| \leq \lambda$.

There was a question in class as to whether this could really be possible. Specifically, if $\Delta_{t_0, \lambda}$ is $0$ for all $t > t_0 + \lambda$, and it's smooth, then it has all derivatives equal to $0$ for all $t > t_0 + \lambda$. Shouldn't this make it $0$ everywhere?

Well, not so. It's true that for any $t > t_0 + \lambda$, all derivatives are $0$. But Taylor's theorem does not force such a function to be $0$ everywhere. Remember that Taylor (for smooth functions) just says that for any $r \in \mathbb{N}$, $f(x + \epsilon) = f(x) + \epsilon f'(x) + \frac{\epsilon^2}{2!} f''(x) + \cdots + \frac{\epsilon^{r-1}}{(r-1)!} f^{(r-1)}(x) \epsilon^{r-1} + \frac{\epsilon^r}{r!} f^{(r)}(y)$ for some $y \in [x, x + \epsilon]$.

For a concrete example, consider the function $\phi(t)$ which is $\exp(-1/(1-t^2))$ for $|t| < 1$, and is $0$ for $|t| \geq 1$. It's easy to see that $\phi(t)$ is smooth on $(-1,1)$ and it's not hard to check that its derivatives, of all orders, at $\pm 1$ (when approached from inside) are $0$.

In fact, functions like $\phi$ (known as bump functions or mollifiers) are what one uses to create functions like $\Delta_{t_0, \lambda}$ -- essentially, by taking $\Delta * \phi_\lambda$, where $\Delta_{t_0}$ denotes the actual discontinuous step function, $*$ denotes convolution, and $\phi_\lambda$ denotes a compressed version of $\phi$, viz., $\phi_(t/\lambda)/\lambda$.

Tuesday, March 27, 2007

Ben Green and Terry Tao

Seems like Ben Green and (Fields Medalist) Terry Tao are into the analysis of boolean functions more and more these days. Check out Terry's recent blog post on Freiman's theorem in $\mathbb{F}_2^n$, AKA $\{-1,1\}^n$, and Ben's earlier guest post on the same blog on the Polynomial Freiman-Ruzsa conjecture. Their latest two papers (mentioned in Terry's post) have all of our favorites -- the boolean cube, Fourier analysis, monotone functions, shifting, quasirandomness, linear threshold functions.

If only I understood them...

As usual, they look impeccably written, so it's just a matter of putting in the time to read them. If any of you wants to do this and then explain them to me, that would be great :)

For a warmup, I always recommend Ben Green's terrific survey article, Finite Field Models In Additive Combinatorics. The main difference between their world and ours is that we obsess over the low-degree Fourier coefficients, treating them as most notable. On the other hand, they are perfectly egalitarian about $\hat{f}(S)$ across $S$; from their arithmetic point of view, all parity functions are created equal.

Friday, March 23, 2007

How much are increasing sets positively correlated?

There's a paper of Talagrand, one of the masters of Fourier analysis of boolean functions, called "How much are increasing sets positively correlated?" which I've always kind of known about but never really fully read. A few days ago I studied it more closely, and it turns out to be really nice, with proofs that are not too hard (although the notation took me a while to decipher). Maybe I'll show some of it in a future class.

It proves some nice facts, including:

If $f : \{-1,1\}^n \to \{-1,1\}$ has $Var[f] \leq p$, then $W_1(f) = \sum_{|S| = 1} \hat{f}(S)^2 \leq O(p \log (1/p))$. (This is a fact that I seem to use all the time in my work...)
If $\mathbb{I}\mathbb{I}(f)$ denotes $\sum_i Inf_i(f)^2$, then $W_2(f) = \sum_{|S| = 2} \hat{f}(S)^2 \leq O(\mathbb{I}\mathbb{I}(f) \log(1/\mathbb{I}\mathbb{I}(f)))$. (This is the main technical lemma/theorem in the paper.)
There is a monotone function $f : \{-1,1\}^n \to \{-1,1\}$ for which a constant fraction of all $x$ have the property that $f$ has $\Omega(\sqrt{n})$ sensitive coordinates on $x$. This immediately implies that the function has $\mathbb{I}(f) \geq \Omega(\sqrt{n})$, but in a much different way than $\mathbb{I}(Maj_n) \geq \Omega(\sqrt{n})$. With Majority, most $x$'s have $0$ sensitive coordinates, but a $\Omega(1/\sqrt{n})$ fraction of them have $(n+1)/2$ sensitive coordinates. With this function, the distribution on the number of sensitive coordinates is much simpler -- a constant fraction have $\Omega(\sqrt{n})$. The function is constructed randomly -- a random DNF of width $\sqrt{n}$ and size $(\ln 2) 2^{\sqrt{n}}$.

Actually, this last function, one can check, gives an example of a monotone function that has $NS_{O(1/\sqrt{n})}(f) \geq \Omega(1)$; and as we saw when studying hardness amplification, $\Theta(1/\sqrt{n})$ is the least noise rate to which a monotone function can be sensitive.

An important improvement to the second bullet point above came from a notable paper of Benjamini, Kalai, and Schramm: They showed that in fact $W_k(f) \leq O(\mathbb{I}\mathbb{I} (f) \log^{k-1}(1/\mathbb{I}\mathbb{I}(f))$. I'm trying to use this fact right now for a problem I'm working on...

Wednesday, March 21, 2007

Homework #4

No time like the present to get started on Homework #4. Post comments/questions here.

Tuesday, March 20, 2007

Homework #3 graded

I graded Homework #3. The mean and median were both 39/50, with a high score of 48/50.

Kudos to Runting Shi and Yi Wu for (implicitly, by giving correct answers) noticing that the weighting function in (6b) is a sum of Kravchuk polynomials, not a Kravchuk polynomials as I mistakenly wrote in the solutions.

Kudos also to Ryan Williams for improving the upper bound in problem 1 in some cases: He showed $\mathbb{I}(f) \leq (2 - 2\alpha/(1+\alpha)) w$ when $f$ is a DNF of width $w = \alpha n$. Note that this is tight in the case $\alpha = 1$, by the Parity function.

Open problem note: I'm guessing that indeed $\mathbb{I}(f) \leq w$ for any width-$w$ DNF. Might be a nice thing to try to prove, although I couldn't guarantee that it isn't already known.

Monday, March 19, 2007

Homework #4 is out

Hi all. Homework #4 is posted on the course homepage; it's due Tuesday April 3. I'll bring hard copies to class tomorrow. I should also have Homework #3 graded by tomorrow.

Friday, March 16, 2007

History of the Hypercontractivity Theorem

Hope you're having a good spring break. I put up the lecture notes for Lecture 16, on the Hypercontractivity Theorem. I find the history of the theorem quite fascinating, so I've put a slightly briefer version below. If you read French, you may also want to look at Djalil Chafai's version. Gross also has a nice survey about the history, but unfortunately I can't find it right now...

---

The history of the Hypercontractivity Theorem is quite involved, and
interesting. To understand the history, it's important to first note
that it has a "Gaussian'' version. In this setting it applies to functions $f : \R^n \to \R$, where $\R^n$ is thought of as having the Gaussian distribution and the $T_\rho$ operator is an appropriately generalized linear operator (specifically, the "Ornstein-Uhlenbeck operator" from mathematical physics, which we'll encounter later). Gross observed in 1975 that this Gaussian version follows from our boolean version, by a straightforward argument using the Central Limit Theorem.

An early version of the theorem in the Gaussian setting was proved by Edward Nelson of Princeton in the mid-60's [Nel:66]; I think it was the $q = 4$, $p = 2$ case, possibly with a constant bigger than $1$ on the right side. This was in an important paper on quantum field theory. Several works in subsequent years (e.g., Glimm '68, Federbush '69) improved on the result, and the culmination was the complete proof of the theorem in the Gaussian setting, due to Nelson again [Nel:73]. Nelson's two papers won him the Steele Prize. He is an interesting character, having gone on to work on foundations of mathematics, bounded arithmetic, and automatic proof verification; he is now well-known for having invented Internal Set Theory, a partial axiomatization of Nonstandard Analysis.

In 1973, Leonard Gross proved a limiting version of the theorem called a Logarithmic Sobolev Inequality, and deduced Nelson's Hypercontractive theorem from it [Gro:75]. His proof was in the boolean setting, getting the Gaussian setting via the Central Limit Theorem. However, it turned out that the proof of the Hypercontractivity Theorem (in the boolean setting) was not new; it was first proved by Aline Bonami [Bon:70] in 1979. In fact, Bonami had proved the $q = 4$, $p = 2$ case in the boolean setting even earlier [Bon:68].

The history of the theorem can be traced even further back; Bonami's work was informed by that of Walter Rudin, who proved [Rud:60] similar inequalities in the setting of $\Z_n$ rather than $\{-1,1\}^n$ (``one of my favorite papers'' -- Rudin). Further, a version of the log-sobolev inequality in the Euclidean (rather than Gaussian) setting was proved by A. J. Stam [Sta:59] in 1959, in work on Fisher/Shannon information theory -- much closer to the world of computer science!

Finally, the Hypercontractivity Theorem was introduced to the world of theoretical computer science in the groundbreaking work of Kahn, Kalai, and Linial [KKL:88]. Unfortunately, they attributed the theorem to William Beckner [Bec:75], which I'm not sure is exactly an accurate accreditation. Beckner's work was in fact important followup work on the work of Nelson and Gross, making extensions to Euclidean and complex settings.

In the computer science theory literature, the Hypercontractivity Theorem is often called "Beckner's Theorem". Lately there has been a move towards "Bonami-Beckner Theorem", although "Bonami Theorem" would respect the original discoverer and "Bonami-Gross Theorem" might more properly respect the independent discovery. To sidestep the issue, we will simply call it the Hypercontractivity Theorem.

Tuesday, March 13, 2007

Solutions for Homework 3

Are posted.

Thursday, March 08, 2007

Shouts-out

On the topic of hypercontractivity, I just wanted to give a couple of shouts-out. Back when I learned this stuff, I didn't have any intuition at all about the hypercontractivity theorem, and would ask all and sundry, "Do you have any idea what's really going on here?" "Do you have any idea what's really going on here?"

Krzysztof Oleszkiewicz showed me the slick, simple induction proof of $(2, 4, 1/\sqrt{3}^d)$-hypercontractivity for boolean functions with degree at most $d$. I asked him, "Who is this proof due to? You?" He said, "No no no, not me. Folklore, I guess."

Top Analysis-of-Boolean-Functions expert Guy Kindler showed me the nice level-sets, geometrical-picture proof of the Two-Point Inequality.

So -- thanks to both of them!

Wednesday, March 07, 2007

Thursday's class: a warning

Hi all. I know we're all in the School of Computer Science, but Thursday's class is going to be all math, all the time. We've already proved most (all?) of the hypercontractivity stuff we'll need. But you can't have a class on Fourier Analysis of Boolean Functions without actually doing the full-blown Bonami-Beckner theorem. I mean, just for culture's sake.

So I'm just warning you, it's going to be all hot Minkowski Inequality and Generalized Binomial Theorem action on Thursday. I think I'll scribe myself.

Tuesday, March 06, 2007

Homework 3 Corrections

Hi -- two more corrections for Homework 3. I guess this is what you get when running a course for the first time...

Question 1: Eliminate the part "and show that this is tight". In fact, I strongly believe it's not tight. For extra credit, improve on the constant factor 2.

Question 3a: This one is too hard. I thought I knew an easy way to do it, but I was mistaken. For the story on this problem, see this paper of Amano and Maruoka (but note that the theorem in their paper improving on KKL was already known).

A corrected version of the homework is on the course web page.

--------

Two small additional clarifications: In 4a, you may assume the algorithm knows $\|\hat{f}\|_1$. In 5a, as Aaron pointed out, it would probably be more sensible if I asked for you to show the estimate is within $\sqrt{\epsilon/n^d}$ rather than $\epsilon$; however, it doesn't really matter since $\epsilon$ is a parameter.

Monday, March 05, 2007

Homework correction + the Ajtai-Linial Function

First, Aaron correctly pointed out a mistake on the problem set. In 4a, you should also assume the algorithm has query access to $f$. The corrected version of the homework is on the course web page.

Now for the real topic of the post. Recall that KKL proved that if $f : \{-1,1\}^n \to \{-1,1\}$ is any roughly balanced function, there is coalition $J \subseteq [n]$ of size $O(n / \log n)$ with influence .999 on $f$.

A construction of Ajtai and Linial (The influence of large coalitions, Combinatorica 13, 1993; annoyingly, not available online) shows this is close to optimal.

The [AL] construction is randomized. Recall the Tribes function, which breaks up the input into disjoint blocks ("tribes") of size about $\log n - C \log \log n$ and takes the OR across blocks of the AND of the blocks. Here $C$ is a carefully chosen constant (around $1$) picked so that the overall function is roughly balanced.

Let's say a random nonmonotone Tribes function of width $w$ is given by randomly partitioning $[n]$ into blocks of size $w$, randomly choosing whether to negate each variable (50-50), and then again ANDing within blocks and ORing across blocks.

Choose $w$ so that these functions will be True with probability around $1 - (\ln 2)/n$. (This involves taking $C \sim 2$.)

Finally, the [AL] construction is this: Choose $n$ random nonmonotone Tribes functions with this width, and AND them together.

--------

Here is a sketch of the properties of this (randomly chosen) function:

1. The probability it is True is very close to $1/2$, with very high probability.

2. It's hard for small coalitions to force the function towards True. Because to do this, they have to force many of the random Tribes functions towards True, and for each such function ($\sim n$ of them), they have to force one of its Tribes (about $\log n$ variables) towards True. By the random construction, the different Tribes don't overlap much, so the coalition has to be nearly linear in size.

3. It's hard for small coalitions to force the function towards False. To do this, they have to force at least one of the Tribes functions towards False, and this requires fixing at least one variable in each tribe within the function ($\sim n/\log n$ variables).

--------

The actual theorem is that with high probability over the choice of $f$, $f$ is near-balanced and every coalition $J$ of size $o(n/\log^2 n)$ has influence $o(1)$ on $f$.

Oddly enough, the hardest part of the proof is showing that the functions are near-balanced with very high probability.

--------

Some open problems:

1. Close the gap between $n/\log^2 n$ and $n/\log n$. [AL] speculated that the latter is probably the truth, but I'm not so sure...

2. Give an explicit function where all $o(n/\log^2 n)$-size coalitions have $o(1)$ influence. In fact, I believe it's open even to do this for $c \sqrt{n}$-size coalitions for large enough $c$.

Wednesday, February 28, 2007

Homework 3

Hi all. Any comments/questions about Homework #3 can go in the comments here.

Sunday, February 25, 2007

Homework 2 graded

I just finished grading Homework #2. The average was 63.25 out of 70, which is great. Pretty much everybody got pretty much everything correct.

If anything, people were too diligent; the commonest "errors" (not actually errors) seemed to be:

a) rederiving Plancherel (E[$fg$] = $\sum_S \hat{f}(S) \hat{g}(S)$) over and over; and,

b) reintroducing $D_i f$ without realizing it in later problems and rederiving the facts from Problem 1.

As mentioned in a comment, I'll be doing the solution for the bonus problem in class later.

Thursday, February 22, 2007

Learning low-degree polynomials over GF(2)

From the lecture on learning juntas, I skipped the learning-theoretic proof that when learning $\mathbb{F}_2$-multilinear polynomials of degree at most $e$, any degree-$e$ polynomial consistent with enough data will with high probability be the truth. The very short proof, using a result from Homework 2, is here.

Condorcet on Randomized Voting

The quotation from his 1788 writing On the form of decisions made by plurality vote:

"But after considering the facts, the average values or the results, we still need to determine their probability."

---

Lauding of the guy aside, apparently a lot of his mathematical writing was riddled with bugs or otherwise unfollowable.

Homework #2 solutions

Are posted on the course web page.

The rationality of weakly symmetric functions

Towards the end of class I quickly said the following:

Let $f : \{-1,1\}^n \to \{-1,1\}$ be any social choice function, and suppose we try to use it to rank 3 candidates via 3 pairwise comparisons using $f$. The Rationality of $f$ is the probability (when the voters' preferences are i.i.d. and uniform) that this doesn't not result in a cyclic preference.

Then if $f$ is weakly symmetric, the Rationality of $f$ is at most $7/9 + 4/9\pi \approx .919$. Here is the proof. Since the Rationality of Majority is $3/4 - (3/2\pi) \arcsin(-1/3) \approx .912$, this is looking pretty good for Majority.

In fact, later in this course we will prove the Majority Is Stablest Theorem, which in particular implies that any weakly symmetric function has Rationality at most that of Majority (plus $O(\log \log n / \log n)$).

Monday, February 19, 2007

Learning juntas

The junta-learning problem is: You have access to uniform-distribution random examples from a function $f : \{-1,1\}^n \to \{-1,1\}$ that is promised to depend only on some subset of $r$ of the variables. Think of $r$ as $\log n$, or even just a large constant. The task is to determine the $r$ relevant variables in time substantially better than $n^{r}$.

For more on this problem, see Section 2 of this collection of open problems, written by Avrim Blum.

In class tomorrow we will see an algorithm that finds the variables in time roughly $n^{.704 r}$. As stated in Avrim's note, if you can improve this to just $n^{.666 r}$, he'll give you \$ 100. If you can improve it to $n^{O(1)}$ for $r = \log n$, he'll give you \$ 1000!

Fourier analysis. It can be lucrative.

Tuesday, February 13, 2007

What's new in Fourier Analysis?

Insofar as a lot of the stuff we do in class is from the olden days (the '90s, let's say), you might be wondering what's going on in Fourier Analysis of Boolean Functions currently. Here are some highlights from the recent top conferences. I would like to emphasize that there are lots of other good recent papers on the topic!

STOC '06: Gowers Uniformity, Influence of Variables, and PCPs by Alex Samorodnitsky and Luca Trevisan. Topic: Hardness of Approximation and pure Fourier Analysis.
FOCS 2005: Nonembeddability theorems via Fourier analysis by Subhash Khot and Assaf Naor. Topic: Metric Spaces.
GAFA journal 2007: Boolean functions with small spectral norm by Ben Green and Tom Sanders. Topic: Classical Analysis. (Mentioned in class today.)
COLT 2006: Improved lower bounds for learning intersections of halfspaces by Adam Klivans and Sasha Sherstov. Topic: Learning Theory.
FOCS 2005: On non-approximability for quadratic programs by Sanjeev Arora, Eli Berger, Elad Hazan, Guy Kindler, and Muli Safra. Topic: Hardness of Approximation.
FOCS 2005: On Delsarte's Linear Programming Bounds for Binary Codes by Michael Navon and Alex Samorodnitsky. Topic: Coding Theory.
FOCS 2005: Lower bounds for the noisy broadcast problem by Navin Goyal, Guy Kindler, and Mike Saks. Topic: Distributed Networks.
Complexity 2005: On the hardness of approximating Multicut and Sparsest-Cut by Shuchi Chawla (of CMU), Robi Krauthgamer, Ravi Kumar, Yuval Rabani, and D. Sivakumar. Topic: Hardness of Approximation.
Improving on a Complexity 2005 paper: Learning symmetric k-juntas in time $n^{o(k)}$ by Mihail Kolountzakis, Evangelos Markakis, and Aranyak Mehta. Topic: Number Theory and Learning Theory.

The first few papers in the list may be particularly readable for people taking the class.

Tuesday, February 06, 2007

Homework #2

A student asked if we've already covered in lecture everything needed for Homework #2. The answer is yes.

Questions about the homework can go in the comments of this post.

Sampling

In class, I handwaved over the proof that we can efficiently estimate $E[F_{S \subseteq I}(x)^2]$. The proof is very easy:

$E_x[F_{S \subseteq I}(x)^2] = E_x[\hat{f_x}(S)^2]$ $= E_x[\hat{f_x}(S)^2] = E_x[(E_w[f_x(w)w_S])^2]$.

Now we use a trick: Pick two copies of $w$ independently:

... $ = E_x[E_{w,w'}[f_x(w) w_S f_x(w') w'_S]]$ $= E_{x,w,w'}[f(w,x)w_S f(w',x) w'_S]$.

And this expectation can be efficiently estimated empirically, since we can sample the random variable inside the expectation, and this r.v. is bounded in $[-1,1]$. (I guess officially what I keep using here is "Hoeffding's Bound".)

--

Also: at the end of class, when showing how to break the one-way permutation $f$ given a PPT algorithm $A$ for guessing the hard-core predicate, we had that if $A$ is deterministic, we get that for a $2\gamma$ fraction of $x$'s, $\Pr_r[A(f(x), r) = x \cdot r] \geq 1/2 + \gamma$. If $A$ is not deterministic, let $\mathcal{A}$ denote the function which on input $r$ equals $E_{A}[A(f(x), r)]$, where the expectation is over $A$'s randomness. Now $\mathcal{A}$ is a function with range in $[-1,1]$, so we can apply Goldreich-Levin to it...

Except not quite, since in GL we assume we actually have access to $\mathcal{A}$, whereas really, we only have access to a $\{-1,1\}$-valued randomized function whose expectation is $\mathcal{A}$. But I hope you can convince yourself that that's good enough; basically, we just have to throw in an expectation over $A$'s random coins in the derivation in the first part of this post.

Saturday, February 03, 2007

Homework #2 is out

I'll bring hard copies to class on Tuesday, but you can find it on the web page now. It is due Tuesday, February 20 at the beginning of class.

On quasirandomness

I just realized I made a mistake in Lecture 5. I said being (small,small)-quasirandom was equivalent to having small correlation with every small junta. That's not true. If you are quasirandom then you have small correlation with every small junta; this will be on the next homework. However, if you have small correlation with every small junta, you are not necessarily quasirandom.

If you want to think about it, you may be able to convince yourself that the function $f(x) = "x_1 \oplus$ Majority$(x_2, ..., x_n)"$ has $o(1)$ correlation with every function on $o(\sqrt{n})$ bits, and yet it is not even $(.3,.3)$-quasirandom!

So the property of having low correlation with every small junta is weaker then our notion of being quasirandom. As it happens, the legit hardness of approximation for 3Lin result (without using UGC) uses the fact that that even functions with this weaker property are rejected by the H{\aa}st-Odd test with probability close to $1/2$.

Friday, February 02, 2007

Homework 1 graded

I finished grading Homework #1. The average was 80% (56/70) with a max of 90% and a standard deviation of about 7%. Pretty good.

I put solutions up on the web page. Note that they are not necessarily maximally simple; for example, I think Amitabh had a nicer proof for #1 than I did, and Moritz had a nicer proof for #3b then I did. Also almost everybody did #2 by induction, which is about equally easy.

Most common mistakes were: #1 -- using Pr[A and B] = Pr[A] Pr[B]; #3a -- messing up the signs; #5 -- failing to do if *and* only if.

Re #3a -- I realized that I kind of messed up the signs in the problem statement. Actually, what I wrote was technically correct, but misleading ;) Anyway, it only cost people -1 out of 70.

Re #7b -- The hint I gave on the blog was actually too complicated. I simplified problem #7b at the last minute, and the hint was needed for the original more complicated version.

Thursday, February 01, 2007

Hardness via Unique Games Conjecture

Hey folks -- since today's lecture was pretty technical, feel free to ask for clarifications in the comments.

The main takeaway is that if you want to prove a strong hardness-of-approximation result for your favorite constraint satisfaction problem, all you need to do (assuming UGC) is design a strong "dictatorship vs. quasirandom" function tester, where the tests the tester uses are the same as the constraints you're trying to prove hardness for.

E.g., much later in the term we will give a function tester $T$ which makes $2$ queries $x, y \in \{-1,1\}^n$ and then checks that $f(x) \neq f(y)$. We will show that $T$ accepts all dictators with probability at least $.9001$ and rejects all $(.0001, .0001)$-quasirandom functions with probability at most $.7999$.

As a consequence, we get (assuming UGC) that given a CSP over variables $v_1, ... v_n$ where all the constraints are of the form $v_i \neq v_j$, it is NP-hard to distinguish the case that there is an assignment satisfying at least a $.9$ fraction of the constraints and the case that every assignment satisfies at most a $.8$ fraction. Since CSPs with $\neq$ constraints are the same as the "Max-Cut" problem, this shows that giving a $.8/.9 = .8888$ approximation for Max-Cut is NP-hard.

---

As mentioned, on Tuesday we'll move on to the the next segment of the course, Learning. Also, homework #2 will come out. Have a good weekend!

Wednesday, January 31, 2007

Håst-Odd as a Dictatorship vs. Quasirandom test

At the end of last class I stated a proposition we'll need, which says that an $(\epsilon^2, \delta)$-quasirandom function passes the Håst-Odd${}_\delta$ test with probability at most $1/2 + \epsilon/2$.

I meant to prove it at the beginning of tomorrow's class but, surprise surprise, it looks like we'll be pressed for time even with out it. So it goes, as Vonnegut would say.

Here then, is the analysis. Really, it's pretty easy. It basically proves itself.

Tuesday, January 30, 2007

Boosting the correctness probability in testing

In defining what it means to be a tester $T$ for property $\mathcal{P}$, I wrote: if $f$ satisfies $\mathcal{P}$ then Pr[$T(f)$ accepts] $> 2/3$ and if $f$ is $\epsilon$-far from satisfying $\mathcal{P}$ then Pr[$T(f)$ accepts] $< 1/3$. I then added that the $2/3$ and $1/3$ here is "arbitrary" and could be "boosted".

This phrasing is pretty imprecise, and an anonymous commenter asked what exactly I meant. What's meant is this: Suppose you have a tester $T$ making $q$ queries that satisfies the given definition. Then for any constant $\delta > 0$, there is another tester $T'$ making $O(q)$ queries that satisfies the definition with $1-\delta$ and $\delta$ in place of $2/3$ and $1/3$. So if you don't care about constant factors in the number of queries, then the choice of $2/3$ and $1/3$ is arbitrary. In "testing" we don't usually care about constant factors on the number of queries, although in "local testing" we often do.

The way to get $T'$ from $T$ is to just run it $O(\log(1/\delta))$ many times independently and then accept iff at least half of the runs accepted. Then the Chernoff bound implies the claim.

Details from the end of Lecture 5

Many thanks to Daniel Golovin for providing the Book proof of $|S|(1-\delta)^{|S|-1} \leq 1/\delta$ for each $|S| = 0, 1, ...$, the claim we needed to show that any function has at most $1/\epsilon \delta$ variables with $(1-\delta)$-attenuated influence at least $\epsilon$. His proof:

Fix any $|S|$. If $i \leq |S|$ then certainly $(1-\delta)^{|S| - 1} \leq (1-\delta)^{i - 1}$. Sum this fact over $i = 1...|S|$ to conclude $|S| (1-\delta)^{|S| - 1} \leq \sum_{i=1}^{|S|} (1-\delta)^{i-1}$. But $\sum_{i=1}^{|S|} (1-\delta)^{i-1} \leq \sum_{i = 1}^{\infty} (1-\delta)^{i-1} = 1/\delta$.

---

Re one thing I mentioned at the end of class: The constant functions $1$ and $-1$ pass the Håst-Odd test with probability $1/2$ (for every $\delta$). The reason is, the test picks some $x$, $y$, and $z$, but then also picks random signs $a,b,c \in \{-1,1\}$ and tests "$af(ax) \cdot bf(by) \cdot cf(cz) = 1$", which is equivalent to $f(ax)f(by)f(cz) = abc$. Now if $f$ is constant then so is $f(ax)f(by)f(cz)$, and then $abc$ is a random sign, which agrees with the constant with probability $1/2$.

Note also that the "acceptance constraint" in the Håst-Odd test is either of the form "$v_{i_1} v_{i_2} v_{i_3} = 1$" or "$v_{i_1} v_{i_2} v_{i_3} = -1$". I.e., it's a 3-variable "linear" equation where the right-hand side is either "0" or "1" (if you go back and interpret bits as $\mathbb{F}_2$ rather than $\{-1,1\}$).

Sunday, January 28, 2007

Homomorphism testing

Suresh asks, "Can we make use of the same BLR test to check whether a given mapping between two groups is a homomorphism or not?"

The answer is: Yes.

The question is: Let $G$ and $H$ be groups. A homomorphism between $G$ and $H$ (this is a definition from group theory) is a function $f : G \rightarrow H$ that satisfies $f(x \bullet y) = f(x) \cdot f(y)$ for all $x, y \in G$, where $\bullet$ denotes the group operation in $G$ and $\cdot$ denotes the group operation in $H$.

A natural question is, does the BLR test "work" for functions $f : G \to H$? I.e., if $f : G \rightarrow H$ is $\epsilon$-far from being a homomorphism (i.e., you need to change its values on $\epsilon |G|$ inputs to make it a true homomorphism), will the BLR test reject with probability $\Omega(\epsilon)$?

We showed the answer is yes, with rejection probability at least $\epsilon$, when $G = \mathbb{Z}_2^n$ and $H = \mathbb{Z}_2$.

The answer is still yes for any finite abelian groups $G$ and $H$. Possibly one could prove it with appropriately generalized Fourier analysis, but it's much easier to prove in this general case with pure combinatorics. This was originally proved by BLR for cyclic groups; later, Coppersmith gave a nice proof which also works for all abelian groups. See Section 4.1 of Rubinfeld's Testing Survey. Note that the constant factors necessarily get worse, because of examples like $G = H = \mathbb{Z}_9$, $f : 0, 3, 6, \mapsto 0; 1, 4, 7 \mapsto 1; 2, 5, 8 \mapsto -1$.

In fact, the BLR test still works even for nonabelian finite groups; this was claimed in BLR and then proved by Ben-Or, Coppersmith, L & R.

Wednesday, January 24, 2007

Homework 1

The comments section of this post is for questions/discussion about Homework 1.

Tuesday, January 23, 2007

Also rejecting 1

At the end of class we considered running Håstad's test with $\delta = .75 \epsilon$ many times - $O(1/\epsilon^2)$ to be precise - and then accepting iff the empirical fraction of passes was at least $1 - .8 \epsilon$. We saw that this gave a test for the class {${\chi_S : |S| \leq 1}$}. I asked how we could also reject functions close to the $1$ function, but hurried over the explanation.

Aaron's suggestion was a very good one since we can do it making zero additional queries. Look at all the $f(x)$'s and $f(y)$'s we queried (not the $f(z)$'s) and reject also if the fraction of times the answer was 1 is at least $3/4$. Note that all the $x$'s and $y$'s are independent random strings.

Now certainly dictators (which are half 1, half -1) will pass this extra test except with exponentially small probability in $\epsilon$, so their acceptance probability, if it was at least $2/3$ before, is at least $.66$ now (which is fine).

On the other hand, suppose $f$ is $\epsilon$-far from the set of dictators. Then it's also $\epsilon$-far from the set {dictators} $\cup$ $\{1\}$ (and so the Håstad tests will reject it with probability at least $2/3$) unless it's $\epsilon$-close to $1$. But in this case, the additional test will reject except with exponentially small probability in $\epsilon$, so we're again fine.

The alternate fix I suggested involved using local decodability of parity functions. After all the Håstad tests, if we still haven't rejected then (with probability at least $2/3$) $f$ is $\epsilon$-close to either $1$ or a dictator function - in particular, to a linear function. Now use 2 more queries to do local decoding on the string $(-1, -1, ..., -1)$, and reject if the answer is 1. Dictators will pass this with probability at least $1-2\epsilon$, so overall they pass with probability at least $2/3 - 2\epsilon$ which is large enough (assuming $\epsilon < .01$ now, say). And any function $\epsilon$-close to being $1$ will pass this with probability at most $2\epsilon$, so overall it will pass with probability at most $1/3 + 2\epsilon$, again fine.

Two good exercises for you:

1. Understand why it is without loss of generality to assume $\epsilon < .01$.

2. In the local decoding of $(-1,-1, ..., -1)$ we are really doing the following 2-query test: Pick $y$ at random, and accept iff $f(y) \neq f(-y)$. Express the probability that this test passes in terms of the Fourier coefficients of $f$.

Monday, January 22, 2007

Getting ready for Tuesday's class

Just wanted to remind you of a few things for next class:

1. It might not have been clear in the last class, but we are using the convention that the "empty product" is defined to be 1. I.e., $\prod_{i \in \emptyset} x_i$ is taken to mean 1. In particular, the "empty parity" $\chi_\emptyset$ is the function that is 1 on every input string.

2. A trivial fact that one can sometimes forget: If $f$ is parity function itself, say $\chi_T$, then its Fourier expansion is just $\chi_T$. In other words, $\widehat{\chi_T}(S) = 1$ if $S = T$ and $= 0$ if $S \neq T$.

3. We'll use the Chernoff given on the homework, as follows: If |Pr[Test passes in case A] - Pr[Test passes in case B]| $\geq \epsilon$, then you can tell with high confidence if you're in case A or case B after $O(1/\epsilon^2)$ tests.

4. One of you will be scribing!

Sunday, January 21, 2007

Diffusion on the discrete cube

I wanted to say a few more words about the "diffusion process" on the discrete cube, mentioned in Lecture 1. Actually, on any graph you can set up an analogous process - basically, a "continuous-time" random walk on the graph.

Given a graph $G = (V,E)$, its Lapalacian $L$ is usually defined to be the $V \times V$ symmetric matrix that has a $-1$ in entries $(i,j)$ and $(j,i)$ when this is an edge, deg($v$) in the $(v,v)$ diagonal entry, and zeroes elsewhere.

In our case we considered the discrete cube, $G$, with $V = \{-1,1\}^n$ and $(x,y) \in E$ iff $x$ and $y$ have Hamming distance 1.

You can see now that the diffusion equation I gave in Lecture 1 says that for functions $f : V \to \mathbb{R}$, the rate of change of $f(x)$ with respect to time is defined to be $(-1/2)L f$. (The factor $-1$ is important here, but the factor $1/2$ is not; it just slows down time by a factor of $2$.) The same diffusion process can be set up for any graph.

Further, the "basic solutions" to this differential equation - by which I mean those $f : V \to \mathbb{R}$ that just decay with time but don't change their "shape" - are the eigenvectors of the matrix $L$; i.e., the functions such that $L f = \lambda f$ for some constant $\lambda$. On the discrete cube, it's easy to check that the $2^n$ parity functions are all eigenvectors, and since they are linearly independent in a space of dimension $2^n$, linear algebra tells us this is all of them.

I'm not sure if there's a more principled way to derive the fact that the parity functions are the eigenvectors of $L$, as opposed to just observing that they are and that they are sufficient in number. Maybe someone can figure this out.

For way more on this than you might want to know, see this paper by Bobkov and Tetali. It can be hard reading, but you might want to at least check that the quantity $\mathcal{E}(f,f)$ they introduce on line 4 is our "energy" $\mathbb{I}(f)$, up to a constant. (Indeed, $\mathcal{E}$ stands for "energy" in their paper.)

Thursday, January 18, 2007

Voting with real numbers

In a comment, Yi suggested that voters use real numbers, say, in [-1,1]. This raises a number of interesting questions. Are our voting schemes now $f : [-1,1]^n \rightarrow \{-1,1\}$? Presumably here a vote of -1 means you really really like candidate -1, a vote of .9 means you really like candidate 1, a vote of .01 means you are basically indifferent with a very slight preference towards candidate 1, etc. (It's a bit like rating the candidates on the Hot Or Not scale.)

In some cases (where $f$ can be expressed as a multilinear polynomial) this is basically the same as usual voting; one can just have the voting machine convert a vote of $y$ to a random vote from $\{-1,1\}$ chosen to have expectation $y$. But in other cases it's not the same.

The KKL theorem can be extended (with an appropriate new definition of "influence") to the case of functions $f : [-1,1]^n \to \{-1,1\}$, but certain corollaries cease to hold. We may see this result in the class; it's by "BKKKL" -- the B is Fields Medalist is Jean Bourgain and the extra K is Yitzhak Katznelson.

Wednesday, January 17, 2007

Stealing from Avrim

I was checking out the materials for Avrim and Nina's Learning Theory course today. Two things:

1. I swear I didn't steal the "spam" example for learning from him!

2. I like his idea of dividing the homework into "exercises" (relatively straightforward problems you should really do yourself) and "problems" (harder ones, where collaboration would be fine). On our homework, #2 -- #6 would be exercises, #7 a problem, and #1 halfway in between (probably closer to a problem).

In particular, if you're looking at the homework and thinking, "Man, I can't even do #1!", try moving on to #2 through #6.

Tuesday, January 16, 2007

Voting power

The slides from Lecture 1 are on the course web site now.

I mentioned John F. Banzhaf III in class; he's got quite a web site there, in which he proudly declaims about being the "Osama bin Laden of torts", among other things. I guess he's famous for suing McDonald's (over obesity) and tobacco companies. Apparently he's part of the reason for smoking bans in the UK and other countries. He also shows up in the news these days as a commenter on the Duke Lacrosse happening.

Anyway, he's frequently credited with describing the notion of influences of voters on elections we discussed in class. That came about when he sued the Nassau County Board for having an unfair voting scheme; they passed proposals according to the following boolean function:

$f(x)$ = 1 iff $9x_1 + 9x_2 + 7x_3 + 3x_4 + x_5 + x_6 \geq 16$.

You can check that for this $f$, $Inf_4(f)$ = $Inf_5(f)$ = $Inf_6(f)$ = 0.

More interesting info on measuring the power of voters is here:

Banzhaf Power Index

Voting Power