Wednesday, February 28, 2007
Sunday, February 25, 2007
Homework 2 graded
I just finished grading Homework #2. The average was 63.25 out of 70, which is great. Pretty much everybody got pretty much everything correct.
If anything, people were too diligent; the commonest "errors" (not actually errors) seemed to be:
a) rederiving Plancherel (E[$fg$] = $\sum_S \hat{f}(S) \hat{g}(S)$) over and over; and,
b) reintroducing $D_i f$ without realizing it in later problems and rederiving the facts from Problem 1.
As mentioned in a comment, I'll be doing the solution for the bonus problem in class later.
If anything, people were too diligent; the commonest "errors" (not actually errors) seemed to be:
a) rederiving Plancherel (E[$fg$] = $\sum_S \hat{f}(S) \hat{g}(S)$) over and over; and,
b) reintroducing $D_i f$ without realizing it in later problems and rederiving the facts from Problem 1.
As mentioned in a comment, I'll be doing the solution for the bonus problem in class later.
Thursday, February 22, 2007
Learning low-degree polynomials over GF(2)
From the lecture on learning juntas, I skipped the learning-theoretic proof that when learning $\mathbb{F}_2$-multilinear polynomials of degree at most $e$, any degree-$e$ polynomial consistent with enough data will with high probability be the truth. The very short proof, using a result from Homework 2, is here.
Condorcet on Randomized Voting
The quotation from his 1788 writing On the form of decisions made by plurality vote:
"But after considering the facts, the average values or the results, we still need to determine their probability."
---
Lauding of the guy aside, apparently a lot of his mathematical writing was riddled with bugs or otherwise unfollowable.
"But after considering the facts, the average values or the results, we still need to determine their probability."
---
Lauding of the guy aside, apparently a lot of his mathematical writing was riddled with bugs or otherwise unfollowable.
The rationality of weakly symmetric functions
Towards the end of class I quickly said the following:
Let $f : \{-1,1\}^n \to \{-1,1\}$ be any social choice function, and suppose we try to use it to rank 3 candidates via 3 pairwise comparisons using $f$. The Rationality of $f$ is the probability (when the voters' preferences are i.i.d. and uniform) that this doesn't not result in a cyclic preference.
Then if $f$ is weakly symmetric, the Rationality of $f$ is at most $7/9 + 4/9\pi \approx .919$. Here is the proof. Since the Rationality of Majority is $3/4 - (3/2\pi) \arcsin(-1/3) \approx .912$, this is looking pretty good for Majority.
In fact, later in this course we will prove the Majority Is Stablest Theorem, which in particular implies that any weakly symmetric function has Rationality at most that of Majority (plus $O(\log \log n / \log n)$).
Let $f : \{-1,1\}^n \to \{-1,1\}$ be any social choice function, and suppose we try to use it to rank 3 candidates via 3 pairwise comparisons using $f$. The Rationality of $f$ is the probability (when the voters' preferences are i.i.d. and uniform) that this doesn't not result in a cyclic preference.
Then if $f$ is weakly symmetric, the Rationality of $f$ is at most $7/9 + 4/9\pi \approx .919$. Here is the proof. Since the Rationality of Majority is $3/4 - (3/2\pi) \arcsin(-1/3) \approx .912$, this is looking pretty good for Majority.
In fact, later in this course we will prove the Majority Is Stablest Theorem, which in particular implies that any weakly symmetric function has Rationality at most that of Majority (plus $O(\log \log n / \log n)$).
Monday, February 19, 2007
Learning juntas
The junta-learning problem is: You have access to uniform-distribution random examples from a function $f : \{-1,1\}^n \to \{-1,1\}$ that is promised to depend only on some subset of $r$ of the variables. Think of $r$ as $\log n$, or even just a large constant. The task is to determine the $r$ relevant variables in time substantially better than $n^{r}$.
For more on this problem, see Section 2 of this collection of open problems, written by Avrim Blum.
In class tomorrow we will see an algorithm that finds the variables in time roughly $n^{.704 r}$. As stated in Avrim's note, if you can improve this to just $n^{.666 r}$, he'll give you \$ 100. If you can improve it to $n^{O(1)}$ for $r = \log n$, he'll give you \$ 1000!
Fourier analysis. It can be lucrative.
For more on this problem, see Section 2 of this collection of open problems, written by Avrim Blum.
In class tomorrow we will see an algorithm that finds the variables in time roughly $n^{.704 r}$. As stated in Avrim's note, if you can improve this to just $n^{.666 r}$, he'll give you \$ 100. If you can improve it to $n^{O(1)}$ for $r = \log n$, he'll give you \$ 1000!
Fourier analysis. It can be lucrative.
Tuesday, February 13, 2007
What's new in Fourier Analysis?
Insofar as a lot of the stuff we do in class is from the olden days (the '90s, let's say), you might be wondering what's going on in Fourier Analysis of Boolean Functions currently. Here are some highlights from the recent top conferences. I would like to emphasize that there are lots of other good recent papers on the topic!
The first few papers in the list may be particularly readable for people taking the class.
- STOC '06: Gowers Uniformity, Influence of Variables, and PCPs by Alex Samorodnitsky and Luca Trevisan. Topic: Hardness of Approximation and pure Fourier Analysis.
- FOCS 2005: Nonembeddability theorems via Fourier analysis by Subhash Khot and Assaf Naor. Topic: Metric Spaces.
- GAFA journal 2007: Boolean functions with small spectral norm by Ben Green and Tom Sanders. Topic: Classical Analysis. (Mentioned in class today.)
- COLT 2006: Improved lower bounds for learning intersections of halfspaces by Adam Klivans and Sasha Sherstov. Topic: Learning Theory.
- FOCS 2005: On non-approximability for quadratic programs by Sanjeev Arora, Eli Berger, Elad Hazan, Guy Kindler, and Muli Safra. Topic: Hardness of Approximation.
- FOCS 2005: On Delsarte's Linear Programming Bounds for Binary Codes by Michael Navon and Alex Samorodnitsky. Topic: Coding Theory.
- FOCS 2005: Lower bounds for the noisy broadcast problem by Navin Goyal, Guy Kindler, and Mike Saks. Topic: Distributed Networks.
- Complexity 2005: On the hardness of approximating Multicut and Sparsest-Cut by Shuchi Chawla (of CMU), Robi Krauthgamer, Ravi Kumar, Yuval Rabani, and D. Sivakumar. Topic: Hardness of Approximation.
- Improving on a Complexity 2005 paper: Learning symmetric k-juntas in time $n^{o(k)}$ by Mihail Kolountzakis, Evangelos Markakis, and Aranyak Mehta. Topic: Number Theory and Learning Theory.
The first few papers in the list may be particularly readable for people taking the class.
Tuesday, February 06, 2007
Homework #2
A student asked if we've already covered in lecture everything needed for Homework #2. The answer is yes.
Questions about the homework can go in the comments of this post.
Questions about the homework can go in the comments of this post.
Sampling
In class, I handwaved over the proof that we can efficiently estimate $E[F_{S \subseteq I}(x)^2]$. The proof is very easy:
$E_x[F_{S \subseteq I}(x)^2] = E_x[\hat{f_x}(S)^2]$ $= E_x[\hat{f_x}(S)^2] = E_x[(E_w[f_x(w)w_S])^2]$.
Now we use a trick: Pick two copies of $w$ independently:
... $ = E_x[E_{w,w'}[f_x(w) w_S f_x(w') w'_S]]$ $= E_{x,w,w'}[f(w,x)w_S f(w',x) w'_S]$.
And this expectation can be efficiently estimated empirically, since we can sample the random variable inside the expectation, and this r.v. is bounded in $[-1,1]$. (I guess officially what I keep using here is "Hoeffding's Bound".)
--
Also: at the end of class, when showing how to break the one-way permutation $f$ given a PPT algorithm $A$ for guessing the hard-core predicate, we had that if $A$ is deterministic, we get that for a $2\gamma$ fraction of $x$'s, $\Pr_r[A(f(x), r) = x \cdot r] \geq 1/2 + \gamma$. If $A$ is not deterministic, let $\mathcal{A}$ denote the function which on input $r$ equals $E_{A}[A(f(x), r)]$, where the expectation is over $A$'s randomness. Now $\mathcal{A}$ is a function with range in $[-1,1]$, so we can apply Goldreich-Levin to it...
Except not quite, since in GL we assume we actually have access to $\mathcal{A}$, whereas really, we only have access to a $\{-1,1\}$-valued randomized function whose expectation is $\mathcal{A}$. But I hope you can convince yourself that that's good enough; basically, we just have to throw in an expectation over $A$'s random coins in the derivation in the first part of this post.
$E_x[F_{S \subseteq I}(x)^2] = E_x[\hat{f_x}(S)^2]$ $= E_x[\hat{f_x}(S)^2] = E_x[(E_w[f_x(w)w_S])^2]$.
Now we use a trick: Pick two copies of $w$ independently:
... $ = E_x[E_{w,w'}[f_x(w) w_S f_x(w') w'_S]]$ $= E_{x,w,w'}[f(w,x)w_S f(w',x) w'_S]$.
And this expectation can be efficiently estimated empirically, since we can sample the random variable inside the expectation, and this r.v. is bounded in $[-1,1]$. (I guess officially what I keep using here is "Hoeffding's Bound".)
--
Also: at the end of class, when showing how to break the one-way permutation $f$ given a PPT algorithm $A$ for guessing the hard-core predicate, we had that if $A$ is deterministic, we get that for a $2\gamma$ fraction of $x$'s, $\Pr_r[A(f(x), r) = x \cdot r] \geq 1/2 + \gamma$. If $A$ is not deterministic, let $\mathcal{A}$ denote the function which on input $r$ equals $E_{A}[A(f(x), r)]$, where the expectation is over $A$'s randomness. Now $\mathcal{A}$ is a function with range in $[-1,1]$, so we can apply Goldreich-Levin to it...
Except not quite, since in GL we assume we actually have access to $\mathcal{A}$, whereas really, we only have access to a $\{-1,1\}$-valued randomized function whose expectation is $\mathcal{A}$. But I hope you can convince yourself that that's good enough; basically, we just have to throw in an expectation over $A$'s random coins in the derivation in the first part of this post.
Saturday, February 03, 2007
Homework #2 is out
I'll bring hard copies to class on Tuesday, but you can find it on the web page now. It is due Tuesday, February 20 at the beginning of class.
On quasirandomness
I just realized I made a mistake in Lecture 5. I said being (small,small)-quasirandom was equivalent to having small correlation with every small junta. That's not true. If you are quasirandom then you have small correlation with every small junta; this will be on the next homework. However, if you have small correlation with every small junta, you are not necessarily quasirandom.
If you want to think about it, you may be able to convince yourself that the function $f(x) = "x_1 \oplus$ Majority$(x_2, ..., x_n)"$ has $o(1)$ correlation with every function on $o(\sqrt{n})$ bits, and yet it is not even $(.3,.3)$-quasirandom!
So the property of having low correlation with every small junta is weaker then our notion of being quasirandom. As it happens, the legit hardness of approximation for 3Lin result (without using UGC) uses the fact that that even functions with this weaker property are rejected by the H{\aa}st-Odd test with probability close to $1/2$.
If you want to think about it, you may be able to convince yourself that the function $f(x) = "x_1 \oplus$ Majority$(x_2, ..., x_n)"$ has $o(1)$ correlation with every function on $o(\sqrt{n})$ bits, and yet it is not even $(.3,.3)$-quasirandom!
So the property of having low correlation with every small junta is weaker then our notion of being quasirandom. As it happens, the legit hardness of approximation for 3Lin result (without using UGC) uses the fact that that even functions with this weaker property are rejected by the H{\aa}st-Odd test with probability close to $1/2$.
Friday, February 02, 2007
Homework 1 graded
I finished grading Homework #1. The average was 80% (56/70) with a max of 90% and a standard deviation of about 7%. Pretty good.
I put solutions up on the web page. Note that they are not necessarily maximally simple; for example, I think Amitabh had a nicer proof for #1 than I did, and Moritz had a nicer proof for #3b then I did. Also almost everybody did #2 by induction, which is about equally easy.
Most common mistakes were: #1 -- using Pr[A and B] = Pr[A] Pr[B]; #3a -- messing up the signs; #5 -- failing to do if *and* only if.
Re #3a -- I realized that I kind of messed up the signs in the problem statement. Actually, what I wrote was technically correct, but misleading ;) Anyway, it only cost people -1 out of 70.
Re #7b -- The hint I gave on the blog was actually too complicated. I simplified problem #7b at the last minute, and the hint was needed for the original more complicated version.
I put solutions up on the web page. Note that they are not necessarily maximally simple; for example, I think Amitabh had a nicer proof for #1 than I did, and Moritz had a nicer proof for #3b then I did. Also almost everybody did #2 by induction, which is about equally easy.
Most common mistakes were: #1 -- using Pr[A and B] = Pr[A] Pr[B]; #3a -- messing up the signs; #5 -- failing to do if *and* only if.
Re #3a -- I realized that I kind of messed up the signs in the problem statement. Actually, what I wrote was technically correct, but misleading ;) Anyway, it only cost people -1 out of 70.
Re #7b -- The hint I gave on the blog was actually too complicated. I simplified problem #7b at the last minute, and the hint was needed for the original more complicated version.
Thursday, February 01, 2007
Hardness via Unique Games Conjecture
Hey folks -- since today's lecture was pretty technical, feel free to ask for clarifications in the comments.
The main takeaway is that if you want to prove a strong hardness-of-approximation result for your favorite constraint satisfaction problem, all you need to do (assuming UGC) is design a strong "dictatorship vs. quasirandom" function tester, where the tests the tester uses are the same as the constraints you're trying to prove hardness for.
E.g., much later in the term we will give a function tester $T$ which makes $2$ queries $x, y \in \{-1,1\}^n$ and then checks that $f(x) \neq f(y)$. We will show that $T$ accepts all dictators with probability at least $.9001$ and rejects all $(.0001, .0001)$-quasirandom functions with probability at most $.7999$.
As a consequence, we get (assuming UGC) that given a CSP over variables $v_1, ... v_n$ where all the constraints are of the form $v_i \neq v_j$, it is NP-hard to distinguish the case that there is an assignment satisfying at least a $.9$ fraction of the constraints and the case that every assignment satisfies at most a $.8$ fraction. Since CSPs with $\neq$ constraints are the same as the "Max-Cut" problem, this shows that giving a $.8/.9 = .8888$ approximation for Max-Cut is NP-hard.
---
As mentioned, on Tuesday we'll move on to the the next segment of the course, Learning. Also, homework #2 will come out. Have a good weekend!
The main takeaway is that if you want to prove a strong hardness-of-approximation result for your favorite constraint satisfaction problem, all you need to do (assuming UGC) is design a strong "dictatorship vs. quasirandom" function tester, where the tests the tester uses are the same as the constraints you're trying to prove hardness for.
E.g., much later in the term we will give a function tester $T$ which makes $2$ queries $x, y \in \{-1,1\}^n$ and then checks that $f(x) \neq f(y)$. We will show that $T$ accepts all dictators with probability at least $.9001$ and rejects all $(.0001, .0001)$-quasirandom functions with probability at most $.7999$.
As a consequence, we get (assuming UGC) that given a CSP over variables $v_1, ... v_n$ where all the constraints are of the form $v_i \neq v_j$, it is NP-hard to distinguish the case that there is an assignment satisfying at least a $.9$ fraction of the constraints and the case that every assignment satisfies at most a $.8$ fraction. Since CSPs with $\neq$ constraints are the same as the "Max-Cut" problem, this shows that giving a $.8/.9 = .8888$ approximation for Max-Cut is NP-hard.
---
As mentioned, on Tuesday we'll move on to the the next segment of the course, Learning. Also, homework #2 will come out. Have a good weekend!
Subscribe to:
Posts (Atom)