At the end of last class I stated a proposition we'll need, which says that an $(\epsilon^2, \delta)$-quasirandom function passes the Håst-Odd${}_\delta$ test with probability at most $1/2 + \epsilon/2$.

I meant to prove it at the beginning of tomorrow's class but, surprise surprise, it looks like we'll be pressed for time even without it. So it goes, as Vonnegut would say.

Here, then, is the analysis. Really, it's pretty easy. It basically proves itself.

## Wednesday, January 31, 2007

## Tuesday, January 30, 2007

### Boosting the correctness probability in testing

In defining what it means to be a tester $T$ for property $\mathcal{P}$, I wrote: if $f$ satisfies $\mathcal{P}$ then Pr[$T(f)$ accepts] $> 2/3$ and if $f$ is $\epsilon$-far from satisfying $\mathcal{P}$ then Pr[$T(f)$ accepts] $< 1/3$. I then added that the $2/3$ and $1/3$ here are "arbitrary" and could be "boosted".

This phrasing is pretty imprecise, and an anonymous commenter asked what exactly I meant. What's meant is this: Suppose you have a tester $T$ making $q$ queries that satisfies the given definition. Then for any constant $\delta > 0$, there is another tester $T'$ making $O(q)$ queries that satisfies the definition with $1-\delta$ and $\delta$ in place of $2/3$ and $1/3$. So if you don't care about constant factors in the number of queries, then the choice of $2/3$ and $1/3$ is arbitrary. In "testing" we don't usually care about constant factors on the number of queries, although in "local testing" we often do.

The way to get $T'$ from $T$ is to just run it $O(\log(1/\delta))$ many times independently and then accept iff at least half of the runs accepted. Then the Chernoff bound implies the claim.
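As a sketch of this amplification: `boost` below assumes a hypothetical interface `tester(f)` that runs the base test once and returns True iff that run accepts; the constant $18$ in the repetition count is just what Hoeffding's inequality needs and is not optimized.

```python
import math

def boost(tester, f, delta):
    """Boost a (2/3, 1/3)-tester to error delta by majority vote.

    `tester(f)` is a hypothetical interface: run the base test once,
    return True iff that run accepts."""
    reps = math.ceil(18 * math.log(1 / delta))  # O(log(1/delta)) repetitions
    accepts = sum(tester(f) for _ in range(reps))
    return accepts >= reps / 2
```

For instance, if each run accepts with probability at least $2/3$, Hoeffding gives Pr[majority rejects] $\leq \exp(-2 \cdot \mathrm{reps} \cdot (1/6)^2) = \exp(-\mathrm{reps}/18) \leq \delta$.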


### Details from the end of Lecture 5

Many thanks to Daniel Golovin for providing the Book proof of $|S|(1-\delta)^{|S|-1} \leq 1/\delta$ for each $|S| = 0, 1, \ldots$, the claim we needed to show that any function has at most $1/(\epsilon\delta)$ variables with $(1-\delta)$-attenuated influence at least $\epsilon$. His proof:

Fix any $|S|$. If $i \leq |S|$ then certainly $(1-\delta)^{|S| - 1} \leq (1-\delta)^{i - 1}$. Sum this fact over $i = 1, \ldots, |S|$ to conclude $|S| (1-\delta)^{|S| - 1} \leq \sum_{i=1}^{|S|} (1-\delta)^{i-1}$. But $\sum_{i=1}^{|S|} (1-\delta)^{i-1} \leq \sum_{i = 1}^{\infty} (1-\delta)^{i-1} = 1/\delta$.
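A quick numeric sanity check of the inequality (a finite grid, not a proof; `bound_holds` is just a throwaway name):

```python
def bound_holds(d, smax=500):
    # Check |S| * (1-d)^(|S|-1) <= 1/d for |S| = 0, 1, ..., smax.
    # (For |S| = 0 the left-hand side is 0.)
    return all(s * (1 - d) ** (s - 1) <= 1 / d + 1e-9 for s in range(smax + 1))
```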

---

Re one thing I mentioned at the end of class: The constant functions $1$ and $-1$ pass the Håst-Odd test with probability $1/2$ (for every $\delta$). The reason is, the test picks some $x$, $y$, and $z$, but then also picks random signs $a,b,c \in \{-1,1\}$ and tests "$af(ax) \cdot bf(by) \cdot cf(cz) = 1$", which is equivalent to $f(ax)f(by)f(cz) = abc$. Now if $f$ is constant then so is $f(ax)f(by)f(cz)$, and then $abc$ is a random sign, which agrees with the constant with probability $1/2$.
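This can be checked by a quick simulation. For a constant function only the random signs matter, so the sketch below (a hypothetical `pass_prob_constant` helper) doesn't even need to choose $x$, $y$, $z$:

```python
import random

def pass_prob_constant(const, trials=200_000):
    """Estimate the Hastad-Odd acceptance probability for the constant
    function f = const.  Since f(ax) = f(by) = f(cz) = const, the test
    "a f(ax) * b f(by) * c f(cz) = 1" reduces to "abc * const^3 = 1",
    so how x, y, z are chosen is irrelevant."""
    passes = 0
    for _ in range(trials):
        a, b, c = (random.choice([-1, 1]) for _ in range(3))
        passes += (a * const) * (b * const) * (c * const) == 1
    return passes / trials
```

Both `pass_prob_constant(1)` and `pass_prob_constant(-1)` come out near $1/2$, as the argument above predicts.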

Note also that the "acceptance constraint" in the Håst-Odd test is either of the form "$v_{i_1} v_{i_2} v_{i_3} = 1$" or "$v_{i_1} v_{i_2} v_{i_3} = -1$". I.e., it's a 3-variable "linear" equation where the right-hand side is either "0" or "1" (if you go back and interpret bits as $\mathbb{F}_2$ rather than $\{-1,1\}$).


## Sunday, January 28, 2007

### Homomorphism testing

Suresh asks, "Can we make use of the same BLR test to check whether a given mapping between two groups is a homomorphism or not?"

The answer is: Yes.

The question is: Let $G$ and $H$ be groups. A homomorphism between $G$ and $H$ (this is a definition from group theory) is a function $f : G \rightarrow H$ that satisfies $f(x \bullet y) = f(x) \cdot f(y)$ for all $x, y \in G$, where $\bullet$ denotes the group operation in $G$ and $\cdot$ denotes the group operation in $H$.

A natural question is, does the BLR test "work" for functions $f : G \to H$? I.e., if $f : G \rightarrow H$ is $\epsilon$-far from being a homomorphism (i.e., you need to change its values on $\epsilon |G|$ inputs to make it a true homomorphism), will the BLR test reject with probability $\Omega(\epsilon)$?

We showed the answer is yes, with rejection probability at least $\epsilon$, when $G = \mathbb{Z}_2^n$ and $H = \mathbb{Z}_2$.

The answer is still yes for any finite abelian groups $G$ and $H$. Possibly one could prove it with appropriately generalized Fourier analysis, but it's much easier to prove in this general case with pure combinatorics. This was originally proved by BLR for cyclic groups; later, Coppersmith gave a nice proof which also works for all abelian groups. See Section 4.1 of Rubinfeld's Testing Survey. Note that the constant factors necessarily get worse, because of examples like $G = H = \mathbb{Z}_9$, $f : 0, 3, 6 \mapsto 0;\ 1, 4, 7 \mapsto 1;\ 2, 5, 8 \mapsto -1$.
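One can check the quoted example by brute force. A sketch, reading the values $0, 1, -1$ as the elements $0, 1, 8$ of $\mathbb{Z}_9$:

```python
# The Z_9 example: f(x) depends only on x mod 3.
f = [(0, 1, 8)[x % 3] for x in range(9)]

# BLR pass probability: pick x, y uniformly, accept iff f(x+y) = f(x) + f(y).
passes = sum(f[(x + y) % 9] == (f[x] + f[y]) % 9
             for x in range(9) for y in range(9))   # 63 of the 81 pairs pass

# Distance to the nearest homomorphism: every homomorphism Z_9 -> Z_9
# is x -> c*x mod 9 for some c.
dist = min(sum(f[x] != (c * x) % 9 for x in range(9)) for c in range(9))  # 6 of 9
```

So this $f$ is $2/3$-far from every homomorphism yet is rejected with probability only $2/9$, which is why the "rejection probability at least $\epsilon$" statement from the $\mathbb{Z}_2^n$ case cannot survive unchanged.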

In fact, the BLR test still works even for nonabelian finite groups; this was claimed in BLR and then proved by Ben-Or, Coppersmith, Luby, and Rubinfeld.


## Wednesday, January 24, 2007

## Tuesday, January 23, 2007

### Also rejecting 1

At the end of class we considered running Håstad's test with $\delta = .75\epsilon$ many times - $O(1/\epsilon^2)$ times, to be precise - and then accepting iff the empirical fraction of passes was at least $1 - .8\epsilon$. We saw that this gave a test for the class $\{\chi_S : |S| \leq 1\}$. I asked how we could also reject functions close to the $1$ function, but hurried over the explanation.

Aaron's suggestion was a very good one since we can do it making zero additional queries. Look at all the $f(x)$'s and $f(y)$'s we queried (not the $f(z)$'s) and reject also if the fraction of times the answer was 1 is at least $3/4$. Note that all the $x$'s and $y$'s are independent random strings.

Now certainly dictators (which are half 1, half -1) will pass this extra test except with probability exponentially small in $1/\epsilon^2$ (by a Chernoff bound over the $O(1/\epsilon^2)$ queries), so their acceptance probability, if it was at least $2/3$ before, is at least $.66$ now (which is fine).

On the other hand, suppose $f$ is $\epsilon$-far from the set of dictators. Then it's also $\epsilon$-far from the set {dictators} $\cup$ $\{1\}$ (and so the Håstad tests will reject it with probability at least $2/3$) unless it's $\epsilon$-close to $1$. But in this case, the additional test will reject except with probability exponentially small in $1/\epsilon^2$, so we're again fine.

The alternate fix I suggested involved using local decodability of parity functions. After all the Håstad tests, if we still haven't rejected then (with probability at least $2/3$) $f$ is $\epsilon$-close to either $1$ or a dictator function - in particular, to a linear function. Now use 2 more queries to do local decoding on the string $(-1, -1, ..., -1)$, and reject if the answer is 1. Dictators will pass this with probability at least $1-2\epsilon$, so overall they pass with probability at least $2/3 - 2\epsilon$ which is large enough (assuming $\epsilon < .01$ now, say). And any function $\epsilon$-close to being $1$ will pass this with probability at most $2\epsilon$, so overall it will pass with probability at most $1/3 + 2\epsilon$, again fine.
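For reference, here is what the 2-query self-correction for (near-)parity functions looks like; `local_decode` is a hypothetical helper, and the $2\epsilon$ error bound in the docstring is just a union bound over the two queries:

```python
import random

def local_decode(f, w):
    """Two-query self-correction for near-parity functions on {-1,1}^n.

    If f is epsilon-close to some parity chi_S, then both y and the
    coordinatewise product y*w are uniform, so
        f(y) * f(y*w) = chi_S(y) * chi_S(y*w) = chi_S(w)
    except with probability at most 2*epsilon."""
    n = len(w)
    y = [random.choice([-1, 1]) for _ in range(n)]
    yw = [yi * wi for yi, wi in zip(y, w)]
    return f(y) * f(yw)
```

On $w = (-1, \ldots, -1)$ a dictator decodes to $-1$ while the constant $1$ function decodes to $1$, which is exactly the sign the extra test rejects.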

Two good exercises for you:

1. Understand why it is without loss of generality to assume $\epsilon < .01$.

2. In the local decoding of $(-1,-1, ..., -1)$ we are really doing the following 2-query test: Pick $y$ at random, and accept iff $f(y) \neq f(-y)$. Express the probability that this test passes in terms of the Fourier coefficients of $f$.


## Monday, January 22, 2007

### Getting ready for Tuesday's class

Just wanted to remind you of a few things for next class:

1. It might not have been clear in the last class, but we are using the convention that the "empty product" is defined to be 1. I.e., $\prod_{i \in \emptyset} x_i$ is taken to mean 1. In particular, the "empty parity" $\chi_\emptyset$ is the function that is 1 on every input string.

2. A trivial fact that one can sometimes forget: If $f$ is a parity function itself, say $\chi_T$, then its Fourier expansion is just $\chi_T$. In other words, $\widehat{\chi_T}(S) = 1$ if $S = T$ and $\widehat{\chi_T}(S) = 0$ if $S \neq T$.

3. We'll use the Chernoff bound given on the homework, as follows: If $|\Pr[\text{Test passes in case A}] - \Pr[\text{Test passes in case B}]| \geq \epsilon$, then you can tell with high confidence whether you're in case A or case B after $O(1/\epsilon^2)$ tests.

4. One of you will be scribing!
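For concreteness, items 1 and 2 can be checked numerically; a minimal sketch with hypothetical `chi` and `fourier_coeff` helpers:

```python
from itertools import product

def chi(S, x):
    # Parity on the coordinates in S; the empty product is 1 (item 1).
    p = 1
    for i in S:
        p *= x[i]
    return p

def fourier_coeff(f, S, n):
    # hat{f}(S) = E_x[ f(x) * chi_S(x) ] over uniform x in {-1,1}^n.
    pts = list(product([-1, 1], repeat=n))
    return sum(f(x) * chi(S, x) for x in pts) / len(pts)
```

Plugging in $f = \chi_T$ reproduces item 2: the coefficient is $1$ at $S = T$ and $0$ elsewhere.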


## Sunday, January 21, 2007

### Diffusion on the discrete cube

I wanted to say a few more words about the "diffusion process" on the discrete cube, mentioned in Lecture 1. Actually, on any graph you can set up an analogous process - basically, a "continuous-time" random walk on the graph.

Given a graph $G = (V,E)$, its Laplacian $L$ is usually defined to be the $V \times V$ symmetric matrix that has a $-1$ in entries $(i,j)$ and $(j,i)$ when $\{i,j\}$ is an edge, $\deg(v)$ in the $(v,v)$ diagonal entry, and zeroes elsewhere.

In our case we considered the discrete cube, $G$, with $V = \{-1,1\}^n$ and $(x,y) \in E$ iff $x$ and $y$ have Hamming distance 1.

You can see now that the diffusion equation I gave in Lecture 1 says that for functions $f : V \to \mathbb{R}$, the rate of change of $f$ with respect to time is defined to be $-\frac{1}{2} L f$. (The factor $-1$ is important here, but the factor $1/2$ is not; it just slows down time by a factor of $2$.) The same diffusion process can be set up for any graph.

Further, the "basic solutions" to this differential equation - by which I mean those $f : V \to \mathbb{R}$ that just decay with time but don't change their "shape" - are the eigenvectors of the matrix $L$; i.e., the functions such that $L f = \lambda f$ for some constant $\lambda$. On the discrete cube, it's easy to check that the $2^n$ parity functions are all eigenvectors, and since they are linearly independent in a space of dimension $2^n$, linear algebra tells us this is all of them.

I'm not sure if there's a more principled way to derive the fact that the parity functions are the eigenvectors of $L$, as opposed to just observing that they are and that they are sufficient in number. Maybe someone can figure this out.

For way more on this than you might want to know, see this paper by Bobkov and Tetali. It can be hard reading, but you might want to at least check that the quantity $\mathcal{E}(f,f)$ they introduce on line 4 is our "energy" $\mathbb{I}(f)$, up to a constant. (Indeed, $\mathcal{E}$ stands for "energy" in their paper.)


## Thursday, January 18, 2007

### Voting with real numbers

In a comment, Yi suggested that voters use real numbers, say, in $[-1,1]$. This raises a number of interesting questions. Are our voting schemes now $f : [-1,1]^n \rightarrow \{-1,1\}$? Presumably here a vote of $-1$ means you really really like candidate $-1$, a vote of $.9$ means you really like candidate $1$, a vote of $.01$ means you are basically indifferent with a very slight preference towards candidate $1$, etc. (It's a bit like rating the candidates on the Hot Or Not scale.)

In some cases (where $f$ can be expressed as a multilinear polynomial) this is basically the same as usual voting; one can just have the voting machine convert a vote of $y$ to a random vote from $\{-1,1\}$ chosen to have expectation $y$. But in other cases it's not the same.
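The conversion in the first sentence can be sketched in a line (`round_vote` is a hypothetical helper):

```python
import random

def round_vote(y):
    """Replace a real vote y in [-1, 1] by a random vote in {-1, 1}
    with expectation exactly y: output 1 with probability (1 + y) / 2."""
    return 1 if random.random() < (1 + y) / 2 else -1
```

Indeed $\mathbb{E}[\texttt{round\_vote}(y)] = 1 \cdot \frac{1+y}{2} + (-1) \cdot \frac{1-y}{2} = y$.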

The KKL theorem can be extended (with an appropriate new definition of "influence") to the case of functions $f : [-1,1]^n \to \{-1,1\}$, but certain corollaries cease to hold. We may see this result in the class; it's by "BKKKL" -- the B is Fields Medalist Jean Bourgain and the extra K is Yitzhak Katznelson.


## Wednesday, January 17, 2007

### Stealing from Avrim

I was checking out the materials for Avrim and Nina's Learning Theory course today. Two things:

1. I swear I didn't steal the "spam" example for learning from him!

2. I like his idea of dividing the homework into "exercises" (relatively straightforward problems you should really do yourself) and "problems" (harder ones, where collaboration would be fine). On our homework, #2 -- #6 would be exercises, #7 a problem, and #1 halfway in between (probably closer to a problem).

In particular, if you're looking at the homework and thinking, "Man, I can't even do #1!", try moving on to #2 through #6.


## Tuesday, January 16, 2007

### Voting power

The slides from Lecture 1 are on the course web site now.

I mentioned John F. Banzhaf III in class; he's got quite a web site there, in which he proudly declaims about being the "Osama bin Laden of torts", among other things. I guess he's famous for suing McDonald's (over obesity) and tobacco companies. Apparently he's part of the reason for smoking bans in the UK and other countries. He also shows up in the news these days as a commenter on the Duke Lacrosse happening.

Anyway, he's frequently credited with describing the notion of influences of voters on elections we discussed in class. That came about when he sued the Nassau County Board for having an unfair voting scheme; they passed proposals according to the following boolean function:

$f(x) = 1$ iff $9x_1 + 9x_2 + 7x_3 + 3x_4 + x_5 + x_6 \geq 16$.

You can check that for this $f$, $Inf_4(f) = Inf_5(f) = Inf_6(f) = 0$.
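A brute-force check of that claim, assuming $0/1$ votes (as the weighted threshold suggests) and the definition $Inf_i(f) = \Pr[f(x) \neq f(x \text{ with coordinate } i \text{ flipped})]$; `nassau` and `influence` are hypothetical helper names, with voters $4, 5, 6$ at 0-based indices $3, 4, 5$:

```python
from itertools import product

def nassau(x):
    # x in {0,1}^6: x_i = 1 means district i votes yes; the quota is 16 of 30.
    weights = [9, 9, 7, 3, 1, 1]
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 16 else 0

def influence(f, i, n=6):
    # Inf_i(f) = Pr over uniform x of f(x) != f(x with coordinate i flipped).
    pts = list(product([0, 1], repeat=n))
    flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in pts)
    return flips / len(pts)
```

The reason: the weight-3 voter is pivotal only if the others sum into $[13, 15]$, and no subset of $\{9, 9, 7, 1, 1\}$ does; the weight-1 voters need the others to sum to exactly $15$, and no subset of $\{9, 9, 7, 3, 1\}$ does.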

More interesting info on measuring the power of voters is here:

Banzhaf Power Index

Voting Power

