Please turn in a PDF of typeset solutions solutions using LaTeX. Please turn in the pset by emailing me this pdf.
This pset may have bugs! Please email me, or reach out on Piazza, if you think there is a bug in a problem!
Due: October 14, 11:59pm Eastern time.
You may use external resources such as Wikipedia to look up material which would reasonably be a prerequisite for this course (undergraduate-level algebra, linear algebra, probability, etc.). These exercises are designed to teach you something interesting about SoS and to acquaint you with an idea we won’t have time to cover in lecture. As such, solutions to some of them may appear in the research literature or in reference material for this course. Please avoid using such solutions, and if you find you have already read one, please set aside the corresponding problem for a few days before working on it again.
Problem 1, 10 pts (Size of SoS coefficients): In lecture 1, we stated, but did not prove, that SoS proofs of nonnegativity over the hypercube can without loss of generality be taken to have coefficients expressible in a small number of bits. This is an important component of the proof that a polynomial-time algorithm using semidefinite programming can find such SoS proofs, if they exist.
Part A: Let \(f \, : \, \{0,1\}^n \rightarrow \mathbb{R}\) be a Boolean function on \(n\) bits, let \(d \in \mathbb{N}\) be even, and suppose that \(|f(x)| \leq n^{O(d)}\) for all \(x\) and that \(\vdash_d f \geq 0\). Show that there are \(p_1,\ldots,p_m \in \mathbb{R}[x]_{\leq d}\) with \(m \leq n^{O(d)}\) such that the magnitude of every coefficient of each \(p_i\) when written as a multilinear polynomial is at most \(n^{O(d)}\).
Part B: The Ellipsoid algorithm for finding a point in a convex set has the following guarantee. If \(\mathcal{C} \subseteq \mathbb{R}^n\) is contained in a ball of radius \(R\) and contains a ball of radius \(r\), then Ellipsoid returns a point in \(\mathcal{C}\) after \((n \log R / r)^{O(1)}\) calls to a separation oracle for \(\mathcal{C}\).
Show that there is a polynomial-time algorithm which takes as input a degree-\(d\) function \(f \, : \, \{0,1\}^n \rightarrow \mathbb{R}\), represented as a list of \(n^{O(d)}\) rational coefficients of magnitude at most \(n^{O(d)}\), and a parameter \(\varepsilon> 0\), and if \(\vdash_d f \geq 0\), returns a proof \(p_1,\ldots,p_m\) witnessing \(\vdash_d f + \epsilon \geq 0\), in time \((n^d \cdot \log (1/\varepsilon))^{O(1)}\).
You may assume that there is a polynomial-time separation oracle for the set of PSD matrices – that is, an algorithm which, given an \(N \times N\) matrix \(M\) of rationals as input, either certifies that this matrix is PSD or returns a PSD matrix \(X\) such that \(\langle X,M \rangle < 0\) but \(\langle X,P \rangle \geq 0\) for all \(P \succeq 0\).
Problem 2, 10 pts (Max cut in almost-bipartite graphs): Show that there is a polynomial-time algorithm with the following guarantee: given a graph \(G = (V,E)\) such that there is a cut which cuts \((1-\varepsilon)|E|\) edges, the algorithm outputs a cut which cuts \((1-\tilde{O}(\sqrt{\varepsilon}))|E|\) edges. \((\tilde{O}\) can hide factors of \(\log(1/\varepsilon)\), though this is not strictly necessary.)
You may use the following basic anticoncentration fact for Gaussians: if \(Z \sim N(0,1)\), then \(Pr(|Z| \leq \delta) = O(\delta)\).
Problem 3, 10 pts (Max Bisection): Max-bisection is the following optimization problem on graphs. Given a graph \(G = (V,E)\), with \(|V|\) an even number, the goal is to find \[ \max_{S \subset V, |S| = n/2} E(S,\overline{S})\] where \(E(S,\overline{S})\) denotes the set of edges crossing the cut \((S,\overline{S})\).
In this exercise, we will see another application of the low global correlation idea to round pseudoexpectations for max bisection. The goal will be to prove the following theorem:
Theorem: For every \(\varepsilon> 0\), there is an algorithm running in time \(n^{(1/\varepsilon)^{O(1)}}\) which \((1+\varepsilon)\)-approximates the max-bisection value in dense graphs.
It turns out that the ideas to prove this result can be combined with an extension of the ideas in Problem 2 to prove the following stronger theorem (which is not conceptually different but requires more technical work to prove): there is a polynomial time algorithm which, given a graph in which the max-bisection value is at least \((1-\varepsilon) |E|\), returns a bisection cutting \((1-O(\sqrt{\varepsilon}))|E|\) edges.
Constrained pseudoexpectations: You may assume that there is an \(n^{O(d)}\)-time algorithm for the following problem: given a graph \(G\), find a pseudoexpectation \(\tilde{\mathbb E}\) of degree \(d\) maximizing \(\tilde{\mathbb E}\sum_{i \sim j} (x_i - x_j)^2\) among all \(\tilde{\mathbb E}\) with the following “balance” property: for all polynomials \(p\) of degree at most \(d-1\), \(\tilde{\mathbb E}p(x) \cdot \sum_{i \leq n} x_i = \tfrac n 2 \tilde{\mathbb E}p(x)\). (We are ignoring some minor numerical issues here.) Notice that this property would be satisfied by \(\mathbb E_\mu p(x)\) for any distribution \(\mu\) on bisections; that is, on Boolean vectors with \(\sum_{i \leq n} x_i = n/2\).
Part A: Prove that if \(\tilde{\mathbb E}\) is a degree-\(d\) pseudoexpectation satisfying the balance property above, and if \(\tilde{\mathbb E}'\) is the result of conditioning \(\tilde{\mathbb E}\) on the values of \(t \ll d\) variables, then \(\tilde{\mathbb E}'\) also satisfies the balance property.
Part B: Consider independent rounding applied to \(\tilde{\mathbb E}\) to produce a Boolean vector \(y \in \{0,1\}^n\). Show that if the global information of \(\tilde{\mathbb E}\) is \(\delta\), then \(Var(\sum y_i) \leq \delta^{\Omega(1)} n^2\). (In fact, you can even show \(Var(\sum y_i) \leq O(n)\).)
Part C: Prove the above theorem on the min-bisection value in dense graphs. You may appeal to any facts from lecture about global information, correlation rounding, etc. Your algorithm may be randomized and should succeed with high probability.
Problem 4, 10 pts (Fun with SoS Proofs):
Part A (Cauchy-Schwarz): For every family of degree-\(d\) polynomials \(p_1(x),\ldots,p_m(x), q_1(x),\ldots,q_m(x)\), show that \[\vdash_{4d} (\sum_{i \leq m} p_i(x)^2 )(\sum_{i \leq m} q_i(x)^2) - (\sum_{i \leq m} p_i(x) q_i(x))^2 \geq 0.\]
Part B (Cauchy-Schwarz, Pseudoexpectation version): Show that for every degree-\(d\) pseudoexpectation \(\tilde{\mathbb E}\) and every pair of degree \(d/2\) polynomials \(p,q\), \(\tilde{\mathbb E}pq \leq \sqrt{\tilde{\mathbb E}p^2 \cdot \tilde{\mathbb E}q^2}\).
Part C (Approximate Triangle Inequalities): Show that there is a constant \(C\) (indep. of \(m\)) such that for vectors of polynomials \(p_1(x),\ldots,p_m(x), q_1,(x),\ldots,q_m(x)\), \[ \vdash_{O(\max \{ \deg p_i, \deg q_i \}_{i \leq m})} C (\|p\|_2^2 + \|q\|_2^2) - \| p + q \|_2^2 \geq 0\] and in fact for every even \(t\), there is \(C_t\) such that \[ \vdash_{O(t \cdot \max \{ \deg p_i, \deg q_i \}_{i \leq m})} C_t (\|p\|_t^t + \|q\|_t^t) - \|p+q\|_t^t \geq 0.\]
Problem 5, 10 pts (Finding hidden cliques):
Part A: A random graph \(G\) on \(n\) vertices (containing each possible edge independently with probability \(1/2\)) contains, with high probability, a clique of size \((2 -o(1))\log n\), and no clique larger than \((2+ o(1)) \log n\). But, are there SoS proofs of the latter fact?
Show that the following sort of SoS proof (somwhat different from what we have discussed thus far) exists with high probability over choice of a random \(G\): \[O(\sqrt n) - \sum_{i \leq n} x_i = \sum_{i,j \leq n, i \not \sim j} p_{ij}(x) x_i x_j + \sum_{i \leq m} q_i(x)^2, \text{ for all } x \in \{0,1\}^n,\] where \(p_{ij}, q_i\) are polynomials of degree at most \(O(1)\), and \(i \not \sim j\) means that \(i\) and \(j\) are not adjacent in \(G\).
(First, argue that the existence of such polynomials rules out cliques in \(G\) of size \(\gg \sqrt n\).)
You may use the following fact about random graphs. If \(G\) is a random graph and \(A\) is its adjacency matrix, then with high probability, \(\|A - \tfrac 12 11^\top\| \leq O(\sqrt n)\), where \(1\) denotes the all-\(1\)s vector.
Part B (robustness against monotone adversary): Argue that the same type of SoS proof exists with high probability for a random graph \(G\) which has been adversarially perturbed in the following way: observing the graph, a malicious adversary removes (but does not add) an arbitrary set of edges.
Part C (recovering a clique): Consider the following planted clique distribution on random graphs: a graph \(G\) is sampled uniformly, and then a subset \(S\) of vertices of size \(k\) is also chosen uniformly at random. Any edges with both endpoints in \(S\) are added to \(G\), so that \(S\) is a clique in \(G\).
UPDATED: Show that there is a polynomial time algorithm which, with probability at least \(0.99\), finds a subset of \(O(\sqrt n)\) vertices of \(G\) which contains \(99\%\) of \(S\).
OPTIONAL: Show that there is a polynomial-time algorithm which, with probability at least \(0.99\), finds a clique of size \(\Omega(\sqrt n)\) in \(G\), assuming \(k \geq C\sqrt{n}\) for a large-enough constant \(C\).
Show furthermore that this algorithm still finds \(S\) even if a monotone adversary removes any non-clique edges from \(G\) before your algorithm receives \(G\) as input.
You may assume that there is a polynomial-time algorithm for the following problem: given a graph \(G\) and constant \(d\), find a degree-\(d\) pseudoexpectation maximizing \(\tilde{\mathbb E}\sum_{i \leq n} x_i\) among all \(\tilde{\mathbb E}\) with the following property: for each non-edge \((i,j)\) in \(G\), and every polynomial \(p\) of degree at most \(d-2\), \(\tilde{\mathbb E}p(x) \cdot x_i x_j = 0\). (That is, \(\tilde{\mathbb E}\) ``looks like’’ the moments of a distribution supported only on cliques in \(G\).)
Explanation: There are less-sophisticated algorithms for finding a planted clique of size \(\gg \sqrt n\) in a random \(n\)-vertex graph. So, why study the SoS-based algorithm?
Take, for instance, an algorithm based on eigenvectors of the adjacency matrix – it turns out that for \(k \gg \sqrt n\), the top eigenvector of the (centered) adjacency matrix is well-aligned with the indicator vector for the clique (in the non-adversarial case), and this can be used to find the hidden clique. However, this algorithm is in some sense overfit to the random-graph model: the monotone adversary, who intuitively shouldn’t make finding a planted clique harder – after all, she is making the rest of the graph less “clique-y” – can actually make this spectral algorithm fail.
SoS-based algorithms often have appealing robustness guarantees, making the details of the theoretical model in which one analyzes them less important. This is nice, since in real life you rarely know if any assumptions you have made about input data to make some theoretical analysis work actually hold for the data you have in hand.