Prev: "Book Smart" NP-Complete Method: Musatov is closing in... Gaining... People are starting to talk...
Next: Call for papers - AITC 2010
From: babagamppis on 24 May 2010 07:13

This is the html version of the file http://www.cs.princeton.edu/~arora/talks/pcptalk.ppt. Executed and transcribed by M. M. Musatov and http://meami.org/

How NP got a new definition: Probabilistically Checkable Proofs (PCPs) & Approximation Properties of NP-hard Problems. Sanjeev Arora, Princeton University.

Talk overview: Recap of NP-completeness and its philosophical importance. Definition of approximation. How to prove approximation is NP-hard (a new definition of NP; the PCP Theorem). Survey of approximation algorithms.

A central theme in modern TCS: computational complexity. How much time (i.e., how many basic operations) is needed to solve an instance of the problem? Example: the Traveling Salesperson Problem on n cities. The number of possible salesman tours is n! (more than the number of atoms in the universe for n = 49). One key distinction: polynomial time (n^3, n^7, etc.) versus exponential time (2^n, n!, etc.).

Is there an inherent difference between being creative/brilliant and being able to appreciate creativity/brilliance? Writing the Moonlight Sonata; proving Fermat's Last Theorem; coming up with a low-cost salesman tour; versus appreciating/verifying any of the above. Formulated as "computational effort", this is just the P vs. NP question.

P vs. NP (P ⊆ NP; NPC = the NP-complete problems). NP: a "YES" answer has a certificate of O(n^c) size, verifiable in O(n^c') time. P: solvable in O(n^c) time. NP-complete: every NP problem is reducible to it in O(n^c) time (the "hardest" problems); e.g., 3SAT: decide satisfiability of a boolean formula.

The pragmatic researcher, on the practical importance of P vs. NP: 1000s of optimization problems are NP-complete/NP-hard (Traveling Salesman, CLIQUE, COLORING, scheduling, etc.). "Why the fuss? I am perfectly content with approximately optimal solutions" (e.g., cost within 10% of optimum). Bad news: approximation is NP-hard for many problems. Good news: it is possible for quite a few problems.
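The polynomial-versus-exponential distinction above is easy to feel numerically; a quick sketch (the helper name is mine):

```python
import math

def tour_count(n):
    """Rough count of salesman tours on n labeled cities: n! orderings."""
    return math.factorial(n)

n = 49
print(f"n^3 = {n**3:.3e}")                  # polynomial: tiny
print(f"2^n = {2**n:.3e}")                  # exponential: large
print(f"n!  = {float(tour_count(n)):.3e}")  # factorial: astronomically larger
```

Even at n = 49, the factorial count dwarfs both the polynomial and the plain exponential.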
Approximation algorithms. MAX-3SAT: given a 3-CNF formula φ, find an assignment maximizing the number of satisfied clauses. An α-approximation algorithm is one that, for every formula, produces in polynomial time an assignment satisfying at least OPT/α clauses (α ≥ 1). Good news [KZ'97]: an 8/7-approximation algorithm exists. Bad news [Håstad'97]: if P ≠ NP, then for every ε > 0, no (8/7 − ε)-approximation algorithm exists.

Observation (1960s through ~1990): NP-hard problems differ with respect to approximability. [Johnson'74]: can we provide an explanation? A classification? The last 15 years brought an avalanche of good and bad news. Next few slides: how to rule out the existence of good approximation algorithms (the new definition of NP via PCP theorems, and why it was needed).

Recall: "reduction". "If you give me a place to stand, I will move the earth." (Archimedes, ~250 BC.) "If you give me a polynomial-time algorithm for 3SAT, I will give you a polynomial-time algorithm for every NP problem." (Cook, Levin, 1971.) That is, every instance of an NP problem can be disguised as an instance of 3SAT.

Desired: a way to disguise instances of any NP problem as instances of MAX-3SAT such that "yes" instances turn into satisfiable formulae, while "no" instances turn into formulae in which fewer than a 0.99 fraction of the clauses can be simultaneously satisfied: a "gap". Such a reduction would rule out a 1.01-approximation for MAX-3SAT; it was achieved in 1992 [A., Safra] [A., Lund, Motwani, Sudan, Szegedy].

The Cook-Levin reduction does not produce instances on which approximation is hard. It encodes a transcript of a computation; a transcript is "correct" if it satisfies all "local" constraints, and the main point is to express these constraints as a boolean formula. But there always exists a transcript satisfying almost all local constraints! (No "gap".) Hence a new definition of NP is needed.

Recall the usual definition of NP. INPUT x (n bits); CERTIFICATE π (n^c bits); verifier M. x is a "YES" input ⟺ there is a π such that M accepts (x, π); x is a "NO" input ⟺ M rejects (x, π) for every π.
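In this classical definition, the verifier reads the entire certificate. For 3SAT the certificate is simply a truth assignment; a minimal sketch (the DIMACS-style clause encoding and the function name are my own choices, not from the talk):

```python
def verify_3sat(clauses, assignment):
    """Classical NP verifier for 3SAT: reads the ENTIRE certificate.

    clauses: list of tuples of nonzero ints; literal k means variable |k|,
             negated if k < 0 (DIMACS-style encoding).
    assignment: dict mapping variable index -> bool (the certificate pi).
    Runs in time linear in the formula size.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 or x2 or not x3) and (not x1 or x3 or x4)
phi = [(1, 2, -3), (-1, 3, 4)]
print(verify_3sat(phi, {1: True, 2: False, 3: True, 4: False}))  # True
```

The PCP theorem below replaces this full read of π by a few randomly chosen probes.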
NP = PCP(log n, 1) [AS'92] [ALMSS'92]; inspired by [BFL'90], [BFLS'91], [FGLSS'91]. INPUT x (n bits); CERTIFICATE π (n^c bits). M reads only a fixed number of bits of π, chosen in a randomized fashion using O(log n) random bits. x is a "YES" input ⟺ there is a π such that Pr[M accepts (x, π)] = 1; x is a "NO" input ⟺ for every π, Pr[M rejects (x, π)] > 1/2. (Only 3 bits need to be read! (Håstad'97).) Many other "PCP theorems" are known by now.

Disguising an NP problem as MAX-3SAT: run the PCP verifier M on input x with O(lg n) random bits. Note that 2^O(lg n) = n^O(1), so M is equivalent to n^O(1) constraints, each on O(1) bits. x is a YES instance ⟹ all constraints are satisfiable; x is a NO instance ⟹ at most a ½ fraction is satisfiable. The "gap"!

Of related interest: do you need to read a math proof completely to check it? Recall that math can be axiomatized (e.g., Peano Arithmetic), so a proof is a formal sequence of derivations from axioms. Verification of math proofs: a theorem (n bits) and a claimed proof; M runs in poly(n) time, spot-checking only O(1) bits of the proof. PCP Theorem: theorem correct ⟺ there is a proof that M accepts with probability 1; theorem incorrect ⟹ M rejects every claimed proof with probability ≥ 1/2.

Known inapproximability results: the tree of reductions [AL'96] rooted at MAX-3SAT, branching to MAX-3SAT(3), CLIQUE, LABEL COVER, SET COVER, COLORING [PY'88] [LY'93] [FGLSS'91, BS'89]; Metric TSP, Vertex Cover, MAX-CUT, STEINER TREE; NEAREST VECTOR, MIN-UNSATISFY, QUADRATIC PROGRAMMING, LONGEST PATH; INDEPENDENT SET, BICLIQUE COVER, FRACTIONAL COLORING, MAX-PLANAR SUBGRAPH, MAX-SET PACKING, MAX-SATISFY; also HITTING SET, DOMINATING SET, HYPERGRAPH TRAVERSAL [PY'88] [LY'93] [LY'93, ABSS'93]. Hardness classes: Class I: 1 + ε; Class II: O(lg n); Class III: 2^((lg n)^(1−ε)); Class IV: n^ε.

Proof of PCP theorems (some ideas). The need for a "robust" representation: the verifier uses O(lg n) random bits and reads 3 bits of π. Randomly corrupt 1% of π: a correct proof is still accepted with probability 0.97 − ε! The original proof of the PCP Theorem used polynomial representations, local "testing" algorithms for polynomials, etc., and ran ~30-40 pages.
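Incidentally, the constant 1/2 in the PCP soundness condition is not special: running the verifier k times with fresh randomness drives the error on "NO" inputs down to 2^-k. A simulation sketch, under the simplifying assumption that each basic check independently accepts a NO input with probability exactly 1/2 (all names here are mine):

```python
import random

def amplified_verifier(single_check, k, rng):
    """Accept iff all k independent runs of the basic check accept.
    If each run accepts a NO input with probability <= 1/2, independently,
    the amplified verifier accepts it with probability <= 2**-k."""
    return all(single_check(rng) for _ in range(k))

# Model of a NO instance: the basic check accepts with probability 1/2.
no_check = lambda r: r.random() < 0.5

rng = random.Random(0)
k, trials = 10, 100_000
accepts = sum(amplified_verifier(no_check, k, rng) for _ in range(trials))
print(f"empirical accept rate {accepts / trials:.5f} vs bound {2**-k:.5f}")
```

The empirical rate hovers near the 2^-k bound, at the price of k times as many queries; keeping the query count at an absolute constant (3!) is what makes the PCP theorem hard.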
The new proof (Dinur'06) runs ~15-20 pages. It repeatedly applies two operations to the clauses. Globalize: create new constraints using "walks" in the adjacency graph of the old constraints. Domain reduction: change constraints so variables take values in a smaller domain, e.g., {0,1} (this uses ideas from the old proof).

The Unique Games Conjecture and why it is useful. Problem: given a system of equations modulo a prime p, with 2 variables per equation, e.g., 7x2 + 2x4 = 6; 5x1 + 3x5 = 2; ...; 7x5 + x2 = 21. UGC (Khot'03): it is computationally intractable to distinguish between the cases (a) a 0.99 fraction of the equations is simultaneously satisfiable, and (b) no more than a 0.001 fraction is. The conjecture implies hardness of approximating vertex cover, max-cut, etc. (K04) (KR05) (KKMO05).

Applications of PCP techniques, a tour d'horizon: locally checkable/decodable codes; list decoding of error-correcting codes [Sudan'96, Guruswami-Sudan]; private information retrieval [Katz, Trevisan 2000]; zero-knowledge arguments / CS proofs [Kilian'94] [Micali]; amplification of hardness / derandomization [Lipton'88] [A., Sudan'97] [Sudan, Trevisan, Vadhan]; constructions of extractors [Safra, Ta-Shma, Zuckerman] [Shaltiel, Umans]; property testing [Goldreich, Goldwasser, Ron'97].

Approximation algorithms, some major ideas. How can you prove that the solution you found has cost at most 1.5 times (say) the optimum cost? Relax, solve, and round: represent the problem using a linear or semidefinite program, solve it to get a fractional solution, and round to get an integer solution (e.g., MAX-CUT, MAX-3SAT, SPARSEST CUT). Primal-dual: "grow" a solution edge by edge and prove its near-optimality using LP duality; usually gives faster algorithms (e.g., NETWORK DESIGN, SET COVER). Show the existence of "easy to find" near-optimal solutions (e.g., Euclidean TSP and Steiner Tree). What is semidefinite programming? Answer: a generalization of linear programming in which the variables are vectors instead of fractions.
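Returning briefly to the unique-games instances above: given a candidate assignment, the fraction of satisfied equations is trivially checkable (the conjectured hardness is in finding a good assignment, not checking one). A sketch; the tuple encoding (a, i, b, j, c) for a·x_i + b·x_j ≡ c (mod p) is my own illustrative choice:

```python
def satisfied_fraction(equations, assignment, p):
    """Fraction of the 2-variable equations mod p that `assignment` satisfies.
    Each equation (a, i, b, j, c) encodes a*x_i + b*x_j = c (mod p)."""
    good = sum((a * assignment[i] + b * assignment[j]) % p == c % p
               for (a, i, b, j, c) in equations)
    return good / len(equations)

# Toy instance mod 7, in the spirit of the system above.
p = 7
eqs = [(3, 1, 2, 2, 5),   # 3*x1 + 2*x2 = 5 (mod 7)
       (1, 1, 6, 2, 0),   # 1*x1 + 6*x2 = 0 (mod 7)
       (2, 1, 2, 2, 3)]   # 2*x1 + 2*x2 = 3 (mod 7)
x = {1: 1, 2: 1}
print(satisfied_fraction(eqs, x, p))   # 2/3 of the equations hold
```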
"Nonlinear optimization" [Grötschel, Lovász, Schrijver'81]; first used in approximation algorithms by [Goemans-Williamson'94]. Next few slides: the semidefinite programming approach. G = (V, E) with n vertices v1, ..., vn is mapped to n vectors in R^n whose distances d(vi, vj) satisfy some constraints; the main idea is then to "round" the vectors. How do you analyze these vector programs? Answer: geometric arguments, sometimes very complicated. Examples: 1.13 ratio for MAX-CUT, MAX-2SAT [GW'93]; O(lg n) ratio for min-multicut and sparsest cut [LLR'94, AR'94]; n^(1/4)-coloring of 3-colorable graphs [KMS'94]; (lg n)^O(1) ratio for min-bandwidth and related problems [F'98, BKRV'98]; 8/7 ratio for MAX-3SAT [KZ'97]; √(log n)-approximation for graph partitioning problems [ARV'04].

Ratio 1.13... for MAX-CUT [GW'93]. G = (V, E); find a cut maximizing the capacity (number of crossing edges). Quadratic programming formulation, then semidefinite relaxation [DP'91, GW'93], then randomized rounding [GW'93]: form a cut by partitioning v1, v2, ..., vn around a random hyperplane; the analysis compares the cut to SDPOPT via the angles θij between the vectors.

Old math rides to the rescue: sparsest cut / edge expansion. Input: a graph G = (V, E). For a cut (S, S̄), let E(S, S̄) denote the set of edges crossing the cut. The sparsity of S is α(S) = |E(S, S̄)| / |S|, and the SPARSEST CUT problem is to find the cut minimizing α(S). SDPs give a √(log n)-approximation; the proof involves a nontrivial fact about high-dimensional geometry [ARV'04].

ARV structure theorem. Arora, Rao, and Vazirani (2004) showed how the SDP can be rounded to obtain an approximation to sparsest cut. ARV structure theorem: if the points x_u ∈ R^n are well-spread, e.g., Σ_{u,v} (x_u − x_v)^2 ≥ 0.1 and |x_u|^2 ≤ 10 for all u ∈ V, and d(u, v) = (x_u − x_v)^2 is a metric, then there exist two large, well-separated sets A, B ⊆ {x_1, x_2, ..., x_n} with |A|, |B| ≥ 0.1 n. Once we have such A and B, it is easy to extend them to a good sparse cut in G.
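The Goemans-Williamson rounding step is simple to state in code. A minimal sketch, assuming the SDP has already been solved; here hand-made antipodal unit vectors stand in for a real SDP solution, and all names are mine:

```python
import random

def hyperplane_round(vectors, rng):
    """Goemans-Williamson rounding: choose a random hyperplane through the
    origin (normal r with Gaussian coordinates) and keep the vertices whose
    vectors fall on its nonnegative side."""
    dim = len(next(iter(vectors.values())))
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return {v for v, vec in vectors.items()
            if sum(a * b for a, b in zip(vec, r)) >= 0}

def cut_value(edges, S):
    """Number of edges crossing the cut (S, V - S)."""
    return sum(1 for u, v in edges if (u in S) != (v in S))

# Toy input: a 4-cycle, with antipodal unit vectors for the two sides of
# the optimal bipartition (what a solved SDP would essentially return).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
vecs = {0: (1.0, 0.0), 1: (-1.0, 0.0), 2: (1.0, 0.0), 3: (-1.0, 0.0)}
S = hyperplane_round(vecs, random.Random(1))
print(cut_value(edges, S))   # 4: any hyperplane separates {0, 2} from {1, 3}
```

The 0.878... guarantee comes from comparing, edge by edge, the probability θij/π of being cut against the SDP contribution (1 − cos θij)/2.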
Unexpected progress in other disciplines: the ARV structure theorem led to a new understanding of the interrelationship between the l1 and l2 norms, resolving an open question in mathematics. The l1 distances among n points can be realized as squared-l2 distances among some other set of n points, and the distortion incurred is only √(log n) [A., Lee, Naor'05], building upon [Chawla, Gupta, Räcke'05].

A theory of approximability? Desired ingredients: a definition of approximation-preserving reduction, plus reductions and algorithms showing each interesting problem approximable up to its own threshold factor (Θ(α), Θ(β), Θ(γ), ...). Partial progress: MAX-SNP, problems similar to MAX-3SAT [PY'88]; RMAX(2), problems similar to CLIQUE [PR'90]; F+Π2(1), problems similar to SET COVER [KT'91]; MAX-ONES CSP, MIN-CSP, etc. (KST97, KSM96).

Further directions: 1. Investigate alternatives to approximation: average-case analysis; slightly subexponential algorithms (e.g., a 2^o(n) algorithm for CLIQUE??). 2. Resolve the approximability of graph partitioning problems (BISECTION, SPARSEST CUT: √(log n) vs. log log n??) and graph coloring. 3. Complete the classification of problems w.r.t. approximability. 4. Simplify the proofs of the PCP theorems even further. 5. Resolve the "unique games" conjecture. 6. Fast solutions to SDPs? Limitations of SDPs?

Attributions. Definition of PCP: [Fortnow, Rompel, Sipser'88], [Feige, Goldwasser, Lovász, Safra, Szegedy'91], [Arora, Safra'92]. Polynomial encoding method: [Lund, Fortnow, Karloff, Nisan'90], [Shamir'90], [Babai, Fortnow'90], [Babai, Fortnow, Levin, Szegedy'91]. Verifier composition: [Arora, Safra'92]. PCP ⟹ hardness of approximation: [FGLSS'91], [ALMSS'92]. Fourier transform technique: [Håstad'96, '97].

Constraint satisfaction problems. Let F be a finite family of boolean constraints. An instance of CSP(F) consists of
variables x1, ..., xn and functions g1, ..., gm from F, each applied to a subset of the variables [Schaefer'78]. Dichotomy theorem: every problem in {CSP(F) : F finite} is either in P or NP-complete; CSP(F) is in P iff F is 0-valid, 1-valid, weakly positive or negative, affine, or 2CNF [Schaefer'78]. For the optimization versions: MAX-CSP(F) is in P iff F is 0-valid, 1-valid, or 2-monotone, and otherwise it is MAX-SNP-hard, so even a (1 + ε) ratio is NP-hard [Creignou'96] [Khanna, Sudan, Williamson'97] (this supersedes the MAX-SNP work). MAX-ONES-CSP and MIN-ONES-CSP similarly classify into a handful of levels: P, approximable to 1 + ε, approximable to n^ε, feasibility NP-hard, or feasibility undecidable, with NEAREST-CODEWORD-complete and MIN-HORN-DELETION-complete classes respectively [KSW'97] [KST'97].

Geometric embeddings of graphs: G = (V, E) with n vertices v1, ..., vn is mapped to n vectors in R^n whose distances d(vi, vj) satisfy some constraints, as in the SDP examples listed earlier.

Example: low degree test. F = GF(q), f : F^m → F. Is f a degree-d polynomial? Easy: f is a degree-d polynomial iff its restriction to every line is a univariate degree-d polynomial (a line is a 1-dimensional affine subspace, i.e., q points). Does f agree with a degree-d polynomial on 90% of the points? Theorem: roughly, yes iff on ~90% of the lines, f has ~90% agreement with a univariate degree-d polynomial. Weaker results: [Babai, Fortnow, Lund'90], [Rubinfeld, Sudan'92], [Feige, Goldwasser, Lovász, Szegedy'91]. Stronger results: [A., Sudan'96], [Raz, Safra'96].
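One round of the line test is easy to simulate over a prime field. A sketch (the finite-difference degree check is my own implementation choice, valid whenever d + 1 < p; over GF(p), each difference operator lowers the degree of a nonzero polynomial by exactly one):

```python
import random

def line_test(f, m, p, d, rng):
    """One round of the low-degree test over GF(p): restrict f to a random
    line t -> a + t*b and check that the restriction has degree <= d, via
    (d+1)-fold finite differences (valid whenever d + 1 < p)."""
    a = [rng.randrange(p) for _ in range(m)]
    b = [rng.randrange(p) for _ in range(m)]
    if all(x == 0 for x in b):
        b[0] = 1                      # make sure we have a genuine line
    vals = [f([(ai + t * bi) % p for ai, bi in zip(a, b)]) % p
            for t in range(p)]
    for _ in range(d + 1):            # each difference lowers the degree by 1
        vals = [(vals[i + 1] - vals[i]) % p for i in range(len(vals) - 1)]
    return all(v == 0 for v in vals)

p, m, d = 11, 3, 2
good = lambda x: (3 * x[0] * x[1] + 5 * x[2] + 7) % p   # degree 2: always passes
bad = lambda x: pow(x[0], 7, p)                         # degree 7: caught
rng = random.Random(0)
print(all(line_test(good, m, p, d, rng) for _ in range(20)))  # True
print(all(line_test(bad, m, p, d, rng) for _ in range(50)))   # False (w.h.p.)
```

A true degree-d polynomial passes on every line; a far-from-low-degree function is rejected on most lines, which is exactly the property the PCP constructions exploit.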
"The results described in this paper indicate a possible classification of optimization problems as to the behavior of their approximation algorithms. Such a classification must remain tentative, at least until the existence of polynomial-time algorithms for finding optimal solutions has been proved or disproved. Are there indeed O(log n) coloring algorithms? Are there any clique finding algorithms better than O(n^ε) for all ε > 0? Where do other optimization problems fit into the scheme of things? What is it that makes algorithms for different problems behave the same way? Is there some stronger kind of reducibility than the simple polynomial reducibility that will explain these results, or are they due to some structural similarity between the problems as we define them? And what other types of behavior and ways of analyzing and measuring it are possible?" (David Johnson, 1974.)

NP-hard optimization problems. MAX-3SAT: given a 3-CNF formula φ, find an assignment maximizing the number of satisfied clauses. MAX-LIN(3): given a linear system over GF(2) whose equations each involve 3 variables, find its largest feasible subsystem. Defn: an α-approximation for MAX-LIN(3) is a polynomial-time algorithm that computes, for each system, a feasible subsystem of size at least OPT/α (α ≥ 1). Easy fact: a 2-approximation exists. Theorem [Håstad]: if P ≠ NP, a (2 − ε)-approximation does not exist.

Early history of approximation algorithms. Graham's algorithm for multiprocessor scheduling [approx.
ratio = 2]. 1971-72: NP-completeness; Sahni and Gonzalez show that approximating (general) TSP is NP-hard. 1975: FPTAS for Knapsack [IK]. 1976: Christofides' heuristic for metric TSP. 1977: Karp's probabilistic analysis of Euclidean TSP. 1980: PTAS for Bin Packing [FL; KK]. 1980-82: PTASs for planar graph problems [LT, B]. Subsequent developments. 1988: MAX-SNP; MAX-3SAT is its complete problem [PY]. 1990: IP = PSPACE, MIP = NEXPTIME. 1991: first results on PCPs [BFLS, FGLSS]. 1992: NP = PCP(log n, 1) [AS, ALMSS]. 1992-95: better algorithms for scheduling, MAX-CUT [GW], MAX-3SAT, ... 1995-98: tight lower bounds (H97); (1 + ε)-approximation for Euclidean TSP, Steiner Tree, ... 1999-now: many new algorithms and hardness results. 2005: a new, simpler proof of NP = PCP(log n, 1) (Dinur).

Some NP-complete problems. 3SAT: given a 3-CNF formula, decide if it has a satisfying assignment. THEOREMS: given a mathematical statement T and a bound n, decide if T has a proof of length ≤ n in axiomatic mathematics. Philosophical meaning of P vs. NP: is there an inherent difference between being creative/brilliant and being able to appreciate creativity/brilliance? "Feasible" computations: those that run in polynomial, i.e., O(n^c), time (a central tenet of theoretical computer science); e.g., 2^n time is "infeasible".

Recall NP = PCP(log n, 1) [A., Safra'92] [A., Lund, Motwani, Sudan, Szegedy'92]: INPUT x (n bits); CERTIFICATE π (n^c bits); M reads only O(1) bits of π, chosen using O(lg n) random bits, then accepts or rejects. Håstad's 3-bit PCP theorem (1997): M reads just 3 bits of π and computes their sum mod 2; x is a "YES" input ⟹ there is a π with Pr[M accepts] > 1 − ε; x is a "NO" input ⟹ for every π, Pr[M accepts] < ½ + ε. Consequence: a (2 − ε)-approximation to MAX-LIN(3) would imply P = NP. Run M on input x with O(lg n) random bits; note 2^O(lg n) = n^O(1), so
M is equivalent to n^O(1) linear constraints; x a YES instance ⟹ more than a 1 − ε fraction is satisfiable; x a NO instance ⟹ at most a ½ + ε fraction is satisfiable. The gap: 1 − ε vs. ½ + ε.

Polynomial encoding, idea 1 [LFKN'90] [BFL'90]: represent a sequence of bits/numbers, say 2 4 5 7, as an m-variate degree-d polynomial, here 2·x1x2 + 4·x1(1 − x2) + 5·x2(1 − x1) + 7·(1 − x1)(1 − x2), and evaluate it at all points of GF(q)^m. Note: two different polynomials differ on at least a (1 − d/q) fraction of the points. Idea 2: many properties of polynomials are locally checkable. Related threads: program checking [Blum, Kannan'88]; program testing/correcting [Blum, Luby, Rubinfeld'90]; MIP = NEXPTIME [Babai, Fortnow, Lund'90], the first "PCP theorem". Dinur's proof [05] uses random walks on expander graphs instead of polynomials.

Håstad's 3-bit theorem (and the "Fourier method"). Start from NP = PCP(lg n, 1): a verifier V0 queries two tables T1, T2, reading c bits and 1 bit; on YES instances ∃ T1, T2 with Pr[accept] = 1; on NO instances ∀ T1, T2, Pr[accept] < 1 − δ. Raz's theorem (parallel repetition) yields a verifier V1 querying tables S1, S2 of ck bits and k bits, with Pr[accept] = 1 vs. Pr[accept] < 2^(−k/10). LONG CODING [BGS'95] blows the tables up to 2^(2^(ck)) and 2^(2^k) bits, and verifier composition gives the final 3-bit verifier V2. Suppose Pr[V2 accepts] > ½ + ε; then (a few pages of Fourier analysis) ∃ S1, S2 which V1 accepts with probability ≥ 2^(−k/10), ⟹ x is a YES instance.

Sparsest cut / edge expansion. G = (V, E). α(G) = min over S ⊆ V with |S| < |V|/2 of |E(S, S^c)| / |S|. A c-balanced separator minimizes the same ratio over S with c|V| < |S| < |V|/2: α_c(G) = min |E(S, S^c)| / |S|. Both problems are NP-hard. Integer formulation for the c-balanced separator: assign {+1, −1} to v1, v2, ..., vn to minimize Σ_{(i,j) ∈ E} |vi − vj|^2 / 4 subject to Σ_{i<j} |vi − vj|^2 / 4 ≥ c(1 − c)n^2 (for ±1 values, |vi − vj|^2 / 4 is 0 or 1, a "cut semimetric"). Semidefinite relaxation: find unit vectors in R^n satisfying the triangle inequality |vi − vj|^2 + |vj − vk|^2 ≥ |vi − vk|^2 for all i, j, k.
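Håstad's theorem says the "easy fact" quoted earlier is optimal: the trivial 2-approximation for MAX-LIN(3) cannot be beaten. The trivial algorithm itself is worth seeing: a uniformly random assignment satisfies each equation with probability 1/2, and the method of conditional expectations derandomizes it. A sketch (the equation encoding and function names are my own):

```python
def expected_satisfied(equations, partial):
    """E[# satisfied] when the unset variables are filled uniformly at
    random: an equation with any free variable is satisfied w.p. exactly 1/2."""
    total = 0.0
    for vars_, rhs in equations:
        if all(v in partial for v in vars_):
            total += 1.0 if sum(partial[v] for v in vars_) % 2 == rhs else 0.0
        else:
            total += 0.5
    return total

def maxlin3_2approx(equations, variables):
    """Deterministic 2-approximation for MAX-LIN(3) over GF(2): fix the
    variables one by one, never letting E[# satisfied] drop below m/2."""
    sigma = {}
    for v in variables:
        sigma[v] = 0
        e0 = expected_satisfied(equations, sigma)
        sigma[v] = 1
        e1 = expected_satisfied(equations, sigma)
        sigma[v] = 0 if e0 >= e1 else 1
    return sigma

# x1+x2+x3 = 0 and x1+x2+x3 = 1 contradict; x2+x4+x5 = 1 is independent.
eqs = [((1, 2, 3), 0), ((1, 2, 3), 1), ((2, 4, 5), 1)]
sigma = maxlin3_2approx(eqs, [1, 2, 3, 4, 5])
sat = sum(sum(sigma[v] for v in vs) % 2 == rhs for vs, rhs in eqs)
print(sat, "of", len(eqs), "equations satisfied")
```

Since the expectation starts at m/2 and never decreases, the final assignment always satisfies at least half the equations; by Håstad, improving the factor 2 by any ε would already imply P = NP.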