## Greatest Hits of TCS: SL=L

Alright!  As promised, here’s the first of the Greatest Hits of TCS.

To kick it off, Prasad Raghavendra presented Omer Reingold’s celebrated result of SL=L.  Prasad not only gave a beautiful tour through the work and offered some great perspectives on it, but he also set a strong precedent for embedding the work within its cultural context.

# The Outline

I’ll follow the talk loosely and give some of my own musings as we go.  But as this is the first in the Greatest Hits series, I’ll lay down a format that I hope to follow for the rest of the series.

We’ll start off with a good dose of the human element of research in Teetering Bulbs, where we’ll talk about the humans involved.  Of course, these aren’t greatly relevant to the result but, as my blog hopefully gets across, I’m in this business for the human journey and for research as a social activity where we learn not only about Platonic truths but about each other and ourselves as humans (if you’re an AI or not a human, feel free to skip this portion).

We’ll then move to Once Upon A Time… where we’ll look at the landscape the result sits in and the narrative that leads up to it and that it continues.  Context!

The next section, If You Get Nothing Else…, is almost a TL;DR for Once Upon A Time…  This section may not even be specifically related to the result, but it should capture the main paradigm (shift) that the result captures or falls into.  If you get nothing else out of this post, at least get the big landscape picture that If You Get Nothing Else… tries to capture.

The Result section is pretty self-explanatory.  It states the result and fits it into the story Once Upon A Time… sets up.

Finally, we’ll sketch the proof or main ideas in The Sketch.  These may be presented to varying degrees of depth, and some of them may end up being in the spirit of the “draw the rest of the owl” meme.

The rest of the series should follow this outline so read in whatever order and whichever subset of sections you want, although I won’t redefine the outline in the future.  Feel free to post questions or comments, especially if you want me to elaborate on something or if there’s an error or to just show off your knowledge!  Have fun!

# Teetering Bulbs

## Omer

Omer Reingold is a great human.  He is, in fact, one of my inspirations for this blog.  The popular Windows On Theory blog was in part created by him and he wrote its very first post; his introspection, humility, self-deprecation, aspiration, and overwhelming humanity in that post were consciously on my mind as I corralled courage for my own first post.

Omer was also at Berkeley for the whole summer of 2015 for the Simons Institute’s Cryptography program.  While here, he carried on Shafi Goldwasser and Nir Shavit’s practice of playback theater, beginning it and leading most of the sessions.  Playback is a form of improv theater where a participant tells some personal story and the performers play it back to them in some improvised way under some format.  It can be funny, poignant, personal, sad, inspiring, and always human.  I was very happy to participate in this with Omer and the lively crypto community.  And it left an impression on me how human and enthusiastic and open Omer was as a person.  He seems to embody research as a human experience and allows himself to be human in all aspects of his life.  And, as he confided in his blog posts, his research is marked with points of inactivity; in fact, the result we’re about to talk about here was the culmination of a previous visit to Berkeley, not something he had originally set out to accomplish but a surprising success that manifested along the way.

# Once Upon A Time…

## The power of randomness!

OK, so let’s actually get to some material (or at least some context for the material).  Randomness gives us power!

The ability to flip coins and (most importantly) to allow errors relaxes our notion of what it means to “solve” a problem: we don’t always have to get the right answer, we just have to get it right with overwhelming probability (like, the probability we mess up is lower than the probability that we win the lottery six times in a row).  And so, using randomness as a resource, we can widen the scope of what we consider solvable by allowing (negligible) errors.

And, in the early days of computation, it became quickly apparent that randomness gives us a great deal of power.  By creating randomized algorithms we could (easily and elegantly) solve problems that have generally seemed very hard to pin down.

Primality Testing, for example, is the decision problem of determining whether or not a given number is prime.  An efficient deterministic algorithm was elusive for a long time, yet simple randomized algorithms were found early on and widely used to decide the problem (with a negligible probability of failure).
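To make this concrete, here’s a minimal sketch of one such randomized test, the classic Miller–Rabin algorithm (the function name and round count are my own choices); each round either proves the input composite or is fooled with probability at most 1/4, so 20 rounds leave only a negligible error:

```python
import random

def is_probably_prime(n, rounds=20):
    """Miller-Rabin: never errs on primes; declares a composite
    'prime' with probability at most 4^(-rounds)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):   # handle tiny cases by trial division
        if n % p == 0:
            return n == p
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True  # no witness found: almost certainly prime
```

Note how short and elegant this is compared to the (much later, much more involved) deterministic algorithm.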

Similarly, the problem of Polynomial Identity Testing (PIT) also evaded deterministic attacks, yet yields easily to a randomized technique: to see if a polynomial simplifies to just zero (the identity), plug in random values and see if it ever outputs something nonzero!
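That randomized technique is the Schwartz–Zippel lemma in action; here’s a toy rendition (function and example names are mine), where the polynomial is given as a black box we can only evaluate:

```python
import random

def probably_identically_zero(poly, num_vars, field_size=2**31 - 1, trials=20):
    """Schwartz-Zippel test: a nonzero polynomial of total degree d
    vanishes at a uniformly random point mod a prime p with probability
    at most d/p, so any nonzero evaluation proves 'not identically zero'."""
    for _ in range(trials):
        point = [random.randrange(field_size) for _ in range(num_vars)]
        if poly(*point) % field_size != 0:
            return False  # found a witness: definitely not the zero polynomial
    return True  # never saw a nonzero value: almost surely identically zero

# (x + y)^2 - (x^2 + 2xy + y^2) is identically zero...
zero_poly = lambda x, y: (x + y) ** 2 - (x ** 2 + 2 * x * y + y ** 2)
# ...while x*y - 1 is not.
nonzero_poly = lambda x, y: x * y - 1
```

The point is that we never had to expand the polynomial symbolically, which is exactly what makes the deterministic problem so hard.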

## The power of randomness?

But, wait! – you say – we have a deterministic algorithm solving Primality Testing.

And right you are.  We do have an algorithm for Primality Testing that doesn’t use a single coin flip.  In fact, many of the problems that were captured with randomness’ power in the ’70s were later toppled entirely deterministically.

It began to look like randomness’ power wasn’t all that powerful…

PIT became the main stronghold left for problems that were efficient when randomized but had not yet succumbed to a deterministic approach.  But, besides these loose ends, it seemed that randomness gave us no power.  That is, BPP (the version of P where we can flip coins) seems to capture no new problems that P can’t capture: BPP=P.

This was addressed rigorously in the beautiful work by Russell Impagliazzo and Avi Wigderson that shows we can derandomize BPP (i.e. BPP=P) based on worst-case hardness assumptions (a little stronger than P$\neq$NP, but that we nonetheless very much believe).

So, there we have it!  Randomness seems to not give us mortals (i.e. P) any power!

## Space Complexity

So far we’ve talked about randomness in terms of time-bounded complexity.  More specifically, we’ve talked about it in terms of the small (polynomial) lifespans of mortals: P.  (Note that this automatically puts a polynomial bound on space as well since you can’t write down more than polynomial information before your lifespan runs out and you drop dead!)

So, with mortals (which is what we really care about in complexity theory) still in mind, we can ask about the problems we can do super efficiently.  In this case let’s look at what we can do with just a little bit of memory on hand; that is, what can we do with just a logarithmic amount of space, L?  (Note that it’s easy to show that L$\subseteq$P, which, because of mortals, is why we care about log space and not polylog space.)

Just because we can’t save time using randomness doesn’t mean we can’t be clever and use less space when flipping coins!  So the question still remains whether BPL=L.

We’re back to our original ponderings: does randomness give us power?

# If You Get Nothing Else…

Randomness is a resource.

When we allow ourselves randomness, we try to leverage a helpful resource for power.  Derandomization results are striking in that they tell us that the coins we flipped were placebos: we had the power in us all along and didn’t need that resource.

But, mostly, these results are profound philosophically.  We start poking at the nature of randomness, try to pin this age-old notion down rigorously, and, surprisingly, often find that a random world is no more powerful than a deterministic one.  So when people try to fight the Church-Turing thesis that humans are essentially just Turing machines by saying our brains are doing non-Turing stuff, this takes the wind out of one more of their arguments: “But I have free will and stuff.  I don’t do things deterministically!  There are at least random occurrences happening in the marvelous biological machinery of the brain.”  Doesn’t matter!  You can still be modeled by P whether or not you live in a fatalistic world!

# The Result

Theorem: Undirected ST-Connectivity is in Log-Space

OK…So what is this and what does it have to do with randomness and what we have been talking about?

Undirected ST-Connectivity (call it USTCON) is the decision problem of, given an undirected graph $G=(V,E)$ and vertices $s,t\in V$, deciding whether $t$ is reachable from $s$ – i.e. are $s$ and $t$ connected.  So the theorem can be stated as USTCON$\in$L.

Again, what does this have to do with randomness?  Well, USTCON has a very simple randomized algorithm: just start at vertex $s$ and take a random walk for $n^5$ steps.  With high probability you’ll hit $t$ if they’re connected, so just accept if you hit $t$ and reject if you don’t, and you’ll very likely be right.  Moreover, this is in logspace, since you just need to store which node you’re on and that node’s neighbors (every graph can easily be made d-regular in logspace while preserving connected components), which takes logarithmically many bits.  So what Omer did is akin to taking Primality Testing (which had an efficient randomized algorithm – i.e. it was in BPP) and finding a deterministic algorithm for it (i.e. putting it in P); he moved USTCON from BPL to L.  We’re knocking off randomized problems one-by-one to show they can be derandomized (hinting that randomness doesn’t give any power)!
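Here’s a toy rendition of that randomized algorithm (the function name and dict-of-lists graph format are my own); notice that the only state it carries between steps is the current vertex and a counter:

```python
import random

def ustcon_random_walk(adj, s, t, steps=None):
    """Randomized USTCON: take a random walk from s and accept iff we
    ever visit t.  adj[v] lists v's neighbors in an undirected graph.
    n^5 steps is (far) more than the walk needs to cover s's whole
    connected component with overwhelming probability."""
    n = len(adj)
    if steps is None:
        steps = n ** 5
    v = s
    for _ in range(steps):
        if v == t:
            return True
        if not adj[v]:            # isolated vertex: nowhere to go
            break
        v = random.choice(adj[v])
    return v == t
```

A “yes” answer can only be wrong with the negligible probability that a long walk on a connected component misses one of its vertices.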

But, more than that, USTCON was an important problem to derandomize!  It has its own class defined by it!  So some background here: like the super-important P vs. NP problem, there’s the analogous L vs. NL problem.  And, just as 3-SAT is the flagship complete problem for NP, STCON (the directed version of USTCON) is the flagship complete problem for NL.  So, naturally, if STCON is so important and foundational to the L vs. NL problem, we wonder about the seemingly smaller problem of USTCON.  It seems easier, like it doesn’t need nondeterminism.  Where does it lie?  Does it shed light on the STCON problem?

At a loss for where USTCON should sit, Harry Lewis and Christos Papadimitriou captured it with the class SL.  By defining a notion of a “symmetric” Turing machine, they showed that USTCON is complete for SL, the class of problems decidable by logspace symmetric Turing machines or, equivalently, just the class of decision problems reducible to USTCON (Christos has told me before that he thinks of complexity classes in terms of their complete problems, as they exactly capture the notion of that class).

So, noting that USTCON$\in$BPL, we have the picture L$\subseteq$SL$\subseteq$BPL$\subseteq$NL.  Omer’s result not only puts USTCON in L, but collapses the first inclusion so that SL=L, thus bringing BPL ever closer to L.

# The Sketch

The main idea of solving USTCON in logspace is, in Omer’s own words, that “if you want to solve a connectivity problem on your input graph, first improve its connectivity.”

## The Dream

Imagine for a moment how simple life would be if all of our graphs were “well connected.”  What if, no matter what node I set you on, you knew you could always get to any of the other nodes in just 3 hops!  That is, there was always some path from whichever node you’re on to any other node you’d want to be at that was only 3 edges long.
Then USTCON would be so simple in this dream world!

All we’d have to do, to see if $s$ is connected to $t$, is start at node $s$ and go through all length-3 paths from there to see if we ever hit $t$!  If we hit $t$ then they’re certainly connected, but if we don’t then they can’t be connected, since every node that $s$ connects to, it connects to by a length-3 path.  And since each node necessarily has fewer than $n$ neighbors, we branch into at most $n$ possible choices at each step of our length-3 path; this comes down to $n^3$ possible length-3 paths in total, which are entirely enumerable with $\text{log} (n^3)=3 \text{log} (n)$ bits.  Thus we easily solve USTCON in logspace!

You wake with an excited jolt, your mind lingering in a dream while reality slips over it.  Your lips thin with disappointment as you are reminded that reality is not a dream world and you are not always given such idyllic graphs.

But, as your more hopeful dream-persona takes its leave, it stumbles over a synapse and ignites a spark: what if we lived in a dreamy enough world?

Of course not every node is just 3 hops away from every other node in all graphs, but what if you could make them not too far?  We call the maximum distance between any two nodes in a graph that graph’s diameter.  Our dream was a world full of only graphs of diameter 3.  And, while that extreme is ludicrous, what if we could at least get a decently small diameter?  We saw how useful a small diameter was; that kind of connectivity makes testing connectivity a whole lot easier!

So, Omer’s idea is to find a way to make a graph much more connected (in terms of its diameter) so that we can easily decide whether $s$ and $t$ are connected.

In fact, we have a suite of graphs that have very nice connectivity properties: expanders.  Intuitively, expanders have the great property that they have high connectivity properties, but they’re also very “sparse”; that is, they act like a complete graph while having very few edges (which makes things simple).  Best of all, expanders have logarithmic diameter!

If we’re given a constant-degree expander then we can start at node $s$ and look at all logarithmic-length (instead of length-3) paths to see if we hit $t$!  And, since the degree is constant, we only have a constant number of branches to choose from at each of our logarithmic number of steps… and a constant raised to a logarithmic power is polynomial.  Thus, again, we can entirely enumerate all paths with just a logarithmic number of bits and so go through them and test $s-t$ connectivity in log space!
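Here’s a minimal sketch of that brute-force enumeration (the function name and graph format are my own): the only “big” object we ever hold is a single base-$d$ counter of walk_len$\cdot\log_2(d)$ bits, which is logspace when $d$ is constant and walk_len is logarithmic:

```python
def ustcon_enumerate(adj, s, t, walk_len):
    """Deterministic USTCON on a degree-<=d graph with small diameter:
    try every one of the d^walk_len walks from s, each encoded as the
    base-d digits of one counter.  If walk_len >= diameter, s and t
    are connected iff some walk visits t."""
    d = max(len(nbrs) for nbrs in adj.values())
    for code in range(d ** walk_len):    # the entire memory: one counter
        v, c = s, code
        if v == t:
            return True
        for _ in range(walk_len):
            c, digit = divmod(c, d)      # peel off the next base-d digit
            nbrs = adj[v]
            v = nbrs[digit % len(nbrs)]  # the digit picks the next neighbor
            if v == t:
                return True
    return False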

Now we’ve got to scratch our heads a little bit.  This is less unrealistic than our dream…but we’re still not always just handed constant-degree expanders on a platter.  How do we get such an enviable critter when all we have to work with is an arbitrary crude graph?

First off, every graph is already an expander!  Just maybe not a very good one…

That is, each graph is either really well connected or it has bottlenecks in it or is somewhere in between, and we have a way to measure that: (vertex) expansion.  This expansion measure of a graph $G$, called the Cheeger constant $\phi(G)$, is between 0 and 1 and informally is bigger if things are more well connected and smaller if there are bottlenecks.  And, loosely, we call a graph a $\lambda$-expander if its Cheeger constant is about $\lambda$ (we actually often use the second largest eigenvalue of the graph’s adjacency matrix to define expanders since it’s a good approximation to the Cheeger constant as shown in the Cheeger inequalities).  So every graph will be a $\lambda$-expander for some $\lambda$; it’s just that some may have particularly poor $\lambda$’s.

## Think Within The Square

The goal is to take our graph (which is a poor expander) and turn it into a better expander.  And the easiest way to do this is to raise the graph to some high power!

If you haven’t played much with spectral graph theory, this may seem like a weird thing to say – raise a graph to a power – but the whole crux of spectral graph theory is relating graphs to matrices/linear algebra.  And we certainly know how to raise a matrix to a power!

So simply, look at a graph’s adjacency matrix, raise that to some high power, and reinterpret that new matrix as the (weighted) adjacency matrix for a graph that we just “operated on” (note that I’m being really glib in all of this, add all your self-loops and whatever you’ve got to do, people).

But, why does this help?

Well, our resulting graph is a lot more connected and has much better expansion.  In fact, raising our graph to the $k^{\text{th}}$ power results in the graph that has an edge between two nodes iff there was a walk of length $k$ between them in the original graph.  This allows us to connect disparate parts of the original graph so that our new graph is well connected and has high expansion (if we properly normalize the original adjacency matrix, this can also be thought of as taking a random walk on the graph and mixing this Markov chain).  Note that $t$ is still reachable from $s$ iff it was in the original graph.
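Squaring really is just one boolean matrix multiplication; here’s a tiny sketch (pure Python, names my own) on a path graph with self-loops, where each squaring roughly halves the diameter:

```python
def square(adj_matrix):
    """G^2 as boolean matrix multiplication: an edge u-v iff G has a
    walk of length 2 from u to v.  With self-loops in G this means
    'length <= 2', so squaring only adds edges and never disconnects."""
    n = len(adj_matrix)
    return [[1 if any(adj_matrix[u][w] and adj_matrix[w][v] for w in range(n))
             else 0
             for v in range(n)]
            for u in range(n)]
```

On a graph that is $d$-regular (ignoring self-loops), note that each node of the square picks up all of its 2-hop neighbors, so this connectivity gain comes with a degree cost.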

OK!  So now we’re getting ourselves closer to USTCON with these nicer to handle expander graphs (with small diameter)!

In fact, it can be shown that any (connected, non-bipartite) graph can be turned into a (good) expander by raising it to just a polynomial power.  Which, lucky for us, can be done by squaring the graph a logarithmic number of times!

So it looks like we’re in the clear and everything boils down to squaring a graph.

What’s the problem then?  Let’s just take any graph and square away until we have an expander with small diameter reminiscent of our dream and then just go ahead and brute-force solve USTCON!

Well, we need to remember that we need constant-degree expanders for our brute-force technique to work.  But if we look at the operation of squaring the graph (say it’s $d$-regular), we get a new graph that connects each node to all of its 2-hop neighbors in the original; this is more connected (the diameter roughly halves) but the degree of each node shot up from $d$ to $d^2$.  Every time we square the graph (each of the logarithmically many times) we square the degree!  That certainly doesn’t leave us with a constant-degree expander, and it ruins our approach.

This leaves us at the crux of Omer’s insight.  He gets around this problem by finding a new notion of squaring the graph to increase its expansion properties, yet leave its degree low.

## Ziggy Zagdust

After squaring a graph we have better expansion but an increased degree.  We would like a way to keep that oh-so-nice expansion gain but be able to decrease our degree again.  To do this, Omer repurposes his zig-zag product, which was introduced in 2000 for other reasons (and went on to win a Gödel Prize).

This allows us to take our squared graph $G$, which is kind of expand-y but has large degree, and zig-zag product it with a small (good) expander $H$ so that the resulting graph inherits $G$’s expansion along with $H$’s degree.  So we don’t lose much of the expansion gains we got from squaring $G$ but, as long as $H$ is small and of small degree, we can keep the degree small.

The main idea is to take our squared graph $G$ and replace each of its vertices by a copy of $H$!  The notion is that this explodes each of $G$’s large-degree vertices into a bunch of vertices, so that its high degree is spread amongst all these little vertices and no vertex has too many edges on it.  And, if $H$ is a good expander, then these replacements don’t injure $G$’s connectivity by much: the previous connectivity is only stymied by having to step through $H$ a little bit on your path between nodes but, since $H$ is well connected, that’s not much of a problem.  (Technically this is the replacement product; the zig-zag product connects the edges of the replacement product a little more cleverly to really preserve $G$’s expansion: the edges come from taking little zigs within a copy of $H$ with big zags between copies of $H$.  See Luca’s treatment for more detail and some nice visuals.)

So, if our graph $G$ was $D$-regular (and $D$ is just too damn big for us) and we have a smaller expander $H$ that is $d$-regular (and $d$ is much smaller and nicer), then the zig-zag product ends up giving us a larger graph (with $|G||H|$ vertices) which is only $d^2$-regular.  (I’ve really glossed over it, but we can easily, in logspace, take any graph $G$ and turn it into a regular graph whose connected components are non-bipartite, and finding some small constant-degree expander $H$ is also doable in logspace, as this was actually one of the original uses of the zig-zag product.)
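To see the vertex and degree bookkeeping concretely, here’s a toy sketch of the replacement product described above (the zig-zag product wires the same vertex set more cleverly); the function name and the assumption of a simple regular graph given as ordered neighbor lists are mine:

```python
def replacement_product(G_adj, H_adj):
    """Replacement product: G_adj[v] is an ordered list of v's D
    neighbors in a simple D-regular graph G; H_adj is a d-regular
    graph on exactly D vertices {0..D-1}.  Vertex v explodes into a
    cloud of copies (v, i); each cloud internally gets H's edges,
    and copy (v, i) also gets one edge into the cloud of v's i-th
    neighbor.  Degree: D before, d + 1 after."""
    D = len(next(iter(G_adj.values())))
    assert len(H_adj) == D, "H needs one vertex per edge-slot of G"
    new_adj = {}
    for v, nbrs in G_adj.items():
        for i, w in enumerate(nbrs):
            j = G_adj[w].index(v)   # the slot on w's side of edge (v, w)
            new_adj[(v, i)] = [(v, k) for k in H_adj[i]] + [(w, j)]
    return new_adj
```

For example, plugging in $K_4$ (3-regular) and a triangle for $H$ gives $4\cdot 3 = 12$ vertices of degree $2+1=3$ instead of 4 vertices of degree 3 squared.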

And so we have a way to “square” the graph while keeping the degree constant; repeating this, we end up with a constant-degree expander that preserves connected components, and so we can answer USTCON on this logarithmic-diameter graph in logspace!

USTCON$\in$L $\blacksquare$

## Well, That Was Random…

OK, so hopefully it seems a little more natural and a little less esoteric to be making a big deal out of something abstruse-sounding like SL=L.  To some, it can be easy to see results like this superficially as just theoreticians doing weird, inbred things; so I hope that putting it in the context of randomness makes it feel more like a natural progression and less, well, random.

But, as far as derandomization goes, this result feels a little different from how many of our other (time-bounded) derandomization results go.  BPP=P-type results are, first of all, typically conditional, whereas Omer’s result uses no assumptions; second of all, they often use pseudo-random number generators to derandomize.  That is, we usually use good-enough randomness to make things deterministic, whereas with SL=L we just talked about graphs and spectral and combinatorial properties to derandomize all of SL.  It feels different.

So, before we finish here, I want to give a rough idea of how this result fits into the usual derandomization framework and context.  In fact, Prasad mostly presented from a paper, Derandomized Squaring of Graphs, that recast Omer’s result in terms of derandomization with some simplifications and some insights as to what was really happening in his combinatorial argument.

The key notion is how an expander can be used as a pseudo-random generator.  Let’s look at a complete graph for example: if we have a complete graph on $2^r$ nodes, then we can think of each node as representing a possible $r$-bit string.  To choose a completely random $r$-bit string, we can just take a random walk on the graph and end up at a completely random node, which is an $r$-bit string.  But there’s something very silly about this.  To take that random walk we need to flip some coins, and at each of the, say, $t$ steps we have to choose one of our current node’s $2^r$ neighbors, which needs to be specified with $r$ bits.  So taking the random walk takes $rt$ bits just to get out $r$ random bits!

But!  If we had a $d$-regular expander instead, then it only takes $\text{log} (d)$ bits to specify a neighbor of our current node.  So we’d only have to generate $\text{log}(d)t$ bits to take a random walk and end up with a “random” $r$-bit string.  Yet, since expanders are well connected, the graph acts like a complete graph, so you still end up at a pretty random node after $t$ steps.  So expanders allow us to take a small truly random seed and blow it up into a larger pseudo-random string.
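The seed accounting is easy to sketch in code (names mine; for illustration the test walks a cycle, which is a poor expander — a real construction would plug in a genuine constant-degree expander):

```python
def expander_walk(adj, start, t, seed_bits):
    """Walk t steps on a d-regular graph, consuming only ceil(log2 d)
    seed bits per step instead of the log2(n) bits a step on the
    complete graph would cost.  adj[v] is v's ordered neighbor list;
    seed_bits is a list of 0/1 values read left to right."""
    d = len(adj[start])
    bits_per_step = max(1, (d - 1).bit_length())
    assert len(seed_bits) >= t * bits_per_step, "seed too short"
    v = start
    for step in range(t):
        chunk = seed_bits[step * bits_per_step:(step + 1) * bits_per_step]
        idx = int("".join(map(str, chunk)), 2) % d  # which neighbor to take
        v = adj[v][idx]
    return v
```

With $d$ constant, a $t$-step walk costs $O(t)$ seed bits no matter how big the graph is — that’s the whole trick.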

What does this have to do with Omer’s approach?  Well, remember how we knew USTCON was already in BPL by simply taking a random walk from $s$ and hoping to stumble on $t$?  This interpretation of Omer’s result is saying that Omer just found a fancy way to take pseudo-random walks on the graph and, most importantly, these walks are latently spawned by a pseudorandom generator with only logarithmically sized seeds: since there’s only polynomially many such paths we can enumerate them with logarithmically many bits (these paths are the ones Omer talks about taking on our expander with logarithmic diameter).  And then we can just brute force through all of them in logspace and not leave it up to chance; deterministic!

So, given a $d$-regular graph $G$, we want to sample a neighbor of our current node from $G^2$ (i.e. start our random walk with 2 hops!).  We can do this by sampling a random one of our $d$ neighbors and then doing that again for our second step.  But that would take $2\text{log}(d)$ bits of randomness.  We’d like to use fewer (especially since we’re doing this recursively as we raise the graph to a power and grow our walk exponentially).  How do we use fewer random bits?  A pseudo-random generator of course!

And what better than an expander!  So on a vertex, when we want to walk to a random neighbor, let’s use some expander $H$ to make that pseudo-random decision for us.  And as we get further into this idea, we might see that this is kind of like replacing the node we’re on with our expander $H$.  Omer’s zig-zag product is just kind of a combinatorial way of taking a pseudo-random walk on our graph being squared where the copies of $H$ are implicitly the pseudo-random generator plopped right into our actual graph’s structure!
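Here’s a toy rendition of that idea in the spirit of the Rozenman–Vadhan derandomized square (the function name and the simple-graph assumption are mine; the real construction tracks edge labels via rotation maps):

```python
def derandomized_square(G_adj, H_adj):
    """Derandomized squaring sketch: an honest square G^2 takes all d^2
    two-step walks from each vertex (degree d^2); here a small expander
    H on d vertices prunes the second step.  Having arrived at w along
    the edge occupying slot j on w's side, we only continue along slots
    that are H-neighbors of j.  Degree: d * deg(H) instead of d^2,
    while expansion still improves almost as much as honest squaring."""
    new_adj = {}
    for v, nbrs in G_adj.items():
        two_steps = []
        for w in nbrs:
            j = G_adj[w].index(v)   # slot we arrived on at w (simple graph)
            for b in H_adj[j]:      # H picks the allowed second steps
                two_steps.append(G_adj[w][b])
        new_adj[v] = two_steps
    return new_adj
```

Every kept edge is still a genuine 2-walk of $G$, so connectivity is untouched; we’ve just used $H$ as the pseudo-random generator choosing which 2-walks to keep.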

In this view we don’t actually use the zig-zag product but instead realize it to be more of a conceptual framework that allows us to take pseudo-random walks; we don’t even have to explicitly take the product and grow the number of nodes.  We can just operate on the same number of nodes but sample from $G^2$’s edges recursively as we walk (the exponentiated graph is implicit in the walk).

So, on a high level, that’s it.  Omer found a way to use expanders to take pseudo-random walks on graphs with a logarithmic seed to brute force everything in logspace and be sure everything is hit in that walk.  So, without any assumptions, we have a brute-forceable pseudo-random generator that allows us to derandomize SL.  And thus SL=L falls nicely into our context of derandomization.

While this ended up being long, there are still lots of details I left out.  But that’s the main skeleton of the proof.

Draw the rest of the damn owl.