“Neural circuits as computational dynamical systems”, by David Sussillo

My friends at Shannon Labs recently organized a reading group focusing on computational models of the neocortex. I don’t have a background in neuroscience, but it seemed prudent to jump in and get a chance to engage with people smarter than me about this kind of content. The first paper we read was David Sussillo’s “Neural circuits as computational dynamical systems”. It was a great event. Thanks to Laura at Age1 for hosting!

Here are some clarifications on the paper, based on our group’s discussions:

Equation 1

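Equation 1 is confusing. The author claims that it describes an RNN, but it is not immediately obvious how. Written with the terms discussed in this post, it has the form:

\[ \tau \dot{\mathbf{x}}(t) = -\mathbf{x}(t) + \mathbf{J} \mathbf{r}(t) + \mathbf{B} \mathbf{u}(t) \]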

Here, boldface upper-case variables are matrices. Boldface lower-case variables are vectors. Otherwise, a variable is a scalar.

\( \mathbf{x}(t) \) is the “recurrent” state vector. Each scalar of this vector is the current accumulated potential of a single neuron, whatever that means.

\( \dot{\mathbf{x}}(t) \) is the time derivative of the recurrent state vector, i.e. how fast the state is changing at time t.

\( \mathbf{u}(t) \) is the input at time t.

\( \mathbf{r}(t) = \sigma(\mathbf{x}(t)) \), where \( \sigma \) is some kind of element-wise “squashing” function. You can think of it as the regular sigmoid function in deep learning for our purposes. I believe this is a differentiable approximation to an axon, the output of a neuron, which fires only after a threshold is reached.

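This equation is in stark contrast to the “normal” RNN we are used to seeing, which (up to bias terms and time-indexing conventions) looks like this:

\[ \mathbf{x}(t+1) = \sigma\left( \mathbf{W_{xx}} \mathbf{x}(t) + \mathbf{W_{xu}} \mathbf{u}(t) \right) \]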

Here, \( \mathbf{W_{xu}} \) and \( \mathbf{W_{xx}} \) are the input-to-hidden matrices and the hidden-state-to-hidden-state matrices respectively. I adopt this notation to be consistent with chapter 10 in the deep learning book.

Notably, Eq. (1) does not immediately give us a way to compute \( \mathbf{x}(t+1) \). It is not stated by the author, but he is basically proposing that this value be computed as follows:

\[ \mathbf{x}(t + \Delta t) = \mathbf{x}(t) + \Delta t \, \dot{\mathbf{x}}(t) \]

Here, the timestep \( \Delta t \) is of course 1.

This is interesting because this kind of RNN is very similar to residual networks. It is basically proposing that the next hidden state is a function of the previous hidden state, plus some perturbation. It would be cool to see how this kind of neural network does on practical problems.
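To make the update concrete, here is a minimal Python sketch of one such step (a forward-Euler step). It assumes Eq. (1) has the form \( \tau \dot{\mathbf{x}} = -\mathbf{x} + \mathbf{J}\mathbf{r} + \mathbf{B}\mathbf{u} \) and uses tanh as the squashing function. The helper names (`matvec`, `x_dot`, `euler_step`) and the toy weights are made up for illustration and do not come from the paper.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def x_dot(x, u, J, B, tau):
    """Assumed right-hand side of Eq. (1): (-x + J r + B u) / tau."""
    r = [math.tanh(x_i) for x_i in x]  # squashing function (assumed to be tanh)
    Jr = matvec(J, r)                  # recurrent input from other neurons
    Bu = matvec(B, u)                  # external input
    return [(-x_i + jr_i + bu_i) / tau for x_i, jr_i, bu_i in zip(x, Jr, Bu)]

def euler_step(x, u, J, B, tau, dt=1.0):
    """One forward-Euler update: x(t + dt) = x(t) + dt * x_dot(t)."""
    dx = x_dot(x, u, J, B, tau)
    return [x_i + dt * dx_i for x_i, dx_i in zip(x, dx)]

# Toy 2-neuron network with a 1-dimensional input (all values are made up).
J = [[0.0, 1.5], [-1.5, 0.0]]
B = [[1.0], [0.0]]
x = [0.0, 0.0]
for t in range(5):
    u = [1.0] if t == 0 else [0.0]  # a single input pulse at t = 0
    x = euler_step(x, u, J, B, tau=10.0)
```

With `dt = 1` the update is just `x + x_dot(...)`, which is the same "previous state plus a perturbation" shape as a residual block.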

But why is the equation for \( \dot{\mathbf{x}}(t) \) good? In machine learning terms, how does it encode a useful inductive bias? To gain some intuition for this differential equation, suppose it was simply as follows:

\[ \tau \dot{\mathbf{x}}(t) = -\mathbf{x}(t) \]

We can solve this directly to get, for each neuron:

\[ x(t) = C e^{-t/\tau} \]

where C is some constant.
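For completeness, here is the separation-of-variables argument for a single neuron, assuming the simplified equation is \( \tau \dot{x}(t) = -x(t) \) with scalar \( x \) (the neurons are uncoupled in this case, so each one can be solved on its own):

\[
\begin{aligned}
\tau \frac{dx}{dt} &= -x \\
\frac{dx}{x} &= -\frac{dt}{\tau} \\
\ln |x| &= -\frac{t}{\tau} + k \\
x(t) &= C e^{-t/\tau}, \quad C = \pm e^{k}
\end{aligned}
\]

Setting \( t = 0 \) shows that \( C = x(0) \), the neuron's initial potential.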

This means that the current potential of each neuron would simply decay exponentially toward 0, at a rate set by the time constant \( \tau \). This is rather boring. However, we’d like neurons to be connected to each other. Therefore, you can view \( \mathbf{J} \) in \( \mathbf{J} \mathbf{r}(t) \) as a weighted adjacency matrix, describing how the axon of each neuron connects to other neurons. When the incoming axons to a neuron fire, that neuron’s potential goes up. Were we to use \( \mathbf{J} \mathbf{x}(t) \) instead in the equation, we’d have a totally linear system, which is not very interesting. The \( \mathbf{B} \mathbf{u}(t) \) term is there because the system would again be uninteresting if it could not take in inputs.
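A small numerical sketch can illustrate these three cases for a single neuron with a self-connection and no input. It assumes the dynamics \( \tau \dot{x} = -x + j \, \sigma(x) \) with tanh as the squashing function, integrated with forward-Euler steps. The function name `simulate` and all parameter values are made up for illustration.

```python
import math

def simulate(rate_fn, x0=0.1, j=2.0, tau=10.0, dt=1.0, steps=500):
    """Euler-integrate tau * dx/dt = -x + j * rate_fn(x) for one neuron."""
    x = x0
    for _ in range(steps):
        x += dt * (-x + j * rate_fn(x)) / tau
    return x

# No connection (j = 0): pure decay toward 0.
x_decay = simulate(math.tanh, j=0.0)

# Linear version (J x instead of J r): with j > 1 the state grows without bound.
x_linear = simulate(lambda x: x)

# Nonlinear version (J r with r = tanh(x)): the state settles at a nonzero
# fixed point where x = j * tanh(x).
x_nonlinear = simulate(math.tanh)

print(x_decay, x_linear, x_nonlinear)
```

The linear system can only decay or blow up exponentially, while the squashing function keeps the state bounded and allows it to settle at a nonzero fixed point.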

Equation 2

The author states that it is common to use \( \mathbf{z}(t) = \mathbf{W} \mathbf{r}(t) \) as the output of an RNN, without giving a justification. The justification is this: suppose that you used \( \mathbf{r}(t) \) directly as your output. Then you’d force your output vector to be the same size as your hidden state vector. But what if your task requires only a single output? Then your hidden state vector would be of size 1, which is a rather weak model.
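The shape argument can be sketched in a few lines of Python. Here a hidden state of 4 neurons is mapped to a single output through a 1 x 4 readout matrix. The function name `readout`, the tanh squashing function, and the numbers are assumptions chosen for illustration.

```python
import math

def readout(W, x):
    """Compute z = W r with r = tanh(x); W is a list of rows."""
    r = [math.tanh(x_i) for x_i in x]
    return [sum(w_ij * r_j for w_ij, r_j in zip(row, r)) for row in W]

x = [0.5, -1.0, 2.0, 0.0]        # hidden state of N = 4 neurons
W = [[0.25, 0.25, 0.25, 0.25]]   # 1 x 4 readout: one output, four neurons
z = readout(W, x)                # a vector of length 1
```

The number of rows in `W` sets the output size, so the hidden state can be as large as the task needs regardless of how many outputs are required.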

Other thoughts

This is a review paper, but I dislike that both the experimental methods and the analysis methods used in the paper are inadequately described. It seems to me that the techniques proposed are quite useful, but it would be great to see them explained in more detail. Also, PC1 (and so on) means “Principal Component 1”.

Acknowledgements

I understood quite little of this paper the first time I read it, but this writeup benefited greatly from discussion with Daniel Fernandes, Katarina Slama, Laura Deming, Parnian Barekatain, and a Michael whose last name I never got.