09:05:24 And if you opened the paper recently, you know Israel is going through serious turmoil, so let me apologize on behalf of all of your Israeli collaborators for not responding to things, because they're demonstrating and protesting. And so it's wonderful to be away from this
09:05:43 havoc and deal with science. So thank you.
09:05:51 And then the other thing is that there was kind of a format for things, for organizing the talks, and coming from a place where we are very respectful of rules,
09:06:07 we did things not according to this schedule. So I want to try and give perspectives that will come from different angles.
09:06:16 I will speak for 45 min or so about notions of population codes.
09:06:21 Jasper will give a twist on how they're wonderful and amazing.
09:06:25 Then a break. Then India will give yet another version of why you should all be in that, and then I'll try to wrap up with some other questions in general, things that we can
09:06:35 talk about over the cheese that we don't steal from other people
09:06:40 in the afternoon. Cool. So the theme of the day is to think about actual neurons.
09:06:49 And this is a movie I stole from Takasika.
09:06:52 Osima, the whole for me, and the. And this is kind of cool imaging of all the neurons in the brain of a zebrafish, and the colorful
09:07:07 flickering things are individual neurons, active when the fish is making certain decisions.
09:07:13 And of course we see these things in a much less colorful version, which is kind of the common way
09:07:20 we look at and think about the nature of the activity of large populations.
09:07:23 And here there are about 100 neurons.
09:07:27 Every line is the activity of one individual neuron. You put a dot whenever there was an action potential. There's about 2 min of response here.
09:07:36 Neurons in the cortex of the monkey, making visual discrimination decisions.
09:07:41 And what we would like to do is to sleep with all the lights, and what we would like to do is to read this and and we we need this.
09:07:52 But until we get better, since we don't have this, we need to work.
09:08:00 And so this is hard for many different reasons. One, of course, is that neurons are sparse and very selective.
09:08:08 So it's even hard to kind of make yourself happy about sampling the space of things they might care about in a sensible way.
09:08:14 The other is that everything is very noisy or probabilistic.
09:08:18 So you need to think in probabilistic terms of how things work and the one that now bothers us the most is the fact that almost everything interesting is the result of the activity of many neurons doing things together.
09:08:31 And so if you just think of a short time window, let's say I don't know 10 or 20 ms, and you just ask whether each of the 100 neurons here was spiking or not.
09:08:41 You can think of this as a binary word that represents the activity, and that's what we want to try and aim at: not making some assumptions about rates and so forth, but really looking at the detailed structure of the activity. And of course, if you want to do it this way, the pain that comes with
09:08:56 it is that the number of things that you might need to consider is painfully big, and the fact that it's painfully big means that it's not about how good of an experiment anyone can run, because you're never going to sample that space even for 100
09:09:12 neurons, let alone larger populations. And we need to find some principles that hopefully will help us make sense of what's really going on here and how to.
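The binary-word picture described here can be sketched in a few lines. This is only an illustration: the spike times are made up, and the helper names `binarize` and `population_words` are hypothetical, not taken from the talk.

```python
from collections import Counter

def binarize(spike_times, n_bins, bin_width):
    """Return a 0/1 list with a 1 in every bin where the neuron spiked at least once."""
    word = [0] * n_bins
    for t in spike_times:
        b = int(t // bin_width)
        if 0 <= b < n_bins:
            word[b] = 1
    return word

def population_words(spike_trains, n_bins, bin_width):
    """One tuple per time bin; entry i says whether neuron i spiked in that bin."""
    rows = [binarize(st, n_bins, bin_width) for st in spike_trains]
    return [tuple(col) for col in zip(*rows)]

# Toy data (made-up spike times in seconds): three neurons, three 20 ms bins.
spike_trains = [[0.005, 0.031], [0.012, 0.015], [0.055]]
words = population_words(spike_trains, n_bins=3, bin_width=0.020)
counts = Counter(words)  # how often each binary word occurs

# The size of the space of possible words for 100 neurons.
n_possible_words_100 = 2 ** 100
```

With 100 neurons there are 2^100 (more than 10^30) possible words, which is why no experiment can sample the space directly.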
09:09:22 And so I want to talk, and the rest of the day will be, about 3 notions of what you might do with this. One is, well,
09:09:30 can we find rules that really govern what we might think of as the vocabulary of this thing as written?
09:09:38 What are the rules of how things are written in this language?
09:09:40 The other is. Can we actually learn the code in the sense that we'd like to also not only figure out what are the patterns, but something about what they actually mean?
09:09:49 And then this might get us to think about. Maybe some point we can even draw and write.
09:09:58 And then the other notion is, now we try to make sense of how these things are built and how they convey information.
09:10:05 The brain uses this, so maybe we can even figure out what the brain is doing with this, and how it might.
09:10:13 And so the starting point for many of these is notions of correlations, and, as you probably all know, typically almost anywhere you look, in almost any system, the typical correlations between neurons don't seem like anything to write home about at the level of pairs. This is a summary from the
09:10:33 retina, from Michael Berry's lab a million years ago, but this is kind of the typical histogram of correlation coefficients between pairs of neurons.
09:10:42 Is this noise or signal? So I hope. Well, I heard that, wonderful.
09:10:48 So this is signal correlation at this point. This is just asking about the joint activity of the neurons over, yeah.
09:10:54 And if you care about the noise correlations, it kind of has a similar flavor. And a kind of nice way to put these things together is the notion of actually talking about the information that pairs of neurons carry about the outside world. So this is a summary of lots of experiments
09:11:11 in the retina. But you can see this in many other places.
09:11:15 So every dot in this plot is one pair of neurons. The color of the dot tells you which kind of movie the retina was presented with in that particular segment.
09:11:26 And so this is the physical distance between the cells on the retina.
09:11:33 And this is the measure of how redundant the code is. Namely, if the neurons are carrying the exact same information, this would be one. Complete synergy, where there is something being carried by the joint activity that you cannot read from each of the individual neurons, would be minus one, down there somewhere. And 0 means
09:11:50 that they're kind of independent in what they carry. And though you can find neurons that show high values of redundancy, where half of the information that one carries is carried by the other, most of the neurons are very close to 0.
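One way to get a number with exactly these landmarks (1 for identical information, 0 for independent, minus one for pure synergy) is to divide the overlap I1 + I2 - I12 by the joint information I12. That normalization is an assumption made for this sketch; the talk does not say which one was used in the plot. The toy distributions are made up.

```python
from collections import defaultdict
from math import log2

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)

def normalized_redundancy(p_s_r1_r2):
    """(I1 + I2 - I12) / I12 from a dict {(stimulus, r1, r2): probability}."""
    j1, j2, j12 = defaultdict(float), defaultdict(float), defaultdict(float)
    for (s, r1, r2), p in p_s_r1_r2.items():
        j1[(s, r1)] += p
        j2[(s, r2)] += p
        j12[(s, (r1, r2))] += p
    i1, i2, i12 = (mutual_information(j) for j in (j1, j2, j12))
    return (i1 + i2 - i12) / i12

# Both neurons copy a binary stimulus: fully redundant.
copies = {(s, s, s): 0.5 for s in (0, 1)}
# The stimulus is the XOR of the two responses: only the pair is informative.
xor = {(a ^ b, a, b): 0.25 for a in (0, 1) for b in (0, 1)}
# Each neuron reports a different bit of a two-bit stimulus: independent.
separate = {((a, b), a, b): 0.25 for a in (0, 1) for b in (0, 1)}

r_copies = normalized_redundancy(copies)
r_xor = normalized_redundancy(xor)
r_separate = normalized_redundancy(separate)
```

The three toy cases land on the three landmark values described above.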
09:12:06 So again, the typical view is one in which neurons are kind of independent.
09:12:14 And this notion goes very well with one of the old dominant ideas in neuroscience, which goes back to Barlow.
09:12:25 Yes, I hope you are not going to conclude from the previous slide that everywhere.
09:12:33 In what sense? Saying that you concluded that most of the neurons are independent.
09:12:38 I'm saying that the correlations are weak.
09:12:43 This was the retina, and I'll show you this in many other places.
09:12:46 This is the canonical answer in many different systems: if you look at the distribution of pairwise correlations, or pairwise redundancies, or noise correlations, the typical answer is that things are on the order of a few percent, not much, over many, many different things, and I'll show you more
09:13:05 examples. Happiness ensues. Yes, so the signal correlation in general should depend on the actual set of stimuli, or whatever covariates are relevant for the cells.
09:13:18 So that would surprising that so going that would surprise me if that was German.
09:13:26 True, because it seems to me that I can, loosing light, that we just make if the neuron is super correlated, if we are to a similar feature, because then there's no correlation.
09:13:34 So of course, sorry. The redundancy, as I understand, takes into account both the signal and the noise probabilities, and executes that based on essentially those 2 things.
09:13:45 Yes, so that I can imagine that by that, for especially for natural stimuli, that's going to be the case, or is your claim about signal correlation specific to natural stimulus?
09:13:56 So I didn't put up the slides, which I could maybe dig up later.
09:14:02 But the fact that you can find or come up with stimuli that will result in strong correlations for particular groups of neurons,
09:14:09 that's, of course, true. But if you look at the response in natural settings, in many different systems, and you look not at your favorite pair but at the distributions over hundreds of neurons that you record, the typical correlations,
09:14:21 and this is true for signal, for noise, for redundancy, are not 70% overlap.
09:14:27 You are talking about things where typically the average of the distribution will be on the order of a few percent.
09:14:35 Then the joint information doesn't seem to be increasing, at least with that type of analysis.
09:14:43 Right? Because, you know, if you have 2% correlations, you know, once you get a population, then you know, basically you don't have.
09:14:55 Yeah, a lot of independence in the population, which I think there's plause of that argument in terms of information processing.
09:15:06 But it is something. That would be curious how to address. Okay.
09:15:11 So maybe I should. Okay. So this notion that the typical correlations are not very strong
09:15:17 goes very well with Barlow's old idea that said: well, if you have a limited resource in terms of the number of neurons or the bandwidth that you want to try and encode, then to maximize the capacity of a population, you would want neurons to be kind of independent from one
09:15:33 another, to give the rest of the brain more information about the world. And we have a lot of bottlenecks where one population projects to a much smaller one.
09:15:42 So it has to kind of try and compress things in some way.
09:15:45 And this notion of redundancy reduction also went very well with trying to make predictions about single-neuron properties in different places, and nobody objected to this notion of optimality, because, of course, everybody is happy with this notion that their brain is great. Yes?
09:16:07 Said, a large fraction of the neurons.
09:16:14 Is it the case that large number responds to most of the stimuli that we present?
09:16:20 If we do find neurons, that the very small crash norms are expected.
09:16:31 And hence maybe I should. Okay. So Barlow wrote a very nice paper 40 years later, which he called Redundancy Reduction Revisited, where he shot down his own argument. And he said this notion of decorrelation or independence is nice from a theoretical standpoint, but there are 2 things that would make
09:16:54 this a very bad idea for the brain. One is that if you need to overcome noise, you want redundancy between neurons, otherwise you'll be lost. And the other is that learning from examples crucially depends on having redundancies among them. And so this bothered us a while back, and we
09:17:12 thought about this notion of looking at these weak correlations.
09:17:18 And so this is again in the retina. But I'll show you other cases in a second.
09:17:20 So now we zoom in on the activity of a group of cells, and we chop up time into small bins, and we turn this into binary patterns for 10 neurons
09:17:30 here. And even despite the fact that the typical correlations between pairs are very weak in this case, even at the level of 10 neurons, it turns out that ignoring these weak correlations would give you a very bad view of what's going on, because things are really strongly
09:17:49 correlated. So these are now individual activity patterns over these 10 neurons.
09:17:53 Every dot is one pattern. This is how often you see it in the experiment.
09:17:58 This is the prediction of a model that ignores this 3% correlation between neurons
09:18:02 on average, and the black line is equality. So this is where things would be if the model was a good predictor of what's going on.
09:18:09 And you see that this makes orders-of-magnitude mistakes.
09:18:12 And so, despite the fact that these typical correlations at the level of pairs are not that strong, at the level of the group things are very strongly correlated.
09:18:21 If the correlations were even stronger at the level of pairs, there would be an even stronger structure at the level of the population.
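The comparison on this slide, observed pattern frequency against the prediction of a model that ignores correlations, can be sketched as follows. The data are a made-up toy example, and `independent_model` and `empirical_distribution` are hypothetical helper names.

```python
from collections import Counter
from math import prod

def empirical_distribution(words):
    """Observed frequency of each binary word."""
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

def independent_model(words):
    """Return P_ind(word): the product of per-neuron spike or silence probabilities."""
    n_words = len(words)
    rates = [sum(w[i] for w in words) / n_words for i in range(len(words[0]))]

    def prob(word):
        return prod(r if x else 1.0 - r for r, x in zip(rates, word))

    return prob

# Toy data: two neurons that only ever fire together.
words = [(0, 0)] * 3 + [(1, 1)]
p_emp = empirical_distribution(words)
p_ind = independent_model(words)

observed = p_emp[(1, 1)]    # how often both fire together in the data
predicted = p_ind((1, 1))   # product of the two single-neuron rates
```

Here each neuron fires in a quarter of the bins, so the independent model predicts the joint pattern 1/16 of the time while it actually occurs 1/4 of the time. In larger groups such ratios multiply up, which is how small pairwise effects can turn into orders-of-magnitude errors.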
09:18:29 And so this bothered us in terms of how? Where does this come from?
09:18:34 This is stuff from, you know, quite some time ago now. We said:
09:18:38 can we actually build the picture from all these seemingly weak, pairwise things, and ask, what is the collective effect of all these pairwise
09:18:45 correlations, and then ask about maybe higher-order stuff that might be generating these strong correlations?
09:18:50 And to do that we went back to an old idea from statistical physics and information theory, which is to say: what we're trying to describe is a probability distribution over the individual patterns,
09:19:00 so how often individual words in this language are being used. We know something about the firing rates of the neurons, and we know something about the correlations between them.
09:19:11 Let's build the minimal model, which is consistent with these properties.
09:19:15 And to do this in the most parsimonious way, because entropy measures how much you don't know about the distribution, or how flat it is, what you can do, and this goes back to Jaynes, is to say: let's find a probability distribution that has the correct firing rates and the correct correlations, but
09:19:33 is as flat as possible otherwise. And so this amounts to solving this somewhat scary-looking optimization problem, where you want to maximize the entropy, you want to keep the correct firing rates for each of the neurons, which we designate x_i, so there will be Lagrange multipliers for each
09:19:48 of these, and for each of the pairs, and this is just to make things normalized.
09:19:53 And when you try to solve this, then it's kind of nifty that this actually has a unique answer.
09:19:58 This is a convex problem. So you know, there's one thing that you can get to.
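Written out, the optimization problem described here takes roughly the following form. This is a sketch assuming binary activity x_i in {0, 1}; the multiplier names alpha_i and beta_ij follow the talk, while the normalizing constant Z and the "data" subscript are notation introduced here.

```latex
\begin{align}
\max_{p}\;& H[p] = -\sum_{\mathbf{x}} p(\mathbf{x}) \log p(\mathbf{x}) \\
\text{subject to}\;& \sum_{\mathbf{x}} p(\mathbf{x})\, x_i = \langle x_i \rangle_{\text{data}}, \qquad
\sum_{\mathbf{x}} p(\mathbf{x})\, x_i x_j = \langle x_i x_j \rangle_{\text{data}}, \qquad
\sum_{\mathbf{x}} p(\mathbf{x}) = 1 .
\end{align}

% Solving with Lagrange multipliers gives the pairwise maximum entropy form:
\begin{equation}
p(\mathbf{x}) = \frac{1}{Z} \exp\Big( \sum_i \alpha_i x_i + \sum_{i<j} \beta_{ij} x_i x_j \Big),
\qquad
Z = \sum_{\mathbf{x}} \exp\Big( \sum_i \alpha_i x_i + \sum_{i<j} \beta_{ij} x_i x_j \Big).
\end{equation}
```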
09:20:03 You cannot do this analytically, you need to do this numerically, but if you find the right set of parameters, these alpha_i, beta_ij,
09:20:10 which are the Lagrange multipliers here,
09:20:14 this is the minimal model that is consistent with the firing rates and the correlations of the neurons and doesn't make any other assumptions about structure.
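For a handful of neurons the numerical fit can be sketched by enumerating every pattern and running plain gradient ascent on the likelihood, nudging each parameter by the difference between the measured average and the model average. This is a minimal illustration on a made-up 3-neuron distribution, not the fitting procedure used for the data in the talk; all function names, the step size and the step count are choices made for this sketch.

```python
from itertools import combinations, product
from math import exp

def features(x):
    """Single-neuron terms x_i followed by pairwise terms x_i * x_j (i < j)."""
    return list(x) + [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]

def model_probs(params, states):
    """p(x) proportional to exp(sum_k params[k] * f_k(x)), normalized by enumeration."""
    weights = [exp(sum(p * f for p, f in zip(params, features(x)))) for x in states]
    z = sum(weights)
    return [w / z for w in weights]

def expectations(probs, states):
    """Average of every feature under the given distribution over states."""
    n_feat = len(features(states[0]))
    return [sum(p * features(x)[k] for p, x in zip(probs, states)) for k in range(n_feat)]

def fit_pairwise_maxent(target, states, steps=5000, lr=0.5):
    """Gradient ascent: the gradient of the log-likelihood is (data average - model average)."""
    params = [0.0] * len(target)
    for _ in range(steps):
        model = expectations(model_probs(params, states), states)
        params = [p + lr * (t - m) for p, t, m in zip(params, target, model)]
    return params

# Toy "data" distribution over the 8 patterns of 3 neurons (made-up numbers).
states = list(product((0, 1), repeat=3))
data_probs = [0.40, 0.10, 0.10, 0.05, 0.10, 0.05, 0.05, 0.15]
target = expectations(data_probs, states)  # firing rates and pairwise averages

params = fit_pairwise_maxent(target, states)  # alpha_i first, then beta_ij
fitted = expectations(model_probs(params, states), states)
max_error = max(abs(t - f) for t, f in zip(target, fitted))
```

Exact enumeration only works for small groups, since the number of states doubles with every added neuron; larger populations need sampling-based estimates of the model averages.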
09:20:20 And even for the case of these 10 neurons, this works surprisingly well in terms of predicting what's going on.
09:20:28 So this is the same 10 neurons from before. The blue dots are the model that ignores these weak correlations between the pairs, and this is what happens at the level of the population,
09:20:40 sorry, for this pairwise model, where every dot here, now again, is one of these patterns. And if you're worried about the scatter here, well, there's some noise in how well you estimate the firing rate here, the activity here, so this is as good as it can probably get. Yes? How does
09:20:55 this depend on the size of the network, like 10 neurons versus 100 versus 1,000? So good that you asked.
09:21:04 So we've seen this in many, many different systems, and in particular, here's one for the discussion about the other places than the retina.
09:21:15 So this is recordings from the prefrontal cortex of monkeys in Roozbeh Kiani's lab at NYU, who were doing classification of moving-dot stimuli with a delay and then need to saccade to the right direction.
09:21:28 So they fixate, and the things show up and the dots move, and then there's a delay, and they need to saccade to the right place, and we can look at the nature of the activity of tens of neurons recorded with Utah arrays in these monkeys, and again we see that the
09:21:42 typical correlation between neurons here at the level of pairs is very weak.
09:21:47 And then we try to build the map of the population code of tens of neurons.
09:21:53 In this data, rather than looking at firing rates and PCA projections, we actually look again at the nature of these fine, detailed activity patterns at the level of tens and later of more than a hundred neurons.
09:22:07 And we can even look at the nature of the sequence of these words that we see.
09:22:11 And again, what we find is that when we build models that actually take into account just pairwise correlations between the neurons, but connect all of them with this maximum entropy formalism, we end up with a model that has a term for every neuron, one for the pairs of neurons at the
09:22:31 same time, and pairs of neurons at different times, and this shows, you know, a huge improvement over what you would get from ignoring these weak correlations.
09:22:42 So this is likelihood ratios of groups of 50 neurons over many.
09:22:44 So every dot here is one group of 15 neurons. This is the scatter of many such groups in different parts of the task.
09:22:51 So for each of these different segments there's a super strong effect of these seemingly weak pairwise correlations between the neurons that add up to give orders of magnitude difference in terms of the nature of the correlation of the population. So they're extremely far from being independent; it's
09:23:08 just that the typical correlations of the pairs are weak, but things add up to have a massive effect on the whole story.
09:23:14 And so we thought, okay, so this is for 10 or a few tens of neurons.
09:23:21 Let's go to 100 neurons. You start seeing things getting a little bit out of tune.
09:23:25 Now again the retina. I'll flip between different systems along the way, but this is now a long experiment recording from the retina responding to a natural movie.
09:23:36 Every dot here now is a binary pattern over a hundred neurons.
09:23:40 Again, this is how often you see it in the experiment. This is the prediction of a model that ignores the correlations.
09:23:47 These are the gray dots here, which suck, and this is the pairwise model, which is doing a much better job.
09:23:51 But we were very worried about the fact that these particular points, which are the all-zeros pattern and the patterns with one neuron spiking, are actually off the line here.
09:24:01 And if you say, Well, this is a small difference. Well, this is the most common thing in the experiment, which happens about 7% of the time.
09:24:10 So this mismatch is real, and means that there's something that the pairwise structure does not capture.
09:24:14 Yes. Do you have a sense for what happens when you just redo the same thing but in, like, 2 different splits?
09:24:22 So you just measure P for 2 splits of the data, like, how variable is it?
09:24:28 If it wasn't then I wouldn't. So this is the.
09:24:38 And so things become even worse when you start thinking about temporal patterns of groups. So this is 10 neurons over 10 time steps, so this is the overall
09:24:47 200 ms of activity, and now each of these dots is a spatiotemporal pattern over the population.
09:24:57 And so now the pairwise models are starting to get out of tune, and something is missing.
09:25:03 The correlations have to be somewhat stronger, because there's some other things that we're missing in terms of the structure.
09:25:08 Okay. The time bin, both in this and the previous slide: just wondering how you're getting enough counts with the measurements.
09:25:17 Sorry. How wide were the time bins? So this is 20 ms bins, and in the previous, I think this was 10, but it's very similar.
09:25:27 So the fact that one gets enough counts at all of duplicate, exact duplicate firing of 100 neurons,
09:25:35 yes, it seems fairly incredible. This was an extremely long experiment.
09:25:40 Yes. So then does the experiment take into account the gradually dying preparation?
09:25:45 Yeah, so we, I'll get there. But also because we were worried about the fact that you can do things for too long. But in any case, it seems like there's something missing from the pairwise point of view. We thought, OK, so there's higher-order stuff
09:26:05 going on, because these pairwise models are the minimal thing you can do, or the minimal way to put all the pairs together.
09:26:14 So maybe we just need higher-order terms. Now, of course, this would amount to things that look like this.
09:26:19 There would be terms for individual neurons, pairs, triplets, quadruplets, and so on.
09:26:27 The problem, of course, is that the number of these things grows painfully fast.
09:26:32 This becomes practically impossible in this system. So this cannot really work, and the question is, what can we do?
09:26:43 And how can one even make sense of what's going on?
09:26:46 And so, looking at the data, we noticed that if you reorder the patterns in a way that's kind of common in linguistics, you get something that's very similar to Zipf's law in terms of the usage of particular patterns. So again, every one of these dots is a word in
09:27:02 this set of 100 neurons' activity. This is ranking them according to their frequency, and this is how often you saw them in the experiment. And so, well, this is the all-zeros pattern, then the patterns with one neuron spiking out of 100. These things decay very fast, and it becomes very clear that while this is going to be
09:27:27 hard to work with, maybe there's something you can do.
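The reordering described here, ranking patterns by how often they occur, is a short computation. The sketch below uses a made-up list of words, and `rank_frequency` is a hypothetical helper name.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, frequency) pairs with the most common pattern at rank 1."""
    counts = Counter(words)
    ranked = sorted(counts.values(), reverse=True)
    return [(rank, c / len(words)) for rank, c in enumerate(ranked, start=1)]

# Toy data: the silent pattern dominates, single-spike patterns follow.
words = [(0, 0)] * 6 + [(1, 0)] * 3 + [(0, 1)] * 1
curve = rank_frequency(words)
```

Plotting frequency against rank on log-log axes is the usual way to check for a Zipf-like decay.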
09:27:30 So we thought: can we come up with a model that will rely on pairs and triplets and quadruplets, but without doing all of them?
09:27:38 Can we come up with some way to deal with this? And we noticed that if you think about this model, and about a 0/1 representation of the activity of the x's here, then if you think about the probability of all the neurons being silent, which is the most common pattern that we
09:27:53 saw in these experiments, then all of these things become 0, and this implicitly gives you this normalization term over here.
09:28:00 If you think about the activity of one neuron active and everyone else being silent, then you end up with this.
09:28:06 And so if we actually had a good estimate for these patterns, we could basically solve this simple equation and get these parameters just from the frequency of these patterns.
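The "simple equation" referred to here can be written out for a 0/1 representation. This is a sketch: alpha_i and beta_ij follow the names used earlier in the talk, while gamma_ijk for the triplet terms and the "only i active" shorthand are notation introduced here.

```latex
\begin{align}
p(\mathbf{x}) &= \frac{1}{Z}\exp\Big(\sum_i \alpha_i x_i + \sum_{i<j}\beta_{ij}\, x_i x_j
  + \sum_{i<j<k}\gamma_{ijk}\, x_i x_j x_k + \dots\Big), \qquad x_i \in \{0,1\} \\
\log p(\mathbf{0}) &= -\log Z \\
\log p(\text{only } i \text{ active}) &= \alpha_i - \log Z
  \;\;\Rightarrow\;\; \alpha_i = \log\frac{p(\text{only } i)}{p(\mathbf{0})} \\
\log p(\text{only } i, j \text{ active}) &= \alpha_i + \alpha_j + \beta_{ij} - \log Z
  \;\;\Rightarrow\;\; \beta_{ij} = \log\frac{p(\text{only } i, j)\; p(\mathbf{0})}{p(\text{only } i)\; p(\text{only } j)}
\end{align}
```

Every term containing a silent neuron vanishes, so each parameter can be read off from the frequencies of a few patterns, provided those patterns occur often enough to be estimated reliably.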
09:28:19 So if we have a reliable estimate, which is exactly what Jonathan
09:28:24 was worried about. We cannot do this for everything, but there are some patterns that are so frequent that we can do this, which are the ones that appear hundreds or a few thousands of times.
09:28:32 And so when we do that, just for the very common patterns that we call the reliable patterns, we end up with on the order of N terms; we actually end up with not all the pairs:
09:28:42 many of the pairs, some triplets, a handful of quadruplets.
09:28:45 And this thing works like magic. So this is now the data from before. Every dot is one activity pattern.
09:28:52 I'm plotting this a little bit differently to reflect the performance
09:28:55 better. So the x-axis is the frequency of patterns in the experiment.
09:28:59 These are error bars over patterns that had that frequency, and I'm plotting the log-
09:29:05 likelihood ratio between the model and the data for 3 different models.
09:29:09 So, if the model is doing a good job, everything should be on 0 here. The blue funnel, in all of these cases, is the 95% confidence interval because of sampling.
09:29:21 So basically everything that falls within this blue funnel is kosher.
09:29:24 This is the independent model that we know sucks, but just for reference.
09:29:27 This shows you the pairwise model is already missing very common patterns, and this is what we get from this so-called reliable interaction model.
09:29:35 And if we do that for the temporal patterns, this is doing remarkably well, and because of the issue of stability and the amounts of data, this is all cross-validated on data that we did not train on. So this is another part of the movie, of the same flavor
09:29:50 but not the exact same thing, and so this means that the model is predicting things as well as doing the experiment
09:29:57 again, at the level of individual patterns. 400 miles.
09:30:07 Yes, thank you. This is us. True. Question. But have you tried this on natural language just to see what happens?
09:30:14 Does what your model produces here look like sort of readable, sensible words or sentences?
09:30:21 Or is it still at the level where maybe you recognize some correlations between letters?
09:30:27 Yeah, often, but not really. So we don't do sentences as well.
09:30:31 We do words really well for the coffee. And so we thought, okay, we went from 10 to 50 to 100.
09:30:41 What can we do with more than 100? And what we found is that at every level we looked at, the code looked somewhat different,
09:30:48 kind of organizing the structure of the code, and in particular, it's not very clear how to expand these models to go beyond 100 neurons.
09:30:57 Because if you just think about all the pairwise things, if you think about a thousand neurons, this is half a million correlations you need to worry about,
09:31:08 and sampling issues, and so forth. This notion of looking at the reliable patterns goes out the window for that as well, because, as Jonathan quickly noted, when you go to really large populations, nothing will ever repeat. This sounds like a super Zen riddle: you don't cross the same
09:31:25 river twice; your brain never sees 1,000 neurons doing the exact same thing
09:31:30 at the level of these detailed patterns. So what can we do?
09:31:35 And so we thought about this for a while, and we figured well, maybe we should copy from someone who has to deal with this all the time, which is, namely, the brain.
09:31:43 So here's the soup I'm going to mix from the different elements.
09:31:48 One is what we've learned from the models. So we know that there seems to be some sparse set of dependencies,
09:31:54 pairwise, maybe a little bit of triplets, that actually seem to do a very good job of governing
09:32:00 what large populations actually do. There's some hierarchical organizing of these things that I didn't get into. But there's another hint, from biology.
09:32:07 If we think about what the brain does in dealing with things, we know that connectivity is not really random, but that's kind of a very good starting approximation of how things are related to one another, and things are really sparse.
09:32:23 There's a lot of linear and nonlinear cascades. And so that's the hints from biology. And from machine learning and statistics,
09:32:32 we know that if you have something that lives in a high-dimensional space, like the activity of large groups of neurons, and there is structure, meaning the activity is of a much lower dimension than the actual size of the space you live in, then taking random projections of these things
09:32:49 does a really good job of trying to understand what's going on, kind of in the spirit of CT reconstruction of objects.
09:32:56 So that literature is kind of the birth of the giant field of compressed sensing, and so forth.
09:33:03 So we want to mix all these notions together, and we're going to use maximum entropy as a way to do this, because it's a useful framework to ask about minimum models that are consistent with certain things.
09:33:12 Let me see how we're doing on that. So here's the idea we're going to use random nonlinear functions.
09:33:20 So I'm going to take my neurons that I want to try and describe I'm going to pick a small set of them.
09:33:25 So it's going to be a sparse set and I'm going to give each of them a random number which I'll designate with this 50 shades of green here and then for this random group of neurons.
09:33:37 I'm going to multiply the activity or the silence of each neuron with one of these weights, that's going to be the shade of green here, and sum, with some threshold, and feed this to a nonlinear function. We tried many different things,
09:33:54 but let's focus now on simple perceptrons, and I'm going to use that as this h_mu.
09:33:57 And I'll have several of these in a minute. These are going to be this random set of functions that I'm going to compute over the population, so I'm gonna do a lot of these.
09:34:07 And I want to try and build a model that is consistent with what I've measured in terms of the average value of each of these random projections.
09:34:16 Okay, so this amounts to again solving a maximum entropy problem, finding the distribution with the maximal entropy,
09:34:25 And I need to satisfy the observed value of each of these projections.
09:34:29 So, instead of going with first order, second order, third order with my fingers, as I ran out of fingers, I'm going to use random things as my statistics to try and build a model. So if you want to think of this in some graphical notion, that's kind of the best way we came up
09:34:46 with. So if you think about the population, it's as if you kind of picked random cliques and said, I'm going to compute some local thing about this group of
09:34:55 things, and try to be consistent with all these random small things that I ask about what's going on.
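The construction can be sketched for a tiny population: sparse random perceptrons h_mu over the neurons, and a maximum entropy model with one parameter per projection, fit so that the model average of each h_mu matches the measured average. Everything specific below is an assumption made for illustration: Gaussian weights, the threshold rule, the name lambda for the parameters, the toy "data" distribution, and fitting by exact enumeration and plain gradient ascent.

```python
import random
from itertools import product
from math import exp

def make_projections(n_neurons, n_proj, in_degree, rng):
    """Each projection: a sparse random subset of neurons, random weights, a threshold."""
    projections = []
    for _ in range(n_proj):
        idx = rng.sample(range(n_neurons), in_degree)
        weights = [rng.gauss(0.0, 1.0) for _ in idx]
        threshold = 0.5 * sum(w for w in weights if w > 0)  # arbitrary illustrative rule
        projections.append((idx, weights, threshold))
    return projections

def h(projection, x):
    """Perceptron output: 1 if the weighted sum of the chosen neurons exceeds the threshold."""
    idx, weights, threshold = projection
    return 1 if sum(w * x[i] for i, w in zip(idx, weights)) > threshold else 0

def model_averages(lambdas, feats):
    """Average of each projection under p(x) proportional to exp(sum_mu lambda_mu * h_mu(x))."""
    weights = [exp(sum(l * f for l, f in zip(lambdas, fx))) for fx in feats]
    z = sum(weights)
    return [sum(w / z * fx[k] for w, fx in zip(weights, feats)) for k in range(len(lambdas))]

def fit_lambdas(target, feats, steps=5000, lr=0.5):
    """Gradient ascent on the log-likelihood: step by (measured average - model average)."""
    lambdas = [0.0] * len(target)
    for _ in range(steps):
        model = model_averages(lambdas, feats)
        lambdas = [l + lr * (t - m) for l, t, m in zip(lambdas, target, model)]
    return lambdas

rng = random.Random(0)
n_neurons = 4
states = list(product((0, 1), repeat=n_neurons))
projections = make_projections(n_neurons, n_proj=5, in_degree=3, rng=rng)
feats = [[h(p, x) for p in projections] for x in states]  # h_mu(x) for every pattern

# Made-up "data" distribution with every pattern present.
raw = [rng.uniform(0.5, 1.5) for _ in states]
data_probs = [r / sum(raw) for r in raw]
target = [sum(q * fx[k] for q, fx in zip(data_probs, feats)) for k in range(len(projections))]

lambdas = fit_lambdas(target, feats)
fitted = model_averages(lambdas, feats)
max_error = max(abs(t - f) for t, f in zip(target, fitted))
```

Adding more projections only means appending to the list and refitting, which is the property the talk emphasizes; for realistic population sizes the enumeration over states would have to be replaced by sampling.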
09:35:01 Ok.
09:35:04 So we tried this. I'm going to show you this.
09:35:07 We tried this on many different things, but I'll show you 2 different sets: recordings from Anthony Movshon's lab at NYU, from drifting gratings, recording from V1 and
09:35:19 V2, and the PFC neurons from before in 2 different tasks in Roozbeh Kiani's lab.
09:35:25 And this thing works really well. So here's now the comparisons I'm showing things for now for 70 neurons, because here I have enough repeats that I can show error bars to make just not unhappy.
09:35:39 So now this is V1, V2. This is the prefrontal cortex. This is the independent model.
09:35:44 Again every dot is a pattern in this data set, over cross-validated things that we don't train on.
09:35:50 How often you see it in the experiment; the agreement of the model and the data in log-likelihood terms.
09:35:55 So again, everything in the funnel is kosher. This is the independent model that sucks. This is the pairwise model that's not doing a very good job for V1,
09:36:03 V2, and a little bit better for the PFC. And with these random projection models we're doing as well as doing the experiment.
09:36:14 This the front of practice, in a reram and resistive spatial really?
09:36:21 Rapidly. I do this. Yeah, the big, exactly. No, the picture there was to try and give you a fitting.
09:36:26 But I should have had certain general warning of the so so this is another way of asking how well it does.
09:36:36 I can look at the, I'll just skip that and show you that if you try to predict pairwise correlations, and triplet and quadruplet correlations, this does a really good job of describing what's going on.
09:36:46 What's really nice about this, which is why I'm telling you about this at all, is that, of course, if I had a gazillion of these projections, fine, I can do everything with a lot of statistics. It turns out that you don't need a lot of them at all. So this is
09:36:59 the performance of the model as a function of the number of projections that we take.
09:37:03 And this is now for 170 neurons from the prefrontal cortex.
09:37:06 Here the thing we use to try and measure performance is going to be the likelihood of the model.
09:37:14 So this is what you get from the independent model. This is what you get from the pairwise model that has
09:37:19 170 choose 2 pairs of things to worry about.
09:37:24 So this took a while to compute. This is a variant called K-pairwise
09:37:27 that Gašper came up with a while back, that has some average high-order terms added, which I'm not going to get into.
09:37:34 This is what we get from these random projection models as a function of the number of projections. For reference,
09:37:40 this is the number of pairwise terms. Here we do far better with far fewer projections, and the error bars from taking different sets of projections are actually smaller than the size of these markers
09:37:51 here. So this is how accurate this is. We can get super accurate models for 170 neurons, in this case at the level of individual patterns.
09:38:01 So this was to remind me that this is the variance over different choices.
09:38:05 What's nice about this is that if the model is not doing as well as you'd like, you can always go and add more random projections. With pairs and triplets you don't know which ones to use, but here, since I picked them like this, I can just keep on going.
09:38:21 And so this is performance as a function of the number of projections that we use.
09:38:26 You see that, basically, you can keep on going and improve things; we just ran out of time.
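As a rough illustration of what fitting such a model involves, here is a Python sketch for a population small enough that all 2^n patterns can be enumerated. Everything specific in it is an assumption made for the example: the synthetic data, the threshold `theta`, the in-degree of 4, the Gaussian weights, and plain gradient ascent on the likelihood. The real analysis of 170 neurons cannot enumerate states and would need sampling or another approximation.

```python
import itertools

import numpy as np


def make_projections(n_neurons, n_proj, in_degree, rng):
    """Fixed random sparse layer: each row reads `in_degree` random neurons."""
    A = np.zeros((n_proj, n_neurons))
    for i in range(n_proj):
        idx = rng.choice(n_neurons, size=in_degree, replace=False)
        A[i, idx] = rng.normal(size=in_degree)
    return A


def features(x, A, theta):
    """Binary projection outputs h_i(x) = [sum_j A_ij x_j > theta]."""
    return (x @ A.T > theta).astype(float)


def log_partition(lam, h_states):
    """log Z computed exactly over all enumerated patterns."""
    score = h_states @ lam
    m = score.max()
    return m + np.log(np.exp(score - m).sum())


def fit(data, A, theta, n_steps=300, lr=0.1):
    """Gradient ascent on the log-likelihood; only the weights lam are learned."""
    n = data.shape[1]
    states = np.array(list(itertools.product([0.0, 1.0], repeat=n)))
    h_states = features(states, A, theta)
    h_target = features(data, A, theta).mean(axis=0)  # empirical projection means
    lam = np.zeros(A.shape[0])
    for _ in range(n_steps):
        p = np.exp(h_states @ lam - log_partition(lam, h_states))
        lam += lr * (h_target - p @ h_states)  # match model means to data means
    return lam, h_states


def mean_log_likelihood(data, lam, A, theta, h_states):
    return float((features(data, A, theta) @ lam).mean() - log_partition(lam, h_states))


rng = np.random.default_rng(0)
n_neurons, theta = 8, 1.0  # illustrative values, not from the talk
data = (rng.random((500, n_neurons)) < 0.2).astype(float)  # synthetic sparse patterns
results = {}
for n_proj in (5, 30):
    A = make_projections(n_neurons, n_proj, in_degree=4, rng=rng)
    lam, h_states = fit(data, A, theta)
    results[n_proj] = mean_log_likelihood(data, lam, A, theta, h_states)
print(results)
```

Adding projections only means adding rows to `A` and entries to `lam`; the fitting procedure itself does not change, which is the sense in which you can keep on going.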
09:38:35 There's one really nice feature of all of this beyond just saying, OK, we have a good model to describe the joint activity of large populations of neurons, which many of you must have seen already, and that's the fact that this model has an immediate implementation in terms of
09:38:49 a neural architecture that can actually do all of this.
09:38:52 So beyond just saying I have a statistical model to describe the activity,
09:38:57 here's a simple circuit that can actually implement this very model.
09:39:01 So if I think about this notion of these random projections that I need to build for this population activity, and this is what I want to do, then I can think of describing the activity over the population in the following way.
09:39:17 These are the neurons I want to describe, x1 to xn,
09:39:19 and this is the binary activity pattern. Each of the projections I can think of as, you know, a nonlinear perceptron computing things with random weights.
09:39:29 It gets its input from this particular pattern that you want to compute on.
09:39:32 And then what I need to do is to learn this set of weights here.
09:39:36 These are the factors that I learn for the maximum entropy model.
09:39:40 This amounts to learning the values of these synaptic connections
09:39:42 here. So I have a model where this layer is randomly picked and fixed, and I don't touch it anymore.
09:39:49 All I need to do is learn one set of weights in one layer, so there's no backpropagation of any flavor here.
09:39:58 This is just one step back, and then I get something which basically computes the likelihood.
09:40:02 So it's a circuit that computes the likelihood of its own input, up to a normalization factor.
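The circuit described here can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the threshold `theta`, the Gaussian weights, the in-degree, and all function names are hypothetical choices for illustration, and the readout weights `lam` are random placeholders rather than learned values.

```python
import numpy as np


def make_sparse_projections(n_neurons, n_proj, in_degree, rng):
    """Fixed random first layer: each unit reads `in_degree` randomly chosen neurons."""
    A = np.zeros((n_proj, n_neurons))
    for i in range(n_proj):
        idx = rng.choice(n_neurons, size=in_degree, replace=False)
        A[i, idx] = rng.normal(size=in_degree)
    return A


def hidden_layer(x, A, theta):
    """Nonlinear perceptron units: output 1 when the weighted input exceeds theta."""
    return (x @ A.T > theta).astype(float)


def unnormalized_log_likelihood(x, A, theta, lam):
    """Readout: sum of learned weights over active units, i.e. log p(x) + log Z."""
    return hidden_layer(x, A, theta) @ lam


rng = np.random.default_rng(0)
n_neurons, n_proj, in_degree, theta = 20, 50, 5, 1.0  # illustrative sizes
A = make_sparse_projections(n_neurons, n_proj, in_degree, rng)  # random, then frozen
lam = rng.normal(size=n_proj)  # placeholder for the single learned layer

x = rng.integers(0, 2, size=(5, n_neurons)).astype(float)  # five binary patterns
h = hidden_layer(x, A, theta)
scores = unnormalized_log_likelihood(x, A, theta, lam)
print(scores)
```

Only `lam` would ever be updated during learning; `A` and `theta` stay fixed, which is why no gradient has to be propagated through the first layer.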
09:40:09 You can, of course, try and think of how this might be implemented not just as a simple set of neurons but as a dendritic architecture of some sort.
09:40:18 But what I think is particularly nice is that you can ask: why would the circuit want to compute the likelihood of its own input? And the answer that comes from the Bayesian religion, of course, is that this is the way you want to try and do stuff in the
09:40:34 world. So if you had two models, one computing the likelihood of the input being a dog and the other of it being a cat, all you need to do is to learn the likelihood of these two things,
09:40:45 and then you can do stuff with it. What's particularly nice about this architecture is that, of course, because these are random functions, you can actually even try and do this with the same set of projections.
09:40:58 So you don't even need separate architectures to do many of these things.
09:41:02 And so these are circuits that seem to be naturally computing the surprise, as it's sometimes called, or the likelihood of their own inputs.
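A small Python sketch of the shared-projection idea: two readouts (labeled cat and dog here, following the example in the talk) sit on top of one fixed random layer, and their log-likelihood ratio needs the hidden layer to be evaluated only once. The weight vectors are random placeholders rather than fitted values, and the population is kept tiny so the two normalization constants can be computed exactly by enumeration; both are assumptions of this sketch.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_proj, in_degree, theta = 8, 20, 4, 1.0  # illustrative sizes

# One fixed random sparse layer shared by both models.
A = np.zeros((n_proj, n_neurons))
for i in range(n_proj):
    idx = rng.choice(n_neurons, size=in_degree, replace=False)
    A[i, idx] = rng.normal(size=in_degree)


def hidden(x):
    return (x @ A.T > theta).astype(float)


# Placeholder readout weights; in practice each would be learned from its own data.
lam_cat = rng.normal(scale=0.5, size=n_proj)
lam_dog = rng.normal(scale=0.5, size=n_proj)

states = np.array(list(itertools.product([0.0, 1.0], repeat=n_neurons)))
h_states = hidden(states)


def log_partition(lam):
    score = h_states @ lam
    m = score.max()
    return m + np.log(np.exp(score - m).sum())


def log_likelihood_ratio(x):
    h = hidden(x)  # shared layer, evaluated once for both models
    return h @ (lam_cat - lam_dog) - (log_partition(lam_cat) - log_partition(lam_dog))


x = states[37]
ratio = log_likelihood_ratio(x)
print(ratio)
```

The difference of the two log partition functions is a constant offset, so once it is known the comparison reduces to one extra weighted sum over the same hidden units.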
09:41:12 And of course, as I said, this is good from any Bayesian point of view that you want. How much time do I have? Okay, on your self-imposed time, given that you are the sovereign, you can... Sovereign, yeah.
09:41:26 So take 10 hours, but we might not stay, at the expense of risking a revolution.
09:41:34 So don't tempt me. Okay, so I'll finish in 10 minutes.
09:41:40 So if you want to think about this biologically, I want to try and put down some kind of duality here between thinking about a model that describes the data really well and a potential neural circuit that might actually do stuff with it.
09:41:59 Bless you! Then I can now try and ask, you know, which kind of projections can I use?
09:42:06 We tried many different variants and things. They all worked pretty well for September.
09:42:10 They worked surprisingly well, for reasons that maybe we can talk about later. But what's really nice about them?
09:42:17 We tried things with different numbers of neurons going into each of these random projections.
09:42:23 It turns out that sparseness has a sweet spot. So this is now plotting, for different population sizes of the models,
09:42:31 how well we do as a function of the number of neurons that participate in each of these randomly chosen and then fixed projections.
09:42:42 So this goes from 0 to 15 members in each of these projections. For the V1/V2
09:42:46 data, we end up with something between 4 and 7, maybe a little bit less for the PFC.
09:42:53 But this is a very similar number, and for those of you who know Ashok Litwin-Kumar's work on trying to compute, and also look at data, in terms of optimal sets of random functions in thinking about connectivity, both in the cerebellum and in the olfactory
09:43:11 system of flies, they come up with numbers that are between 3 and 7 as well.
09:43:16 So it's kind of nice. They all show up in kind of the same language.
09:43:20 One more thing that is particularly nice: you can ask yourself, how much data do I need to train this random projections thing?
09:43:33 And the answer is not a lot. So this is how well we do as a function of the number of samples that we use.
09:43:40 This is the number of samples in log units. These are likelihood values.
09:43:44 This is the independent model. The green and blue are two variants,
09:43:49 again, of the pairwise family of things. So you do really well, but you need thousands and thousands of samples to do well. This is for 170 neurons.
09:43:58 This is how well we do with these random projections, and this is for a version that had 1,000 projections and one with 10,000 projections. Basically, a few hundred samples,
09:44:08 that's enough to get you much better than anything you would do with an independent model, and getting close to pairwise models with orders of magnitude fewer data samples.
09:44:18 So you can actually try and make sense of things that happen for particular stimuli, or even repeats, with this kind of way of doing things.
09:44:28 We also played with this: if we want to take the biological circuit metaphor seriously, can we come up with a learning rule that would actually help real networks learn to do this?
09:44:41 So again, we're not backpropagating anything, but we still need to figure out a way to learn these weights.
09:44:48 And so we stole an idea from Mike DeWeese's lab, something called Minimum Probability Flow, in which we imagine that if you wanted to compute the likelihood of a particular pattern, you ask: what if there was a similar pattern,
09:45:12 what we kind of think of as an echo, which is a small variant of this pattern that just follows it?
09:45:18 And then we can use this to build a biologically plausible way to learn these weights really accurately.
09:45:23 And this is the performance of the model that is based on just learning
09:45:27 with this biological learning rule, which converges to what we get from the model that we just fit with the computation over all things.
09:45:33 And these are just samples showing how well they do. So we actually converge to the correct weights
09:45:38 with this biological learning rule. Maybe a couple more things before I stop this segment.
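For reference, here is a Python sketch of the textbook Minimum Probability Flow objective applied to a random projection model, using all single-bit flips of each data pattern as the "similar patterns". This is the standard batch form of the objective, not the noise-based, biologically plausible rule mentioned in the talk, whose details are not given here. The energy convention E(x) = -lam . h(x), the backtracking step size, and the synthetic data are assumptions of the sketch.

```python
import numpy as np


def hidden(x, A, theta):
    return (x @ A.T > theta).astype(float)


def flip_differences(data, A, theta):
    """dh[s, j] = h(x_s with bit j flipped) - h(x_s)."""
    n_samples, n_neurons = data.shape
    h = hidden(data, A, theta)
    dh = np.empty((n_samples, n_neurons, A.shape[0]))
    for j in range(n_neurons):
        flipped = data.copy()
        flipped[:, j] = 1.0 - flipped[:, j]
        dh[:, j, :] = hidden(flipped, A, theta) - h
    return dh


def mpf_objective(lam, dh):
    """K = mean_s sum_j exp((E(x_s) - E(x_s^j)) / 2) with E(x) = -lam . h(x)."""
    w = np.exp(0.5 * (dh @ lam))  # E(x) - E(x') equals lam . (h(x') - h(x))
    K = w.sum(axis=1).mean()
    grad = 0.5 * np.einsum("sj,sjm->m", w, dh) / dh.shape[0]
    return K, grad


def fit_mpf(dh, n_steps=200, step0=0.1):
    """Gradient descent on K; a step is only accepted if it lowers K."""
    lam = np.zeros(dh.shape[2])
    for _ in range(n_steps):
        K, grad = mpf_objective(lam, dh)
        step = step0
        while step > 1e-8:
            K_new, _ = mpf_objective(lam - step * grad, dh)
            if K_new < K:
                lam = lam - step * grad
                break
            step *= 0.5
    return lam


rng = np.random.default_rng(2)
n_neurons, n_proj, in_degree, theta = 10, 30, 4, 1.0  # illustrative sizes
A = np.zeros((n_proj, n_neurons))
for i in range(n_proj):
    idx = rng.choice(n_neurons, size=in_degree, replace=False)
    A[i, idx] = rng.normal(size=in_degree)

data = (rng.random((400, n_neurons)) < 0.2).astype(float)  # synthetic sparse patterns
dh = flip_differences(data, A, theta)
K_before, _ = mpf_objective(np.zeros(n_proj), dh)
lam = fit_mpf(dh)
K_after, _ = mpf_objective(lam, dh)
print(K_before, K_after)
```

The appeal of this objective is that it never touches the normalization constant: each data pattern is only compared with its own small variants.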
09:45:49 So we kind of wondered about this notion, again, if this is a real biological circuit and we think about the way we connected things: since we picked things kind of randomly, there's probably a better way to do this.
09:46:02 So I'll maybe say something at the end about other schemes of trying to pick
09:46:08 not-so-random, or improved, random projections. But here's a very naive thing that we thought of.
09:46:14 We know that there's a lot of pruning and replacement of synaptic connectivity in many systems.
09:46:21 So maybe we can use something like that. So what we did is to take a model, and every time we learned the model,
09:46:27 we threw out the lowest 5% or 10% of projections, based on the weight that the model gave them, and replaced them with new random projections.
09:46:36 So it's basically a bit like portfolio selection over investors or something.
09:46:42 I randomly pick a new random projection, and I just end up throwing out the ones that don't seem to be giving much, because they were, again, completely random.
09:46:54 This works really well. So this is the performance of the model
09:46:55 without pruning as a function of the number of projections.
09:46:58 I'm sorry, this is hard to see, there are no numbers here, but this is hundreds of projections. In terms of how well they do, with this pruning scheme
09:47:09 you can actually do, again, much, much better with fewer and fewer projections.
09:47:14 So again, the models can be super compact in terms of the number of projections and
09:47:18 the amount of data that we need, and they're as accurate as running the experiment again.
09:47:24 And then it gives kind of nice features to these random projection neurons: they become even sparser in their activity and decorrelated, which we can talk about later.
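One round of the prune-and-replace scheme might look like the following Python sketch. Ranking projections by the absolute value of their learned weight is an assumption (the talk only says the choice is based on the weight the model gave them), as are the function names, the Gaussian weights, and resetting the weights of new projections to zero. The refitting step between rounds is left out.

```python
import numpy as np


def random_projection_rows(n_rows, n_neurons, in_degree, rng):
    """Draw new sparse random projections, one per row."""
    rows = np.zeros((n_rows, n_neurons))
    for i in range(n_rows):
        idx = rng.choice(n_neurons, size=in_degree, replace=False)
        rows[i, idx] = rng.normal(size=in_degree)
    return rows


def prune_and_replace(A, lam, frac, in_degree, rng):
    """Replace the fraction `frac` of projections with the smallest |weight|."""
    n_drop = int(frac * len(lam))
    drop = np.argsort(np.abs(lam))[:n_drop]  # least useful projections
    A_new, lam_new = A.copy(), lam.copy()
    A_new[drop] = random_projection_rows(n_drop, A.shape[1], in_degree, rng)
    lam_new[drop] = 0.0  # new projections start with no influence
    return A_new, lam_new, drop


rng = np.random.default_rng(3)
A = random_projection_rows(40, 20, 4, rng)
lam = rng.normal(size=40)  # stand-in for weights from a fitted model
A_new, lam_new, drop = prune_and_replace(A, lam, frac=0.1, in_degree=4, rng=rng)
print(sorted(drop.tolist()))
```

In a full loop this would alternate with refitting the weights, so each round gives the newly drawn projections a chance to earn a large weight or be discarded in turn.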
09:47:36 So, since we're not very deep people, we ended up with a shallow network, a model of the population that's doing a remarkably good job of describing the joint activity based on
09:47:47 a set of nonlinear, sparse things. If you think about the nature of this thing, based again on this random connectivity, we don't need a lot of data.
09:47:58 The things are sparse without us building that into it.
09:48:03 Things are very scalable here, so it's relatively easy to try and add things up.
09:48:08 This pruning thing, which is again biologically inspired,
09:48:12 works really well. And we have this realistic learning rule that is based on some notion of synaptic noise, which I didn't get much into
09:48:23 but we can talk about later. And so this notion of how you can deal with really large populations seems to have a way to try and address some of these things.
09:48:36 And sorry, this was the last point that I wanted to make for this. I made this point before,
09:48:46 but I want to re-emphasize that if I wanted to compute things, I don't even need different sets of projections, because they are random.
09:48:52 The two circuits can share the same set of projections and do their computation. So it's even compact in that way.
09:48:59 So to summarize this part: even when looking at many tens, and now many hundreds, of neurons working together, there seems to be
09:49:11 some low-dimensional structure that really governs what's going on.
09:49:15 We can pick it up with pairwise and triplet things for small populations, and with random, sparse, nonlinear things for larger ones.
09:49:24 And we can do this really well, and it's as good as doing the experiment again; this is the level of accuracy you get. This now gives a new family of models that, on one hand, seems to be scalable,
09:49:38 so we can try and deal with looking at even larger populations, and they don't need a lot of data,
09:49:44 and they're efficient in terms of their size. And we have a biological circuit that may come with it, which is not only a notion of how you can learn these probabilities and do things with them, but maybe something about how the brain can learn its own
09:50:01 statistical models of the inputs that the neurons inside the brain are getting.
09:50:06 So I'll stop here for some questions before I give Gašper the next 45 minutes.
09:50:21 If you look back at the neurons' stimulus tuning, are the things you learn here interpretable? Like, you're just estimating based on random projections and end up estimating surprise. Can you go back and look at how that neuron responds to different stimuli?
09:50:38 Yeah, I'll talk about this in the other half. So you may be very happy.
09:50:45 But I wanted to mention that at VSS a couple of weeks ago there was work from Google and Carnegie Mellon that showed that if you take random projections, basically build a deep network and use random projections rather than train it, you do just as good a job accounting for the properties of intrinal
09:50:59 cortex as if you train it. So this should make me happy here, so I'll send the architecture to everybody.
09:51:12 We'll hang an analogy out of paper, so it's not true. Sure.
09:51:20 I think there is something here which I'll say as kind of a general point,
09:51:30 if everybody will still be awake around, I don't know, 12:30, or whenever we get to the other side of this.
09:51:37 There is an issue here, in trying to describe these large populations, of what are the things that govern them, and what hope do you have to try and get what we can
09:51:49 think of as understanding what's going on. So if I give you all the pairwise features that govern 1,000 neurons, there's half a million of them.
09:51:58 Would you be happy, in terms of having understood? I give you this half a million numbers.
09:52:04 It explains everything remarkably well. How happy would you be?
09:52:09 I don't know. So there's some issue of
09:52:17 what do you really regard as understanding here. The fact that we can do remarkably well with random stuff feels very painful at some level, in terms of what do I know about what these things are.
09:52:31 And I'll push that point further in the second section.
09:52:34 But that's me; even VSS agrees with me now.
09:52:40 So what else can I do? Yes.
09:52:43 Can you add memory to these networks?
09:52:47 Like some hetero-association? I have a friend who answer.
09:52:54 You know, so you can think of it this way, if I think about a circuit.
09:53:04 So there's two elements. Sorry, Kathleen. There's two things that seem like a natural way to think about this, but probably more.
09:53:16 One is that if you think about a recurrent network rather than the feed-forward thing that I've described, then this kind of gives you projections, or things, that have to do with temporal structure, and then you can start asking what are the scales over which things will be remembered or
09:53:39 will influence what's going on. So that's one level that we've been trying to play with.
09:53:43 And the other is that if we again think of the circuit that implements these things, or the nature of the dependencies between these elements, then if I think about tuning these particular weights, this tuning you can think of as trying
09:54:07 to achieve a particular goal or performance. So if I've seen a lot of cats, and now I've trained a network that needs to deal with cats, the weights, in a way, are the memory of my view of what the concept of a cat looks like. The tuning of these things, which is a little bit of what I
09:54:26 think Gašper will say a little bit about, and that's why that connection, there's some notion that
09:54:31 maybe the dependencies between things are not just fixed or random, but they're tunable, and that can serve as
09:54:39 what I know about the outside world. And then maybe you can even ask, what can I use this to say about things in the outside world?
09:54:45 Which is the question. I promised to go back to after the yes, there's more gambler observation that it's kind of nice to see that you end up with something that gets very close to, she also said, like the patient told me a calculating surprise almost get to something just like how persons of newbox
09:55:06 too. There's not something I would have expected from at the beginning of this talk.
09:55:10 Yeah, not sure. I would have expected that I would be a couple to 15 there's different pairwise relations of what we can think of the notion that the circuits can compute.
09:55:30 Their surprise is, of course, kind of a dominant idea in terms of how things might be computed at some level.
09:55:39 Once you wire things in this straightforward way, and you make this random thing,
09:55:45 you've built yourself some kind of a random projection model that computes the likelihood of the data for something; we just don't know what it is.
09:55:53 Right. So if I just pick a random signal, and I have these random things connecting them, this thing might not be computing the likelihood of the thing you care about, but it computes the likelihood of something, and maybe you can figure out that it's good for you for something. So in
09:56:13 some sense, I don't know if the idea that it computes the likelihood of stuff is as surprising as we thought at the beginning it would be. It seems almost natural.
09:56:24 If you wire things, just them, how are we?
09:56:31 Maybe I'll take one more, and then we'll... Simon hasn't asked anything yet.
09:56:39 Thanks for the talk. So as an experimentalist, I'm always excited when there are models that can be tested somewhere.
09:56:47 If I understood correctly, in your last part you kind of estimated the contribution of the edges in the network,
09:56:55 the synaptic connections. You could take some away; some were more important than others.
09:56:59 That's difficult to do experimentally, but removing nodes from the network, shooting out individual neurons,
09:57:06 that's something that can be done. So my question is, basically, does your model, your fit, also make predictions about the contribution of individual neurons to the emergence of a collective firing pattern?
09:57:20 Is it that, if they're part of the band, they're all equally important?
09:57:27 Or can we make predictions about the extent to which individual neurons contribute?
09:57:33 That's a cool idea. I haven't thought about, you know, getting rid of particular neurons altogether.
09:57:39 Although in a way, when I replace projections, it's as if I threw this neuron out completely
09:57:48 and got another neuron to come and connect to all of you.
09:57:53 So at some level, maybe my pruning is closer to getting rid of, or changing, neurons rather than doing things with individual synapses.
09:58:02 But yeah, I think that's maybe the best thing I can do for that.
09:58:06 Okay, so we can do more stuff later. So.
09:58:14 I think you can, whatever you use.