11:23:03 I already introduced myself yesterday, so I can jump in without an introduction. 11:23:08 So this is the third part of our three-part series. 11:23:14 We tried to coordinate quite a lot, 11:23:18 Gašper, Elad, and I, and to have a more or less continuous thread through these presentations. 11:23:23 So I'll also talk about complex codes, and just explore a couple of different things that they have touched on and add to them. 11:23:31 So this is what I'll try to do. Elad and Gašper mostly talked about pairwise codes. 11:23:39 But sometimes, when you get to very large dimensionality, 11:23:42 pairwise doesn't work well. You need to infer higher-order structures in the code, and maybe try to constrain them, and so I'll talk about something 11:23:49 that we developed, the unsupervised Bayesian Ising approximation, for detecting these higher-order, combinatorial features in large data sets. 11:23:58 Then I will step back and ask why it is that pairwise approximations work so well, and I will show a couple of simple statistical models that may give an intuition for why pairwise models work well. And then this is going to relate to the idea 11:24:17 that sometimes latent variables emerge in large data sets, and I will show other manifestations of latent features in large 11:24:27 neural recordings. So, talking about complex codes: yes, we're all about decoding the brain, 11:24:34 the same picture that Elad showed, the Rosetta Stone. 11:24:37 What we are interested in is building complex conditional distributions. 11:24:43 Right: you know that the animal is going to do that 11:24:46 if this, this, and that neuron are active, but this one and that one are not active. 11:24:52 So it's a large conditional distribution. How do we do this? 11:24:55 In many respects, of course, this is related to large language models: 11:24:58 complex distributions, trying to figure out, in these combinatorial spaces, what the correlation structures are. 11:25:07 We are not actually going to build the distributions. What I will try to show is that you can simplify and make the problem much easier if what you are interested in is just detecting which particular features are anomalously represented. So: not try to build a model of the distribution, but try to 11:25:25 tell which features of the distribution should be included in the model, and that becomes an easier question. 11:25:31 What should be included, rather than as what and how, and so we can make some progress 11:25:39 with that. 11:25:44 This is, of course, related: we actually started thinking about this after reading the Ganmor papers that Elad presented. 11:25:54 So there is some relation to the reliable interactions model, except that, at least on the surface, the reliable interactions model is very difficult to extend to very large recordings, and we actually can extend to very large data sets, because we are not looking for coincidences between patterns over 11:26:13 hundreds of neurons, which never happen. Instead we do a full sweep through all possible sub-patterns that do or do not occur in the data set. 11:26:22 Okay, yes, a quick question: you mentioned LLMs, but aren't you actually trying to do sort of the opposite? 11:26:31 The LLMs are trying to predict the most likely continuation; do you want to look at sort of outliers?
11:26:37 No, what I want to look at is which features are overrepresented in the data. 11:26:41 So if I am later going to build a predictor, you know, predict what the next word or the next spike is going to be, or, 11:26:47 in my case, whether the animal is going to sing a high note or a low note, 11:26:51 I will know where to look. It's basically trying to figure out the attention: 11:26:55 what are the places I need to be looking at, 11:26:58 what are the complex patterns? Right? Okay. So what I want to start with is that this problem of detecting overrepresented or anomalously represented, maybe over-, maybe under-represented, patterns is a problem that is everywhere in biology. In spikes we want to 11:27:13 figure out which patterns are there or not. You can look at nucleotides. 11:27:21 You can look at aligning different proteins. 11:27:23 You can look at amino acids, again trying to align them. This would be genes, 11:27:28 this would be proteins, and you see that wherever you get an alignment it's a 1, 11:27:35 and wherever you don't it's a 0; maybe there is a mutation, that's a 1, no mutation, that's a 0. 11:27:38 And again, what you're trying to understand is how to build a model of a protein 11:27:43 in that case, right? It's a very similar problem. So the idea is that we are always going to take M sequences, 11:27:49 maybe M sample trials for neural recordings, and we're going to put a 0 if there is no spike and a 1 if there really is a spike. 11:27:58 We're going to find the most common word, which is all 0s, which is going to be our default state, 11:28:02 and we're going to be looking at deviations from this default 11:28:05 word, where everything is at baseline: no mutations for proteins, or complete silence for neurons. 11:28:11 And then once in a while, out of 1,000 neurons, maybe a hundred are going to be firing at a given moment in time: 11:28:16 still a small deviation from the base word. So then we're going to be looking at overrepresented patterns, like this one: 11:28:24 1 shows up everywhere, right? And maybe this pattern 0-1, even though 0 shows up very frequently and 1 shows up very frequently, the combination 0-1 11:28:33 does not show up too frequently, so it is an anomalously represented pattern; and maybe you will see that this pattern 1-1 is also overrepresented, right? It's the purple one. 11:28:42 And now you have a couple of different things to think about. 11:28:45 Say I have three 1s that are each individually overrepresented; 11:28:49 I see a pair of 1s that is overrepresented, another pair of 1s that is overrepresented, and the triple that is overrepresented. 11:28:58 So now I have to start thinking about whether this triple is more overrepresented compared to what I would have expected had I just looked at the one 1-1 pair and the other pair, and so on. 11:29:08 Right. So if the words overlap, if these pieces of the words overlap, it's very complicated to see which of them are actually there irreducibly. 11:29:24 And this happens for any words that are partially or completely parts of each other, or that point in the opposite direction.
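As a concrete baseline for what "overrepresented" means here, one can simply compare the empirical frequency of a candidate sub-pattern of 1s against what independent bits would predict. The sketch below is my own illustration on hypothetical toy data, not the Bayesian method developed later in the talk; the overlap problem discussed next is exactly what such a naive count cannot resolve on its own.

```python
# Naive over-representation check: empirical frequency of a sub-pattern of 1s
# versus the prediction from independent bits. Toy data only.
import numpy as np

def overrepresentation(words, positions):
    """words: (M, N) 0/1 array of samples; positions: bit indices required to be 1."""
    observed = np.mean(np.all(words[:, positions] == 1, axis=1))
    expected = np.prod(np.mean(words[:, positions] == 1, axis=0))  # independent-bit prediction
    return observed, expected

# hypothetical data: three bits, where bits 0 and 1 tend to fire together
rng = np.random.default_rng(0)
common = rng.random(1000) < 0.3
words = np.column_stack([common | (rng.random(1000) < 0.1),
                         common | (rng.random(1000) < 0.1),
                         rng.random(1000) < 0.2]).astype(int)

obs, exp = overrepresentation(words, [0, 1])
print(f"pattern 1-1 on bits (0, 1): observed {obs:.3f} vs independent prediction {exp:.3f}")
```

Applied to a pair and to a triple containing that pair, this count will often flag both as overrepresented, which is precisely the ambiguity the speaker turns to next.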
11:29:31 If you know that, for example, 1-1-1-1 is overrepresented, you can equally ask whether the underlying 1-1 pairs which are parts of it, or even sequences that are partially overlapping with it, are maybe overrepresented simply because they 11:29:50 contain pieces of some other object that is overrepresented. 11:29:55 Similarly for underrepresented: every time I say "over", just read "over slash under", right. 11:30:07 And so what if you have one pattern that's highly overrepresented? 11:30:11 Does that automatically induce the others? That's what the method will have to sort out for us. So the idea is that this problem is probably not solvable exactly, because it's really complex; if we could solve this problem, we'd be able to solve all of statistical physics. 11:30:31 But we can start making approximations, right? 11:30:33 And so we're going to propose one specific approximation that works reasonably well, at least in the few cases that we have considered. 11:30:40 So this is what we are going to do: we're going to look at the birds. 11:30:44 This is a bird which is singing; sometimes it sings a high pitch, sometimes it sings a low pitch. 11:30:49 This is a sequence. This is not a population code, that is, we're not recording from many neurons, 11:30:54 but it's a single neuron which is recorded for a certain period of time before the bird sings. Mathematically, 11:31:01 there is no difference whether it's a single neuron at many times or many neurons at the same time slice. And so there are going to be spikes or no spikes, 11:31:10 and then there's going to be a bit for this relevant variable, which is whether the animal is singing high or low, right? 11:31:20 So we typically will look at about maybe 20 slices of neural activity and one slice of high or low, or maybe a few slices of high/low for the subsequent behavior. I guess mathematically they're similar, because you're not looking at patterns 11:31:38 across neurons; you just have a single neuron here at many times, right? But you said that if you had other neurons recorded it should be mathematically similar. 11:31:51 Yeah, because in this case the activity is a hundred-dimensional binary vector, a single neuron over a hundred time slices; I could equally have 11:32:02 the activity as a ten-thousand-dimensional binary vector, 100 neurons times 100 time slices. 11:32:11 And at that point I would just not have enough data, and so on. 11:32:14 So you're assuming, in that case, that you're just using the vector, you're not looking at patterns across neurons and time? 11:32:20 We can, and we have upcoming papers on that, 11:32:26 but I just want to present the simplest possible version of this. So the 11:32:35 idea is: this is the sigma-0 over here, 11:32:40 this is going to be my relevant bit, and then sigma-1 through sigma-N are going to be all of the other bits, and I'm going to look for anomalously represented words that do not include this last bit. 11:32:54 Those anomalously represented patterns that don't include it are the words in a dictionary.
11:32:59 And then, of those words, the ones that actually include the last bit are the code words: this particular sequence of spikes is over- or under-represented together with the high pitch or the low pitch, so it codes for high or low pitch. Right, so there are words in a dictionary, and 11:33:14 then some of these words are code words; they represent the behavioral variable. So overall there's a joint probability distribution over all of these spins, which can be written down with no loss of generality 11:33:24 as a very, very long polynomial in the spins, right? 11:33:29 There are going to be 2 to the N minus 1 of these terms that are independent from each other: 11:33:34 linear terms, pairwise terms, cubic terms, and so on. And at that point you have to throw up your hands, right, because there are 2 to the N things to infer from M 11:33:43 recordings of N neurons, from M times N observations, and you cannot do this. 11:33:48 Right. So at some point you have to start making approximations. 11:33:51 So what we are going to do first is to say that we do not really care about the values of the thetas; what we care about is just whether those thetas are statistically significantly not zero. 11:34:04 So we're going to detect patterns that contribute to the probability distribution, and that should also be relevant to some of the discussion we had yesterday. 11:34:12 We're trying to do some unsupervised learning of various features, statistical learning in a behavioral context: how does an animal figure out which features in the world matter? That's the question we are asking: which features are anomalously represented. And so what I will do... 11:34:33 Sorry, I was just going to ask: the previous slide wrote this in an exponential form, and I thought that there was a duality between exponential-family and max-ent 11:34:46 distributions. Yes. So it's not going to be max-ent, because of how we are going to choose which particular thetas are or are not zero. 11:35:00 We are not trying to build a model which has maximum entropy within a particular class of models; we are just asking whether, in the probability distribution that you would build, this particular 11:35:16 theta is zero or nonzero, right? And I'm not making any assumptions, I'm not trying to approximate the true probability distribution with a truncated maximum-entropy form with the best-chosen values of the thetas. So the 11:35:32 distribution that I would build from the features that I find will not be a maximum entropy distribution. 11:35:37 It will be a distribution that has only the features that have a high enough probability of needing to be included. 11:35:44 So, I'm confused as well: you constrain the specific patterns, 11:35:54 but you leave everything else to be flat? I have not constrained anything; for now I just wrote down the most general form of the probability distribution, and now I'm going to start asking which of the terms should or should not be there. 11:36:04 So far I haven't imposed any constraint. But then, once you do that, doesn't it become max-ent? 11:36:10 It won't, because, again, max-ent is a prescription for how to find particular thetas 11:36:19 within a particular class of models. We are not asking that at all.
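For reference, the most general form under discussion can be written as follows (my reconstruction of the notation from the description above; the conventions in the uBIA paper may differ in details):

$$
P(\sigma_0,\sigma_1,\dots,\sigma_N) \;=\; \frac{1}{Z}\,
\exp\!\Bigg(\sum_{\emptyset \neq V \subseteq \{0,1,\dots,N\}} \theta_V \prod_{i \in V} \sigma_i\Bigg),
$$

with one coefficient $\theta_V$ for every nonempty subset $V$ of the bits: the $|V|=1$ terms are the linear ones, $|V|=2$ the pairwise ones, and so on.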
11:36:24 We are asking which of the thetas should or should not be zero, and then, after that, you can build a max-ent model 11:36:29 over those terms if you want. But we're not doing that; 11:36:34 we are not going there. Okay? So what I'm going to do is something really silly, in front of every one of these thetas. 11:36:46 On the previous slide I had these products of sigmas, and each product of sigmas had a theta, which is a continuous, real-valued variable. I'm going to put another binary variable s in front of it, so now I have 2 to the N s-variables, which are binary. 11:37:00 Then I'm going to say that, since I don't care about what the thetas are, I'm going to integrate the thetas out of my probability distribution, and I'm going to make approximations, because I kind of have to. And so I'm going to assume that all 11:37:16 of the thetas are reasonably small, which means that none of the patterns is a dominant pattern 11:37:22 controlling the whole probability distribution; those dominant patterns are easy to detect, 11:37:27 I would have detected them a mile away. So I don't care about really big effects; 11:37:31 I care about many small effects, so I'm only going to look at thetas that are small. 11:37:37 So I'm going to put a Gaussian prior distribution on the thetas. This Gaussian is going to be very narrow, with a width epsilon, and given this Gaussian I can do a posterior calculation for which patterns to include, 11:37:51 given the data that I have observed, marginalizing over all of the values of the thetas. 11:38:02 There is a particular prescription for choosing epsilon, which we show in the paper, and I don't want to go into the details. 11:38:13 Basically, what you want is the largest epsilon possible that constrains your model the least, right? 11:38:20 And so what we typically do is start with very small epsilon and then keep pushing epsilon to larger and larger values until you break some consistency conditions, and at that point we stop. 11:38:33 So we are getting the largest possible epsilon. Actually, let me take 30 seconds and explain this. 11:38:40 When you do the calculation, you will get the probability of the s's given all of the sigmas, which is going to be the exponential of some Taylor expansion in epsilon. So what you need to make sure is that 11:38:54 epsilon is small enough that the higher-order terms can be neglected, 11:38:56 and so you tune epsilon to the largest value at which the higher-order terms can still be neglected. And what you will see is something that looks like this. 11:39:06 I didn't write down what the h's and the J's are, but they are some rather nasty functions of the data. What typically enters is the expectation that you would get from a null model, in which 0 is the most common state and you have independent bits, only the first-order terms 11:39:27 satisfied; that would be the expectation for how often you expect a particular pattern to occur. And then there is the real frequency with which every pattern actually occurs, and that is what enters the h's and the J's. And so the story is going to be that the h basically counts 11:39:43 how different the frequency of occurrence of a particular pattern is from what you expected from the null model.
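Schematically, the construction just described is the following (my own summary of the talk, not the exact formulas; the precise expressions for the fields and couplings are derived in the uBIA paper):

$$
\theta_V \;\longrightarrow\; s_V\,\theta_V, \qquad s_V \in \{0,1\}, \qquad
P(\theta_V \mid s_V = 1) = \mathcal{N}(0,\epsilon^2),
$$

and, after marginalizing over the $\theta$'s, to leading order in the small width $\epsilon$,

$$
P(\{s_V\} \mid \text{data}) \;\propto\; \exp\!\Big(\sum_V h_V\, s_V \;+\; \sum_{V \neq V'} J_{V V'}\, s_V\, s_{V'}\Big),
$$

where, as described above, $h_V$ grows with how far the empirical frequency of pattern $V$ is from its null-model expectation, in either direction, and $J_{VV'}$ is data dependent and more negative the more the patterns $V$ and $V'$ overlap.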
11:39:52 And so, if the probability is too high or too low, then h gets to be large, 11:39:58 in both cases. So it means that this s 11:40:01 has a higher expectation value, so the probability that this pattern should be included goes up. And the J's are interesting: 11:40:11 the J's are interactions between the different patterns, 11:40:16 and they depend explicitly on the data, but also on the overlap between patterns. 11:40:22 The more the patterns overlap, the more negative the J's become. 11:40:27 So now, if you have two patterns which are both jointly overrepresented, and one is a subset of the other, then J 11:40:35 becomes a strongly negative term, and so there is this sort of explaining-away situation, where the pattern that can explain the data is going to suppress all other patterns which co-occur with it, even partially. 11:40:49 And so what I did so far kind of seems silly, right, because I had a model which was on, ballpark, 20 spins. 11:40:57 Now the number of s's is 2 to the 20, so now I've got myself an Ising model on 2 to the 20 spins, which is much harder, right? 11:41:06 Except that both the h's and the J's now have epsilon in front of them, 11:41:11 and so this problem can be solved with a simple mean-field approximation. 11:41:17 We also did message passing; it makes no difference. And we can solve this problem on a laptop, for example for a data set which has 20 to 25 spins, so a total of 2 to the 25 possible different long patterns. 11:41:35 Right. But then there are also sub-patterns of each one of these patterns, and it's all solvable on a laptop, because we never have to calculate how likely or how anomalous a pattern is that has never happened. 11:41:53 Right. So we only need to look through our data set and pick up the patterns that have ever occurred in this data set, and that number scales linearly with the number of observations, not exponentially with the number of spins, right? 11:42:08 And so you just do a reasonable search through all of the possible patterns, 11:42:16 check all of the ones that have actually occurred, do this analysis, calculate the posterior probability for each pattern to be included in your model, 11:42:30 and you're done. And again, it's a laptop-scale 11:42:31 calculation on 25 spins, and we have played with this up to a hundred spins on the Amazon cloud, reasonably fast. 11:42:39 So here are some examples of applying this to the bird case. 11:42:47 What we were interested in is, again, are there interesting patterns that control, that show, 11:42:52 whether the bird is going to be singing high or low. And if you are coming from the motor control literature, most of the models of motor control and of the neural description of motor control are generalized linear models, right: 11:43:05 they would take a particular spin at one point in time and have an additive effect of all of the spins at every point in time, and that's what creates the motor response. 11:43:16 The idea is that muscles integrate over a long time, 11:43:18 so you really cannot distinguish two spikes being close to each other from two spikes being far away from each other. 11:43:25 And there's a long line of work in my group, together with Sam Sober, showing that muscles are actually incredibly precise, incredibly sensitive to the precise timing between the spikes.
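To make the "simple mean-field approximation on a laptop" step above concrete, here is a minimal sketch in Python. The h and J values are made-up placeholders, not the actual data- and epsilon-dependent expressions from the paper, and the spin convention (s = +/-1) is just one common choice; the point is only the damped self-consistent update and the explaining-away effect between overlapping patterns.

```python
# Minimal mean-field sketch for an Ising model over pattern-inclusion variables.
# h and J below are hypothetical placeholders standing in for the data-dependent
# fields and (negative, overlap-driven) couplings described in the talk.
import numpy as np

def mean_field_inclusion(h, J, n_iter=500):
    """Approximate <s_i> for P(s) ~ exp(sum_i h_i s_i + sum_ij J_ij s_i s_j), s_i = +/-1,
    via damped naive mean-field updates; returns values mapped to [0, 1]."""
    m = np.zeros_like(h)
    for _ in range(n_iter):
        m = 0.5 * m + 0.5 * np.tanh(h + J @ m)   # damping keeps the update stable
    return 0.5 * (1.0 + m)

# toy example: four candidate patterns; patterns 0 and 1 are both strongly
# overrepresented (large positive fields) but overlap, so a negative coupling
# lets one of them suppress the other
h = np.array([1.5, 1.2, 0.1, -0.5])
J = np.zeros((4, 4))
J[0, 1] = J[1, 0] = -2.0
print(np.round(mean_field_inclusion(h, J), 2))
```

In this toy setting the pattern with the stronger field ends up included with high probability while its overlapping competitor is suppressed, which is the qualitative behavior described above; in the real method one would only instantiate s-variables for patterns that actually occur in the data.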
11:43:38 And so here, for example, we see that... 11:43:40 okay, the problem is that I'm color blind, so I don't see the red laser dot that I'm trying to point with, but you should see it. 11:43:50 For example, spiking at this point and at this point, to an accuracy of 1 ms, or 2 ms 11:43:57 in this case, is predictive of the animal singing at high pitch, and this particular pattern is maybe predictive of the animal singing at low pitch, and the absence of spikes 11:44:08 at those two points is predictive of the animal singing at low pitch. And to the extent that we never see the same pattern occurring and predicting both the high pitch and the low pitch, and there is a bunch of other checks that we can do, we know that 11:44:22 we're not fooling ourselves, that this is statistically significant. 11:44:26 We can also check whether it's biologically significant; I am not showing it here, 11:44:29 but we can actually stimulate muscles with the specific patterns that we have detected, and they do what we expect them to do. 11:44:37 And so this is just the analysis of different neurons. For each individual neuron 11:44:43 you can see the typical number of code words, the size of the dictionary, 11:44:50 which, as I was telling you, by building these dictionaries, is maybe 3 to 4. Patterns of activity of a particular neuron predict 11:44:59 the pitch; the same neuron is maybe going to predict the amplitude; the same neuron is going to predict 11:45:07 maybe the spectral entropy. And that's kind of interesting, because people usually believe that different neurons in the RA area, which is where this is recorded from, are responsible for different features of the song, and they would be if you only looked at the firing rate; but once you 11:45:22 start looking at specific multi-spike pattern correlations, you start seeing that the same neuron actually codes for different features, for different parts of the motor code, right? 11:45:32 These are the general statistics: there are almost no single spikes that predict 11:45:39 the motor output, but there are quite a lot of two-spike and very few three- and four-spike combinations. 11:45:44 Part of this is probably that we are running out of statistics; the higher-order 11:45:51 things just didn't happen. But also, in numerical simulations, if there are third-order patterns 11:45:58 we typically do recover them, and there should be a lot more of them than of second-order ones, simply because there are a lot more possible combinations at third order than at second. 11:46:09 So this is a bit surprising, and that's the thing that I want to get to, connecting to the previous talks: yes, now we can detect higher- 11:46:19 order patterns, we can model them. But why is it that the second-order patterns are the most common? 11:46:28 But before I go there, let me just show some interesting things. This is a male Bengalese finch we are recording from. They have two different types of singing behavior: sometimes they sing to a female, in which case they really try to sing the right song, and sometimes they sing to themselves, in which case it is a 11:46:44 rehearsal, they keep on practicing, right, and so the variance of their behavior changes by quite a lot. 11:46:49 And so there is a question in the field of whether motor control in learning versus motor control in typical performance is the same or not.
11:46:58 And so what we see, for example: this is the same bird singing to a female, and the bird singing just to itself. 11:47:06 Right, the pitch variance is much larger. And what you start seeing is that there are a lot more code words, and the statistical structure of the code words 11:47:18 is very different for these extreme deviations of pitch, somewhere out here and out there: there are a lot more irregularities there, and a lot more larger-scale, you know, 11:47:29 three- and four-spike code words that are correlated with the animal singing in the tail, right? 11:47:37 So that's what this plot is supposed to show: there is some different encoding that goes on for typical behavior versus for the tails. 11:47:44 Okay. Still, most of the words are two-spike combinations, two-spike words. 11:47:49 And so let's now summarize this: higher-order patterns are sometimes significant, and they can be detected from small samples. I forgot to mention that here we typically have about 200 recordings, while 2 to the 20, the total number of different words that you could have had 11:48:08 on 20 bits, is about a million; we have only a couple hundred recordings, and we can still detect which higher-order 11:48:18 patterns are statistically anomalous. And so maybe one can start thinking about merging this with what Elad was talking about, the random projections, and try to not make the projections random, but to detect which particular neuron groups fire together or do not fire together, and then build the max-ent inference 11:48:36 models on those. So these birds do not 11:48:42 have that many higher-order code words, but these birds are also extremely different from most other animals: 11:48:48 you know, our previous work has shown that they almost don't encode anything by rate in their songs, 11:48:55 it's almost only the spike patterns. So they may be the worst-case situation in which to argue that the higher-order patterns don't matter, because for them even the first-order rates do not matter at all, for these particular types of neurons. And so I'm just reminding 11:49:13 you that this is a plot that Elad showed, that second-order distributions seem to explain most of the structure, at least in small neural networks, and there is a list of other papers that came out later with very similar recordings. And so what I want to understand 11:49:34 is: now that we can detect these higher-order patterns, 11:49:36 why is it that we're not detecting that many of them? 11:49:45 Maybe there are occurrences in the data that do contain encoding by code words with more spikes, and even more code words as well. I see; is there some link to the kind of optimal encoding, compression, and so on, that people have articulated? I just don't know, right. 11:50:03 We were asked to speculate about that in the paper, and I just have no idea; 11:50:07 I am not the one to ask. All right, so: why do pairwise models work? 11:50:15 Yeah, so far you basically first look at this dictionary, 11:50:22 and then somehow you condition it on the behavior; the behavior is also a bit, a 0 11:50:29 or a 1. That's my question: can you, from the very beginning, somehow operationalize it jointly? That's exactly what we do, 11:50:40 that's exactly what we do, right? It gets to be a bit complicated, because for the behavior, the way that we have classified it, the behavior bit would be 11:50:48 a 0 or a 1 with
50/50 probability, so there is no baseline, right? 11:50:51 And so it gets slightly harder than just applying the method directly. 11:50:58 But if you were to say, for example, that the behavior is the tail of the activity, 11:51:04 so it happens only 10% of the time, and it's a 1 if it's in the tail and 0 everywhere else, 11:51:11 then the method works in exactly the same way as for the rest of the bits 11:51:18 of your activity. Okay? So I'm going to try to give some intuition, with deterministic models, 11:51:26 for why pairwise models should be good enough for medium-sized neural networks. So let's suppose I have an XOR gate, right? 11:51:40 The three spins sigma-1, sigma-2, and sigma-3 are related by an XOR; let's say sigma-1 is the XOR of sigma-2 and sigma-3, or any other way around, the XOR is completely symmetric in this respect. So this 11:51:51 gate cannot be modeled by pairwise interactions, because the correlation, or the mutual information, between any pair of these spins is zero, 11:51:59 and yet there is some interaction; it is a third-order term in the probability distribution. 11:52:04 In this case the distribution is deterministic, it happens to be a delta function, right, and if I were to try to do an inference model on this system, the only way to explain it would be with this third-order term. But now let's suppose I add another third-order gate: so now maybe sigma-4 is also an 11:52:22 XOR of sigma-3 and sigma-2, 11:52:24 or maybe it's a negated XOR, whatever, the negation of that. 11:52:28 So the true model now actually has two XORs, two third-order gates. 11:52:34 But of course sigma-1 and sigma-4 in this situation are equal to each other, 11:52:38 or opposite to each other, and that's a second-order relation. 11:52:42 Right: you are going to add to your joint distribution a term which is a very large 11:52:48 J times sigma-1 sigma-4, a positive J, which means that when sigma-1 is high, sigma-4 should also be high. 11:52:54 So the effective model of the system looks like this. I add another higher-order interaction, and in the effective model 11:53:00 now sigma-5 is equal to sigma-3. 11:53:03 So here it is: I add another interaction, and another interaction; maybe 11:53:08 now I'm not adding an extra spin, but just adding an extra interaction coupling 1, 2, and 4. 11:53:15 And suddenly I am pinning down this model: the effective description of this model is that sigma-2 always has one state, 11:53:21 whatever it is, I forget whether it's 0 or 1, and all of the other variables are equal to each other, and that is a pairwise description. 11:53:29 And so the idea is that if you are adding higher-order gates, higher-order interactions, to the system, and the number of interactions starts approaching the number of variables that are being coupled, then the system starts freezing: not everything is possible, 11:53:44 not every combination of states is possible, and the frozen state is fundamentally a pairwise description. Right? 11:53:52 You can think of it as having a few states which are extremely probable; 11:53:55 I can model the probabilities near those valleys with a Hopfield network, which is a pairwise network by construction. 11:54:01 Okay, so let's try to see if the same story holds for random probabilistic interactions, rather than just the deterministic case. 11:54:17 So what we are going to do is build a bunch of probability distributions which have only third-order or only fourth-order interactions, 11:54:25 and they're going to have M interactions on N variables.
11:54:28 And to avoid issues with whether we are sampling things correctly or not, we're going to stick to N of about 20 to 25-ish, which means that we can just put everything on the computer in memory, you know, and just enumerate all states completely and find the minima, the ground states, everything, every 11:54:44 single one. And then the number of interaction terms is going to go anywhere between half of N 11:54:52 and about 10 N, so from weakly coupled to very densely coupled, right. And I'm going to try to fit these distributions using maximum entropy fits, to either linear models or pairwise maximum entropy models. 11:55:06 Now let's see how well we are going to do. 11:55:07 Say I start with third-order interactions. What I'm plotting here is entropy per spin: 11:55:14 as I'm changing the number of interactions, the entropy is going to decrease, 11:55:21 the system gets more and more frozen. And so, when I have very few interactions, very small M, the system is just 11:55:31 almost fully disconnected 11:55:34 spins, and linear models are going to do quite well; in fact, the non-interacting model is going to do quite well at explaining it. 11:55:40 But then the linear models become worse, right? What's plotted on this axis is the KL divergence between the true distribution and the approximation that you've built. 11:55:53 But then, when the system freezes, linear models start working better again, because now, for every one of the spins, if it's really a single frozen state, you can tell whether it should be up, meaning its value is equal to one, or down, its value equal to minus one; so it's 11:56:12 a product of individual-spin probability distributions. Interestingly, with pairwise models you always do much better. 11:56:19 That is maybe expected, right? But what is interesting is this part: 11:56:23 you approach zero much more quickly, and in a convex fashion, 11:56:30 and there is this range where the entropy per spin is still reasonable, 11:56:35 you know, 0.1 or 0.2 bits, whereas the KL divergence between those two distributions is basically zero. 11:56:42 Right. And if you're looking at fourth-order interactions, of course, linear models don't do anything, because there is a Z2 spin-flip symmetry and first-order models cannot capture it; 11:56:50 but pairwise models do even better, right, they approach zero even faster. 11:56:54 And here I'm taking this black curve and breaking it down by a measure of how constrained the interactions are: 11:57:04 it's the variance of the individual J's that I put into the system, times 11:57:07 how many there are, divided by the number of spins. 11:57:11 So it's the number of interactions, and their size, per spin. And the larger 11:57:21 this alpha is, the earlier you drop to zero, even when the entropy per spin is still about 0.25, right, 11:57:34 meaning that the probability that every individual spin could be up or down is still pretty high. 11:57:39 That's actually the value that you would get from a typical neural recording, where 90-some percent of the time you're not firing 11:57:45 and, you know, a few percent of the time you are firing; you would get an entropy per spin somewhere here. 11:57:50 And so the idea here is that if I have a large number of constraints, the system freezes, and pairwise models start working.
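A minimal, self-contained version of this kind of numerical experiment might look like the sketch below. This is my own illustration with hypothetical parameter choices (N = 10 rather than 20 to 25, and a plain gradient-ascent fit of the pairwise max-ent model by moment matching), not the actual code behind the figures.

```python
# Random third-order spin model on a small system, compared against an
# independent (first-order) fit and a pairwise maximum-entropy fit,
# everything by exact enumeration of the 2**N states.
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, M, g = 10, 30, 1.0    # spins, number of third-order terms, coupling scale (hypothetical)

states = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)  # (2**N, N)

# true model: E(s) = -sum_t J_t * s_i s_j s_k over random triplets
triplets = [rng.choice(N, size=3, replace=False) for _ in range(M)]
J3 = rng.normal(0.0, g, size=M)
E = np.zeros(len(states))
for (i, j, k), coup in zip(triplets, J3):
    E -= coup * states[:, i] * states[:, j] * states[:, k]
p = np.exp(-E)
p /= p.sum()
print(f"entropy per spin: {-(p * np.log2(p)).sum() / N:.3f} bits")

mean_true = p @ states                            # <s_i>
corr_true = states.T @ (p[:, None] * states)      # <s_i s_j>

# independent fit: product of single-spin marginals
q_ind = np.prod((1.0 + states * mean_true) / 2.0, axis=1)
print(f"KL(true || independent) per spin: {(p * np.log2(p / q_ind)).sum() / N:.3f} bits")

# pairwise max-ent fit: match <s_i> and <s_i s_j> by gradient ascent on the log-likelihood
h, J = np.zeros(N), np.zeros((N, N))
for _ in range(20000):
    logq = states @ h + np.einsum('si,ij,sj->s', states, J, states)
    q = np.exp(logq - logq.max())
    q /= q.sum()
    h += 0.02 * (mean_true - q @ states)
    J += 0.02 * np.triu(corr_true - states.T @ (q[:, None] * states), k=1)
logq = states @ h + np.einsum('si,ij,sj->s', states, J, states)
q = np.exp(logq - logq.max()); q /= q.sum()
print(f"KL(true || pairwise) per spin:    {(p * np.log2(p / q)).sum() / N:.3f} bits")
```

Sweeping M from about N/2 to 10 N and repeating over realizations reproduces the qualitative picture described above: as the density of higher-order constraints grows and the entropy per spin drops, the pairwise fit approaches the true distribution much faster than the independent one.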
11:57:59 So what exactly happens is the following. You can see, if I have a very small entropy per spin, 11:58:06 in this particular case a total entropy of about 2, which means about 0.1 per spin, 11:58:12 and this is one particular realization, what I'm plotting is the absolute value of the correlation between my spins: 11:58:18 the system just breaks down into a bunch of clusters, where each cluster has its own behavior. 11:58:23 Right, there is a latent variable that tells me that this cluster is all going to be up or all down, and another cluster which is going to be either up or down, and these two independent variables completely control the behavior of the system. As the constraining of the system gets 11:58:43 weaker and weaker, the clusters become less well defined, and the approximation breaks down precisely when you stop having these overlapping latent features, these clusters that tell you how the whole cluster should be behaving. Like at this 11:59:01 point the approximation is bad and the KL divergence is about 0.6; 11:59:06 you would say that this is maybe not very good, or maybe it's actually still good enough, 11:59:11 but definitely worse than what we had before. And that's precisely when the number of these clusters has become extensive, right, as much as you can say that with 20 spins. 11:59:21 And so the idea here is that with these dense interactions, clusters of correlated spins emerge. 11:59:27 This is equivalent to a latent-variable model, where each particular spin is coupled to a representative field that is controlling this particular cluster, with some J which can be positive or negative. Notice 11:59:40 I'm plotting here the absolute value of the correlation, 11:59:41 so some of the spins can be in anti-phase, right? 11:59:46 And it doesn't matter; what matters is how strongly they are correlated with each other. 11:59:50 So these J's can be positive or negative. And we recently published a paper showing that a very similar situation happens in typical E-I networks at more or less excitation- 12:00:01 inhibition balance: if you really put them near excitation-inhibition balance, the networks develop 12:00:11 these modes of activity, correlated neurons that are either all active together or all inactive together, 12:00:18 and you can describe the activity of these networks by a handful of latent variables driving whole clusters. 12:00:25 The paper that we did with Audrey is purely computational, but there is a very similar analysis, 12:00:32 now analytic, coming from David Clark in Larry Abbott's group, using just random matrix calculations. 12:00:40 So, yes, there are similar observations there. 12:00:48 And so my statement here is that pairwise approximations work reasonably well in networks when there are a lot of interactions, and they work well 12:00:55 precisely because you get these emergent latent variables which are controlling clusters, groups of spins, right? 12:01:04 These could be the same random groups that Elad was talking about. 12:01:09 So maybe what we are trying to explain is why these random groups actually emerge, right? 12:01:13 It's because there are so many interactions that they have to emerge. 12:01:16 So now what I want to ask is: how do I know that a particular data set actually has these latent features driving its activity? 12:01:28 Can I look at the recordings and tell that,
yes, there are latent variables there, and I should be modeling the system not in terms of spin-spin interactions, but in terms of spin-latent-variable interactions? 12:01:37 So another way of asking this: there is a whole lot of interesting observations that people have made about large-scale neural recordings, and some of them seem to be extremely surprising. 12:01:48 But are they really? What I will say is that many of the things that people view as really shocking, really surprising, are actually simple, direct consequences of a latent-feature, latent-variable description being a good description of the data. Okay, so I'm 12:02:06 going to start with some of the slides. This picture you have seen before in the previous talk. 12:02:12 This is a Zipf law; this is from Gašper's paper in 2007, 12:02:16 the same Zipf law in the retina. This is from our fly data, where I'm looking not at a population recording, 12:02:27 but again at temporal sequences of up to about 50 ms of a single unit, one bit per time slice, 12:02:32 and I am going to plot the frequency of a particular word versus the rank of this word. 12:02:39 In all of these, especially in these two cases, what you see is that as the length of the word gets larger, so you're recording from more and more neurons, or for longer and longer temporal sequences, you start approaching the Zipf behavior, right. And the question is where all of this universality is coming 12:02:59 from: a Zipf law is the signature of a critical state in physics, and when we hear about criticality we immediately jump and say there is something exciting going on. But why should criticality be present everywhere? It gets even worse. 12:03:11 This is a picture, for example, from whole-body HIV sequencing: these are different HIV proteins in a single human. 12:03:20 You just sequence them, plot how frequently each variant happens versus its rank, and you get a Zipf law. 12:03:27 You do the same for antibodies, for B cells, as shown in this plot: 12:03:34 same story. You look at things like patterns of chromatin accessibility: 12:03:41 you have Zipf laws there as well. So anywhere you look at these very large recordings, 12:03:46 you get these Zipf laws, right? And so it cannot just be something very special to neural recordings. 12:03:52 So what's going on? We're going to do a very simple model, which is again very similar to what you've already seen before, 12:03:57 the random projections, except that here I'm calling it a latent-variable model. 12:04:05 So I have a bunch of non-interacting spins, and these non-interacting 12:04:09 things are coupled to a variable which I don't know what it is; it's a latent variable, 12:04:14 and I'm going to marginalize over it, right? 12:04:15 And so the probability of the spins given h is just a product, because they are conditionally independent, and then for the total probability of sigma 12:04:25 I need to marginalize over the h's, and I'm going to call this the energy of the state: 12:04:36 the negative log of P is going to be called the energy of the spin configuration, 12:04:38 as in physics. And so what we can show, with a rather straightforward large-deviation-theory calculation, is that in a model like this, when you have either a single h or multiple h's,
12:04:56 but many fewer of them than there are sigmas, you don't need that many, again 12:05:02 coming back to what Elad was saying, you have fewer groups, 12:05:07 it's a relatively sparse model, right, 12:05:10 the number of fields, of latent variables, is smaller than the number of neurons, then you will always have the Zipf law, you just cannot avoid it. 12:05:18 So if there is any stimulus in your data, if your data look 12:05:23 this way, you will always have a Zipf law, and there is a simple explanation for this, which I'm going to skip, the intuitive explanation, because I am running out of time, 12:05:33 but you can ask me later if you want. And we now have another paper which I can share; 12:05:39 the draft is pretty good, not yet submitted, but I can share it if you want. So what we can claim now is that every time you observe a Zipf law in these large-dimensional recordings, then not only can it be explained by the existence of latent variables, 12:05:57 but the only way you can observe the Zipf law from realistic data sets, on realistic numbers of neurons, is if it is coming from these latent-feature, latent-variable models. There is no way of observing a Zipf law if the only 12:06:21 thing that is happening is internal interactions between the units in the system, from realistic data set sizes. So you cannot achieve it just by local interactions? 12:06:32 No, you would need infinitely large data sets: even if you got yourself to a really critical state, you would not observe Zipf laws until your words are insanely 12:06:43 large, and there are some bounds that we calculate. 12:06:48 If you want to compute the rank-order distribution, the extremely large words don't happen more than once, right? 12:06:55 So you would have to wait for an enormous number of samples to get enough of them to actually be able to build the rank-order plot. 12:07:03 So for a realistic data set size, even if the system is fine-tuned, you will still not observe a Zipf law; 12:07:09 you will observe some different scaling behaviors, and so on. 12:07:13 And so these latent variables, these fields, could be either emergent, which is what I've been trying to say, that if you put in enough interactions you get these emergent latent-feature descriptions, 12:07:27 right, or they can be external variables that are driving your neurons, or they can be some simple things like what Jonathan was saying: 12:07:38 a preparation that is aging. So if all of the neurons in your dish are just getting older and older as you're recording from them, you will get a Zipf law; there is just no way 12:07:48 of escaping it. So the only things which are important for getting the Zipf law are that the couplings between the latent variables and the neurons are random: 12:08:05 you cannot have all exactly the same couplings, because then you will not get a Zipf law, you will get a beautiful staircase, which is not what we observe in experiments; and you need the individual neurons to be adaptive, or sensitive, to this field. So think of it: if you have an 12:08:21 external stimulus, and you never respond to the stimulus, whatever the stimulus is, and you're always in the same state, you will not get any variability, and therefore no Zipf 12:08:30 law.
You have to be responsive, right, and it is not a given that this would happen for any particular stimulus, but maybe for aging it does, or maybe in a sensory system it does. 12:08:42 And the number of latent variables has to be much smaller than the number of neurons that you are recording from; 12:08:49 it should be subleading in the number of neurons that you're recording from. 12:08:55 So I can think, again, that maybe the sparse random-projection model that Elad was talking about should always produce a Zipf law, 12:09:08 if I understand correctly what he talked about. And maybe, if the Zipf law is there, then the way to model that data is the way that Elad has modeled that data, and maybe on top of that we add our uBIA to figure out which particular fields 12:09:24 to include. So all of this together might be a useful approach, right? 12:09:30 And so this is just a slide to show you that this black line is the same as that line over there, and the blue line is reproduced from our model of the fly, where the 12:09:49 H1 neuron is responding to just a variable stimulus with good sensitivity, and you get the Zipf law; there was no fine-tuning of any kind behind it. You get steps simply because a single spike can be seen at the beginning of the word, in the middle of the word, 12:10:10 or at the end of the word, and those all have more or less the same probability. 12:10:13 The last thing that I want to say is that this model, this simple latent-variable model, explains a million other things, right. 12:10:21 So, for example, there was a series of papers by Leenoy Meshulam, with Bialek and Tank and Brody, and they showed that in recordings from hippocampus, under coarse-graining, when you take multiple neurons and you put them together and you call them a hyper-neuron and 12:10:41 you look at correlations between those clusters, you get scaling of neural activity, 12:10:48 you get non-Gaussian limiting probability distributions. And so they report, 12:10:52 for example: this is the number of neurons put together in a cluster, 12:10:57 this is the variance of the activity of this cluster, 12:10:59 and there is power-law scaling. If 12:11:02 the neurons were independent, the variance would grow linearly with the number of neurons; 12:11:10 if they were fully dependent, the variance would go as the square of the number of neurons; and in their experiments you get a power law which is about 1.4. 12:11:18 This model gives 1.367, right? I mean, I've never seen better agreement between a simple theory, with no adjustable parameters, and an experiment. 12:11:31 What's more interesting is that even the full probability distribution of the activity of this cluster of neurons, with all of its tails, is exactly the same within this model as in the experimental data. Everybody is aware of neural avalanches, right? 12:11:49 This model produces neural avalanches with exactly the right relation between the avalanche 12:11:55 alpha and gamma exponents that Plenz and Beggs, and since then hundreds and hundreds of other people, have been observing; 12:12:07 the latent variables give you exactly the same scaling. 12:12:12 Tanya Sharpee has been working on hyperbolicity of neural codes, showing that they're all hyperbolic. 12:12:18 These codes produce exactly the same Betti curves as the ones she measures in the recordings that she's working with.
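A minimal simulation of this latent-variable picture, in the spirit of $P(\vec\sigma) = \int dh\,P(h)\prod_i p(\sigma_i \mid h)$, is sketched below. This is my own toy version with hypothetical parameters (conditionally independent, mostly silent binary units with random gains to a single latent field); per the argument above, the rank-frequency curve of the sampled words comes out heavy-tailed and approaches Zipf's law as the number of units grows, with no fine-tuning.

```python
# Toy latent-variable model: N conditionally independent binary neurons driven
# by a single latent field h, redrawn for each sampled word. The rank-frequency
# ("Zipf") curve of the resulting words is heavy-tailed without any fine-tuning.
import numpy as np

rng = np.random.default_rng(0)
N = 30                                   # neurons (bits per word), hypothetical
T = 100_000                              # number of sampled words
gain = rng.normal(1.0, 0.3, size=N)      # random neuron-to-field couplings
bias = -2.0                              # neurons are mostly silent at h = 0

h = rng.normal(0.0, 1.0, size=T)                                  # latent field per word
p_fire = 1.0 / (1.0 + np.exp(-(bias + np.outer(h, gain))))        # (T, N) firing probabilities
words = (rng.random((T, N)) < p_fire).astype(np.uint8)

codes = words @ (2.0 ** np.arange(N))          # encode each word as a number (exact for N <= 53)
_, counts = np.unique(codes, return_counts=True)
freq = np.sort(counts)[::-1] / T
rank = np.arange(1, len(freq) + 1)

well_sampled = freq * T >= 10                  # fit only the well-sampled part of the curve
slope = np.polyfit(np.log(rank[well_sampled]), np.log(freq[well_sampled]), 1)[0]
print(f"{len(freq)} distinct words; log-log rank-frequency slope ~ {slope:.2f} (Zipf's law is -1)")
```

The same toy model, coarse-grained by summing the activity of groups of units, can also be used to look at the super-linear variance scaling and avalanche-style statistics mentioned above; none of that requires tuning the parameters to a special point.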
12:12:28 So all of these things to me basically say that the experimental data sets that we're dealing with should be modeled with latent variables. 12:12:40 How do we find them, which are the better ones? I don't know; maybe it's random groups, 12:12:45 maybe we should detect the combinations, but it's a latent-variable model. 12:12:48 So every time you see avalanches, a Zipf law, hyperbolicity, scaling under coarse-graining, your immediate reaction should be that the dominant part of the description of this neural code is a latent-variable description, 12:13:03 with a few latent variables. And that's where I am going to end; 12:13:06 you can read this slide as well as I can. 12:13:16 Yeah, please. I mean, this is kind of a big-picture question: there is a whole cottage industry of using 12:13:28 dynamical systems approaches to model, you know, your population activity. 12:13:32 Is this kind of a physicist's way of saying, yeah, that's the right way 12:13:36 to model it? I mean, I don't know, but there are a lot of good papers and a lot of bad papers, right? So I don't know which specific descriptions you mean: 12:13:57 things like sensitivity to inputs, dense couplings, 12:14:03 a large number of interactions, diversity of couplings, and so on and so forth. And yes, if that's what your model includes, then this is probably a good description, 12:14:15 too. 12:14:18 Thanks. 12:14:23 Yes, please. So actually, my question is for all three of the speakers today. 12:14:30 Just in the context of yesterday, where we were talking about examples of animal behavior: all the models that you put forth are ones that rely on some degree of stationarity. I'm just kind of curious 12:14:50 what might be either expansions or other things that you might incorporate to address, 12:14:58 you know, a system that is undergoing a relatively quick learning process; slow ones may seem easier. But ultimately it seems that all these models could tell us a lot about what neural activity or which neural interactions are changing or not changing; I just don't know how much of this 12:15:21 you've done. 12:15:24 I can take that one. Okay? 12:15:30 So for fast learning specifically, one of the directions that my students are working on is that you actually don't need synaptic learning for modeling neural activity in fast-learning scenarios, mostly working with the birds: you have a random-ish neural network with an 12:15:58 external drive, which is a modulating input, and by changing the modulation you can reproduce the changes in the statistics of neural activity over time, over short periods of time. 12:16:11 So that's one potential expansion: you just introduce an extra latent variable, which is some kind of modulation, right? 12:16:20 And then you can play with it while leaving all of the synaptic structure unchanged. 12:16:25 Just as a comment: we definitely were thinking about it, but the primary constraint, I mean, from the data-driven side, the primary constraint was to get any of these to work at all on pretty long recordings, and I think there is progress being made on models which are more sample-efficient, so that one could just 12:16:41 slide through the data in time and see whether the structural interactions are changing in any way.
12:16:47 And maybe at that point it makes sense to think of something more complicated, to put some prior, some regularization, on how these interactions are changing, right? 12:16:56 Maybe they are not changing arbitrarily, but in some recognizable fashion. 12:17:01 So I think we might be getting there.