09:16:06 I was basically going to pick a different format, a different interpretation of the organizers' instruction. 09:16:14 I'm going to stay very close to their instruction, which was: don't talk about your work. And I'm not going to talk about my work, unless we do have time at the end 09:16:23 and there is interest. Then I can talk about two projects in the lab which are related. 09:16:29 One of them is a Danish project, which is basically reward-free learning of sensory structures, 09:16:32 and another one is embedded in a reinforcement learning task: 09:16:39 how statistics can be learned and exploited. If there is interest and there is time, 09:16:44 tell me and I can talk about those things; otherwise I'm going to talk about this topic of statistical learning versus reinforcement learning. 09:16:55 And because it's not my work, I probably won't be able to answer your questions, but we'll see. 09:17:01 So, why this topic? Whenever I start giving a talk about my work, I usually start by saying that as we navigate the world around us, we constantly learn the complex structures in the world, and there are two different types of learning. One is reinforcement learning, where an agent engages with the environment and makes an action; 09:17:25 the action has an outcome, and from the outcome we can learn about the state of the world. 09:17:30 And there's another learning mechanism, which is statistical learning, and it's incidental: 09:17:35 it doesn't have to be instructed or reward-driven; 09:17:39 we can just learn implicitly about the structure. 09:17:43 But in the back of my mind I always immediately ask myself: is it really true? 09:17:49 Are you sure that these are really two different systems, two different kinds of mechanism? 09:17:54 So when Libya, Joseph, and Mattee asked me to organize a day, I said, okay, let me face my demons and let me just discuss this topic. 09:18:07 At least, hopefully, it will help me be clearer about where we stand in terms of the relationship between these two learning mechanisms. 09:18:18 So, today, just a little bit about the format. 09:18:23 I'm going to present this talk, but please, I really want it to be as interactive as possible. 09:18:30 So take every slide, everything that I say, just as a guideline to ask questions and start a discussion. The plan is for me to go through this talk; the outline will be: I'm going to talk about reinforcement learning, different 09:18:52 algorithms, the biology of it, the predictions, and then some examples where, I think at least, the existing algorithms cannot explain certain statistical learning behaviors. 09:19:07 And Ilia is also going to give two examples of that from birdsong behavior. 09:19:14 After that there will be a break, and when we come back, Ilia, Kishore and I 09:19:20 would like to have a kind of panel discussion, but with your participation. 09:19:24 And I really would like to open that panel discussion right from now. 09:19:29 [Audience, partially inaudible:] ...another big one, error-based learning? 09:19:46 Yes, we'll touch on that. Okay, so, here is the pointer. 09:19:57 Okay, so.
09:20:00 In my mind, I think, if we really step back and forget about the existing algorithms in RL, and just think about a prediction problem 09:20:11 and a kind of error-minimization procedure, then, forgetting about brains, just mathematically, 09:20:21 I would put error-driven learning under reinforcement learning. 09:20:25 Okay, mathematically. And so the first thing is that... 09:20:36–09:22:03 [inaudible exchange with the audience about error-driven learning, learned models, and online correction] We will get to that. Yeah. So, before that, 09:22:14 there's a lot to unpack there. 09:22:20 Sure. But for now, as a starting point, 09:22:27 let's forget about biology and let's forget about the brain. 09:22:30 Let's just think about the problem. The problem is that some structure, something, needs to be learned, and usually learning ends up being the ability to predict, right? 09:22:42 So we can formulate all of these problems as the problem of prediction. I'm not talking about predictive coding; 09:22:48 I'm not talking about anything biology-related or brain-related. 09:22:53 I'm just saying that there is a learning problem, and prediction usually is a very important outcome of that learning, right? 09:23:04 So I think statistical learning, similar to other types of learning, is basically the ability to predict and estimate 09:23:15 the next event. 09:23:19 [Audience:] What is the difference between prediction and guess? Oh, sure, yeah. 09:23:26 In my mind everything is probabilistic. And in that regard, some reduction of prediction error should happen, right? 09:23:36 We need to minimize the prediction error. 09:23:46 No, no, again, I'm talking very abstractly. 09:23:52 Just take the input, whatever the input is, sensory, motor, whatever, and replace it with a variable; it can be an internal state, whatever, right? 09:24:02 Just replace it with X, and then 09:24:05 we want to come up with some prediction about X, and we want to minimize some error. 09:24:21 [Audience, partially inaudible:] Can you say more about what you mean by that? 09:24:33 You mean, do we want to be able to predict at the end of the day, or is prediction the way of learning? 09:24:40 Yeah, that's a very good point. I don't know whether there is a real distinction at the end of the day; again, when we start talking about biology, I'm sure these things matter a lot and need experimental measures. 09:24:56 [inaudible] 09:24:59 Right. But it can be that, during learning, 09:25:05 prediction is the driving force of that learning. Yeah, yeah. 09:25:16 That's a good point. 09:25:17 But in any case, I think you would agree that, at that very abstract level, we can probably formulate statistical learning as a reinforcement learning problem: 09:25:28 we want to minimize some prediction error, and this prediction error doesn't need to be linked to any kind of real reward value; we just want to minimize it. 09:25:40 And it can be an RL, okay?
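[Editor's note: a minimal sketch of this abstract framing, assuming nothing about biology. The delta-rule learner below is illustrative; the names, stream, and parameters are invented, not from the talk. It predicts the next input X and updates by descending the squared prediction error.]

```python
import numpy as np

def online_prediction(xs, alpha=0.1):
    """Delta-rule learner: predict the next input and take a gradient
    step on the squared prediction error after each observation."""
    x_hat = 0.0               # current prediction of the next input
    errors = []
    for x in xs:
        err = x - x_hat       # prediction error
        x_hat += alpha * err  # gradient step on (x - x_hat)^2 / 2
        errors.append(err)
    return np.array(errors)

# On a stream with structure (a stable mean), the errors shrink over time.
rng = np.random.default_rng(0)
stream = 0.8 + rng.normal(0.0, 0.05, 200)
errs = np.abs(online_prediction(stream))
print(errs[:3].round(3), errs[-3:].round(3))   # large early, small late
```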
But then, the thing is that now we are neuroscientists, and we do care about brains, 09:25:49 and we do care about real agents. Right? So then we need to start talking about relevant algorithms 09:25:58 that would make certain behaviors possible. I'm sorry, Mattee, for choosing you here, but I knew you wouldn't be offended if I put your picture here. The current notion of reinforcement learning is very much tied to an 09:26:21 agent exploring an environment, right? So we have an environment; 09:26:26 we know that there is the courtyard, and if you run to the courtyard a few minutes before 3 o'clock, or whatever, we can get cookies. So we take an action; 09:26:40 the action has outcomes, and we also observe states of the environment: 09:26:46 whether it was morning or late in the evening, what we get. And we learn about the environment, 09:26:53 and we can form a policy, right? We can optimize our actions in the environment in order to maximize the reward. 09:27:04 This is basically how reinforcement learning is being studied these days. 09:27:09 Now we are far from that very abstract notion of wanting to just minimize any type of prediction error, right? 09:27:17 It is linked to this agent that wants to take actions and make policies in the environment. 09:27:24 So, when we now think about the objectives of such a reinforcement learning paradigm, 09:27:33 we might see some sort of difference between it and statistical learning. 09:27:39 The objective in reinforcement learning is to develop an optimal decision-making policy through interactions with an environment, whereas the objective of statistical learning might be just predicting or estimating relationships between variables; it doesn't need to have this component of decision making, or 09:27:58 the outcome of an action, at least not immediately, I think. 09:28:04 Again, evolutionarily speaking, statistical learning exists because at the end of the day it will pay off, right? But not immediately; 09:28:13 it's not immediately related to taking actions and making policies. 09:28:20 So there's this distinction. Okay? But 09:28:26 when we look deeper into reinforcement learning algorithms, we see that reinforcement learning uses some sort of sequential experience with the states of the environment and the outcomes of our actions to assess what the best actions are, and there have been two main classes 09:28:51 of algorithms in this paradigm: one is called model-free, and the other is model-based. 09:28:58 Model-free uses this sequential experience directly, meaning that I take an action, 09:29:05 the action has an outcome, I evaluate the value of that action, and I learn what the best action is. 09:29:13 So it learns directly, and it's tightly linked to the reward prediction error: 09:29:18 you want to estimate values. So it's a value-based kind of learning. Model-based is different, because it uses this sequential experience to build a model of the state transitions in the environment, and also of the outcomes that were associated with certain policies in this 09:29:41 environment. Right?
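[Editor's note: a minimal sketch of the model-free side as described. Tabular Q-learning is one concrete choice; the talk only says "model-free, value-based", so the specific update rule and parameters here are illustrative.]

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # cached action values
alpha, gamma = 0.1, 0.9

def model_free_update(s, a, r, s_next):
    """One Q-learning step: learn directly from the experienced outcome.
    delta is the reward prediction error (RPE)."""
    delta = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * delta
    return delta

# e.g., taking action 1 in state 0 led to state 3 and reward 1:
print(model_free_update(0, 1, 1.0, 3))   # positive RPE on first encounter
```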
So you do see that model-based learning has this component: at the end of the day, the agent that uses it is going to form some knowledge about the environment, and that knowledge is not necessarily directly linked to "I want to make an action 09:30:06 and maximize the reward of that outcome." 09:30:09 It's "I want to learn about the environment." 09:30:13 And interestingly, wait for this, there is this interesting work from 09:30:23 not that long ago, where they put human subjects in a probabilistic Markov decision task. 09:30:32 The task itself doesn't really matter here, but what they found is that, in the brain, the striatal structures, particularly ventral striatum, show the reward prediction error of this kind of model-free, 09:30:50 value-based RL. In contrast, if an agent is going to learn about the state transitions, the important quantity, instead of a reward prediction error, is a state prediction 09:31:10 error: I want to see, if I'm in a certain state and then go to another state, does it match my prediction or not? 09:31:18 So now it's about the knowledge of the environment itself; 09:31:22 it's distinct from the link to the reward. 09:31:26 And then specific structures, like the intraparietal sulcus and lateral prefrontal cortex, show up 09:31:36 as coding this state prediction error, 09:31:39 distinct from striatum. And that was the start of the suggestion that maybe in the brain we do have two different structures, or two different mechanisms: one is model-free and is more related to "I want to maximize reward, learn from my 09:32:02 immediate actions, and minimize the reward prediction error," and there is another system that is related to knowing the relationships between the different states of the environment. 09:32:15 Okay. But we may ask: how can we better test whether and how this, 09:32:25 let's say, model-based way of acquiring knowledge 09:32:31 can account for statistical learning, for other types of statistical learning: 09:32:35 maybe the classical, historical notion of statistical learning, which is this incidental learning of sensory environments by mere exposure. 09:32:46 Okay. So then there is this beautiful work from Matthew Rushworth's group at Oxford, where they put human subjects in a 09:33:03 task that you see on the screen: a yellow square, which is a target, shows up, 09:33:09 and they basically have to click on and follow that target. 09:33:12 But what they did is embed two different sequences. 09:33:17 One is ABCD, which leads to reward; the other one is A′B′C′D′, which doesn't lead to reward. 09:33:25 The subjects are not instructed about any of this: whether there are these two different sequences, whether there is any sequence at all. 09:33:38 They were just instructed to follow the target, and in one case the target follows this ABCD sequence, always, 09:33:47 and then leads to reward. And in the other case there is again a repeating sequence, without any reward.
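[Editor's note: before the task results below, a matching sketch of the model-based learner described a moment ago, driven by a state prediction error rather than a reward prediction error. This is one standard textbook form of the update; the parameters are illustrative.]

```python
import numpy as np

n_states, n_actions = 5, 2
T = np.ones((n_states, n_actions, n_states)) / n_states  # transition model
eta = 0.1

def model_based_update(s, a, s_next):
    """Nudge the learned transition model toward the observed successor.
    spe is the state prediction error: 1 - P(observed transition)."""
    spe = 1.0 - T[s, a, s_next]
    T[s, a, s_next] += eta * spe   # strengthen the observed transition
    T[s, a] /= T[s, a].sum()       # keep a proper probability distribution
    return spe

print(model_based_update(0, 1, 3))  # large SPE while the model is uniform
```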
09:33:53 So you can now imagine that we can potentially study statistical learning of a sequence that in one condition is linked to reward and in another condition is reward-free. 09:34:06 So they first trained subjects on this task for two days. 09:34:14 Once that learning was done, they brought them back, put them in the scanner, and wanted to see what happens in their brain when they are again exposed to 09:34:26 these sequences. 09:34:31 What they found was very interesting, which is again a distinction between two 09:34:39 networks of brain regions. 09:34:45 One is temporal pole and OFC, orbitofrontal cortex, which is related to the RL-related knowledge of the task, meaning the sequences that lead to reward, 09:35:04 whereas PFC, amygdala, and hippocampus basically code for the other knowledge, the kind of sequence that is not linked to any reward: 09:35:17 they are flexibly learning that background structure, 09:35:23 and that is the structure that is basically reward-free. Yes? 09:35:33–09:35:57 [inaudible audience question about whether the exposure is passive] 09:36:03 It's not passive; they are doing the task, 09:36:13 which is to follow the target. 09:36:22 And again, in one context it's explicitly rewarded, so you can model it with an RL agent, but for the rest of the session they are also exposed to sequences that don't lead to anything. And because they really wanted to study this, and the title really summarizes the work perfectly, there are multiple structures 09:36:43 that are created by reinforcement versus statistical learning, 09:36:47 they wanted to embed these two tasks together. 09:36:51 They are doing a task; that task in one context is related to RL, 09:36:56 but the task also includes this, as you said, passive exposure to structured sequences, which doesn't have anything to do with their actions. 09:37:07 [Audience, partially inaudible:] 09:37:25 ...not even a matter of following the target... 09:37:39 a task that they will naturally do, which in this case is not necessarily saying anything about sensory learning... 09:38:00 but with the task aspect of it, it's hard to... What do you mean? 09:38:09 ...once they start to engage with it... Of course, I see what you mean. 09:38:15 So you are pointing out that, yes, there is this distinction in statistical learning; 09:38:21 maybe you want to give a bit more background about 09:38:25 this implicit aspect, the incidental learning. 09:38:33 [Audience:] The paradigms come from language learning; that's why they were designed the way they were. 09:38:41 This argument, that you can possibly learn without feedback, in the face of what were much larger claims: 09:38:51 it comes from that. And so the tasks, for people who wanted to look at learning approaches to language development, had to be 09:39:02 free of feedback. 09:39:05 Okay. 09:39:10 And the same goes for statistical learning tasks. 09:39:16 Now all of that has been, not challenged, but made messier: 09:39:21 there are ways that parents give feedback, both positive and negative, 09:39:25 and it's very interesting to think of how that can be part of the motivation.
09:39:34 [Audience:] There are tasks, for example, where you need to learn the structure; there is also historical work by Reber 09:39:45 on structures to be learned, where they do not learn as well. 09:39:52 Yeah. So, you asked: what is the task, 09:40:01 exactly? If I recall correctly, subjects see on screen a grid of stimuli like this, 09:40:09 and they have to follow, with the arrow keys, wherever this purple circle is: 09:40:23 move the cursor there and press something like that. 09:40:24 Then, when the yellow square appears, they get reward. So you are right that, in this task, it's not that the subjects are asked to form a policy in order to maximize a reward, 09:40:39 because they are basically following wherever it's shown to them where to click on the screen, and then at a certain point they receive the yellow square. And the yellow square sometimes appears after this 09:40:59 very specific sequence, right? They are not told what they can learn. And then, at other times, there is a similar sequence that always repeats, but it doesn't lead to this yellow square on the screen, 09:41:13 so no reward. They are instructed to follow the purple circle. 09:41:18 Yes. 09:41:22 [Audience:] That is the instruction itself. 09:41:29 But it still is an instruction. Yeah, exactly. 09:41:40 Right, because... 09:41:46 right, I mean, we don't 09:41:49 agree on, or know, what the exact definition is. 09:41:52 There's a sequential structure; we will... 09:42:01 I'm going to present a task similar to that, too. 09:42:08 It's a good thing to have the task design in mind, because I'm going to present something else where they concluded a different thing, and it will be interesting to see why they get these different results. 09:42:21 Maybe the details of the task really matter. Yes. 09:42:31 [Audience:] They have these contexts, they have blocks? 09:42:36 Right. So the subjects are put in three different blocks. 09:42:47 In one block they see this grid of objects on the screen. 09:42:52 On the screen they see these objects, right? 09:43:00 No, no, all at a time. Yes, all at a time, and then they are moving either a joystick or the arrow keys, 09:43:09 I don't remember which. And then they have to follow the purple square, and at a certain position 09:43:18 a yellow square appears. They follow, click, follow, click, right? So they have to move; 09:43:23 imagine, with this joystick they can go anywhere in this grid, right? 09:43:30 But they are instructed to follow this, and eventually they are given reward 09:43:37 here. And in context one, the sequence... 09:43:43 they don't show it here, but in context 2 the sequence goes like ABCD, ABCD, so the purple square appears here, here, here, here, always, and they receive a reward. But then there is also another sequence; 09:44:05 here is the example of a non-rewarded sequence in another context: they always go here, here, here, and they repeat that, 09:44:13 but the yellow square doesn't appear at D′, so they don't get reward. 09:44:19 No, they are not told. So, you see, they are just kind of optimizing.
09:44:23 They are using each context to make a different point about the task. In context 2 they are showing the rewarded sequence; in context 3 they are showing the example of a non-rewarded sequence, 09:44:38 here. Yeah, exactly. 09:44:47 Yes, in each context they do have both, because they want to expose the subjects to different sequences. 09:45:03 So in this way they will have three rewarded sequences and three non-rewarded sequences. 09:45:14–09:45:39 [inaudible audience comment about the subjects interacting with the task and the many different processes involved] 09:45:43 I mean, yes. But then, on the other hand, if some average effect survives, then it's not due to these differences, because those will just average out. 09:46:03 [Audience:] But I think they spent a lot of time trying to 09:46:10 estimate 09:46:14 the motor component in different contexts; all of these things are happening 09:46:21 in a complicated way that is not... yeah. 09:46:26 So, two things. First, when they are called back for the fMRI session, 09:46:38 they are not moving; they are not doing this task anymore. 09:46:43 They are just shown the stimuli, and some of them are random sequences, 09:46:50 some of them have these learned sequences, and once in a while they are told to respond whenever a flower appears on the screen, just to keep their attention. 09:46:59 So there is no motor movement in the fMRI session. 09:47:04 [Audience, partially inaudible:] Well, still, 09:47:09 because the motor sequence in 09:47:13 one context is different from the other... 09:47:20 as opposed to if you did this with... 09:47:23 it doesn't apply to... 09:47:37 the sequence... it does apply. 09:47:40 What is this? 09:47:45 Okay, just in the control sequence? 09:47:54 No, they don't know. 09:47:59 And it's equally frequent as the rewarded one? Yes, exactly. 09:48:03 Yeah. 09:48:08 Throughout, they always follow the purple circle, right? 09:48:14 But sometimes the purple circle follows this repeated sequence, 09:48:21 and sometimes it may continue differently. 09:48:25 Yes. Sorry, I have to be more clear. 09:48:33 The following continues, but the sequence changes from one context to another, 09:48:51 because their job is to follow the purple circle. And once in a while they get reward, and that reward sometimes follows this sequence in context A, and in context 2 follows this other sequence. 09:49:03 But in context A they also follow a sequence that doesn't give them reward; 09:49:11 they're all mixed. 09:49:20 It's true, but they are not told that there is a sequence, right? 09:49:24 So now the point is that... 09:49:38 No, no, it's the purple circle; they need to follow the purple circle, 09:49:41 and at some point they get reward. The sequences are embedded in the thick of all of this; the subjects don't need to do anything clever here. 09:49:52 The subjects just need to follow the purple circle, and following the purple circle sometimes leads to reward and sometimes not. 09:50:04 Okay? Yes. So we can debate about this,
but the summary is that it might be difficult to conclude that this is the best paradigm to study statistical learning. 09:50:22 Okay. But given that caveat... 09:50:27 Please. 09:50:30 [Audience question, partially inaudible] 09:50:44 No, no, go ahead. 09:51:04 No, not really, not in this specific task design. 09:51:30 So, given all the caveats, what they want to claim is that if they look at OFC 09:51:43 versus the amygdala-hippocampus network, 09:51:44 they respond differently in the recall part of this learned task, meaning that OFC 09:51:53 recalls the sequences that lead to reward, whereas amygdala, hippocampus, and prefrontal cortex encode the sequences that are reward-free. So that's what they want to claim, and then they somehow 09:52:12 conclude that maybe this network is more for flexible statistical learning of knowledge about the environment, and the other one is more related to value-based learning. 09:52:26 Okay. So, how about classical paradigms in statistical learning? 09:52:36 This is a paradigm a lot of you know: 09:52:44 it's basically a continuous presentation of syllables, 09:52:51 but with a hidden structure, meaning that the transition probability between certain syllables, the syllables within a word, is higher than between the rest. 09:53:03 So just by being exposed to this sequence, it seems, babies, 09:53:07 after two minutes, quite fast, can show sensitivity to these word boundaries: 09:53:16 if something deviates from the structure of the word boundaries, they are surprised. 09:53:20 Okay, but how about this type of statistical learning? 09:53:25 There are other examples of that: Turk-Browne has a lot, and Anna Schapiro has a lot of studies on that with human adults in the visual domain, where you can pair stimuli 09:53:40 and embed the same types of structures: 09:53:43 they can be community structures, they can differ just in terms of their item probability. And there is a lot of theoretical claim, 09:53:53 and also some supporting evidence from fMRI and lesion studies, that hippocampus is important for that type of learning. 09:54:01 But thinking about hippocampus is interesting, because the input from entorhinal cortex to hippocampus goes via two different pathways: one is the trisynaptic pathway, 09:54:16 and one is the monosynaptic pathway to CA1. 09:54:20 And there has been this idea about different functions of these pathways in learning different types of structures. 09:54:32 When we talk about hippocampus, the very first thing that comes to mind is episodic memory, right? The hippocampus is important to form and learn episodic, autobiographical memories, 09:54:46 and usually for that we want to have distinct, separate memories; 09:54:52 we want to maximize distances between our memories. 09:54:56 And for that, the trisynaptic pathway has been proposed to be important, particularly because
09:55:10 there's this idea that the input from entorhinal cortex to dentate gyrus, and then from dentate gyrus to CA3, would orthogonalize the representations: basically increase the distance between them, make them independent. 09:55:26 And that's really good for remembering separate episodes. 09:55:32 But the statistical learning that we are talking about is the opposite of that: 09:55:36 we want to look for commonalities, look for some structure that would survive 09:55:44 even if we average over different circumstances, different instances, right? 09:55:49 And for that, the monosynaptic pathway has been proposed to be important, 09:55:56 particularly because in this pathway you can have overlapping representations, and overlapping representations 09:56:08 theoretically can be good for learning common structures: 09:56:13 whatever survives is the common structure, and everything else averages out. 09:56:17 So there is this theory related to statistical learning, the type of statistical learning which is closer to the historical notion of statistical knowledge coming from language acquisition. 09:56:38 But there's this recent study where they put human subjects in a 09:56:48 task they call a NAD task, which is a non-adjacent dependency task. 09:56:55 So you see there are three words, and each word has two syllables, 09:57:02 appearing like 'jupu', 'bai', 'rooni'. And the structure is A-X-C: 09:57:09 non-adjacent, meaning that the middle word can be whatever, but the frame is fixed: 09:57:18 'jupu', whatever, 'rooni' is fixed, right? And subjects are asked to respond whenever they hear the target, 09:57:27 which is this 'rooni' here, and X can vary. 09:57:30 So they are, again, not told anything about the structure of the input; they are just listening to this continuous, non-segmented auditory 09:57:40 input, one syllable after the other. So there is no cue 09:57:45 marking the word boundaries. 09:57:49 They are just exposed. And what happens is that slowly, slowly, human subjects learn these structures; they can guess that it's always A-X-C. 09:58:01 And they measure that by looking at the reaction time in response to the target sound: 09:58:10 the reaction time gets smaller and smaller, because A is predictive of C. 09:58:16 As soon as they hear A, they can guess that it's going to be X and then C, 09:58:22 so they respond much faster to C. They show that the reaction time goes down, which is not the case for a random sequence. Because, you could say, the motor component is important: I just get more proficient at doing the task, which is just pressing a key. But you don't 09:58:41 gain in reaction time for those random sequences; only for this structured sequence, 09:58:47 A-X-C, does the reaction time go down. 09:58:51 [Audience, partially inaudible:] This is just in service of... 09:58:59 part of it is familiarity. So there is, of course, a small amount of learning going on at the individual items. I look at that and I say, of course they're learning something, but you 09:59:18 see, it's a stronger claim. A stronger demand, yeah, yeah. 09:59:28 Right?
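[Editor's note: to make the "hidden structure" in these syllable streams concrete, the sketch below computes first-order transition probabilities from a made-up Saffran-style stream (the syllables are invented); word boundaries show up as dips in transition probability. A non-adjacent A-X-C dependency is invisible to this first-order statistic, which is one way to see why the two paradigms might behave differently.]

```python
import random
from collections import Counter

# Made-up Saffran-style stream: three 'words' concatenated in random order.
random.seed(0)
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("da", "ko", "ti")]
stream = [syl for _ in range(300) for syl in random.choice(words)]

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])

def tp(a, b):
    """First-order transition probability P(next = b | current = a)."""
    return pair_counts[(a, b)] / first_counts[a]

print(tp("tu", "pi"))   # within a word: 1.0
print(tp("ro", "go"))   # across a word boundary: about 1/3
```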
So what the authors did here: unlike the previous study that I showed from Matthew Rushworth's lab, where there was a separation between learning the task and then putting the subjects in the scanner to probe recall of the sequence, here they are imaging 09:59:54 subjects while they are learning the task. And what they do is fit a TD-RL model. 10:00:02 TD-RL is a temporal-difference 10:00:05 model; it's a model-free reinforcement 10:00:08 learning model. In TD you have a prediction of future value, 10:00:20 and if there is an error between what you predicted and the reward you get, you want to minimize that error. Here, 10:00:33 there is no reward for the subjects, really, except finding the target. 10:00:40 But they fit this TD-RL model, and they can show that the TD model follows the subjects' behavior pretty well. 10:00:52 Alex? I don't remember whether they asked afterward. I think they did; 10:01:04 maybe; I don't remember. 10:01:09 [Audience:] I mean, this is similar to what was said yesterday about 10:01:17 these experiments: even if you ask subjects afterward 10:01:20 whether they can report what was going on, 10:01:26 there's not necessarily a correlation between whether they can say what was going on 10:01:30 and how they actually performed 10:01:35 on the task. 10:01:52 Okay. 10:02:07 Explicitly, yeah, that's a good point. I don't remember whether in this study they excluded anyone. 10:02:34 [Audience, partially inaudible] 10:02:42 No, but... 10:02:56 what I'm saying is: did they exclude people who had explicit knowledge of the statistics? 10:03:10 In this case, I haven't checked the supplementary; 10:03:14 I don't know whether they report it subject by subject, but first of all there is this subjective report at the end, and whether that correlates with anything, I don't know. 10:03:25 Yeah, that's a good point. So, 10:03:27 the thing is, first of all, they do have this TD model 10:03:32 that explains the behavioral findings: the learning of the sequence probabilities leading to this faster and faster reaction time. 10:03:40 And then the interesting thing is that the brain areas that show up during learning of these structures 10:03:51 are striatal, and it's a question why it's not hippocampus. They just bring it up in the discussion; 10:04:01 they don't have any explanation, except pointing to the task design, which is always the most disappointing answer: 10:04:08 we find some results that differ between paradigms and say the paradigms have this or that difference, 10:04:14 but why that matters is not clear. Yes? 10:04:29 [Audience, partially inaudible: a question about category learning, with lesions] 10:04:40 Sure. Yes. 10:04:46 Why do you think... what is the link between that category learning and this? 10:04:53 So, I'm not really surprised that these structures show up, 10:05:02 but I'm kind of surprised that hippocampus and prefrontal cortex and amygdala, those areas, 10:05:09 don't show up, right? Exactly, that's a very good point. 10:05:15 So this non-adjacent structure, it seems, can be important for what happens in hippocampus
10:05:23 versus other areas. So if it was just A-C, if you turned this task from non-adjacent dependencies into adjacent dependencies, then hippocampus might show up. 10:05:35 Yeah. 10:05:38 [Audience:] You know, there's a long-standing debate in statistical learning about whether or not statistical learning is amodal, or whether it shifts depending on perceptual modality or stimulus type, 10:05:50 etcetera. When I first started doing that kind of work, I felt like it was ultimately probably a very boring question, but it keeps coming up in ways that I think ultimately probably matter quite a bit. 10:06:02 Not everyone agrees; some people... 10:06:06 modality... those show up in statistical learning. 10:06:13 I've never seen a compelling example of it. 10:06:17 That's what I would have guessed; it's really surprising to me, because there aren't a lot of studies looking at auditory statistical learning. 10:06:29 The one interesting study that I think directly bears on it is looking at patients with lesions. 10:06:37 Yes, exactly. There's one auditory statistical learning 10:06:41 test, and they do significantly worse than chance; it's hard to know how to interpret that. 10:06:45 Right, basically that's the only data point, but it's a good one. 10:06:50 If you're significantly worse than chance, you probably have learned something, but it didn't show up in any other way. 10:06:56 That's interesting, because from rodent experiments, which are not necessarily statistical-learning related 10:07:02 but just about presentation of auditory stimuli, there's a lot of coding of auditory sensory input in hippocampus. 10:07:13 So then why doesn't it show up in human auditory studies? 10:07:17 Sorry? 10:07:21 [inaudible] 10:07:27 So, right. 10:07:35 [Audience:] It often matters; you need the details of what's learned, 10:07:48 in ways that are, on the surface, uninteresting. 10:07:53 In this discussion, are you lumping all of statistical learning together, 10:07:59 or do you see the subcomponents as altogether different? People who are interested in different questions are investigating different tasks, different modalities, different stimuli, 10:08:10 so it's hard to piece it together. 10:08:14 I'd imagine the things that actually matter are differences like auditory versus visual, what is learned, when it's learned. And my suspicion for the really related work, and I welcome corrections, would be: I suspect hippocampal involvement would probably be fairly tiny; 10:08:30 learning is happening and those representations are being created, but on a dramatically different time course for auditory, so we might miss it. 10:08:38 My expectation would be for auditory to be slower. 10:08:43 So, okay. 10:08:52 [inaudible] 10:09:03 Yes, exactly. So why, James, do you say that if it was adjacent dependencies, the hippocampus would show up, theoretically? 10:09:19 Your hint was based on some experimental data; I mean, that's fine. 10:09:28 Yeah, just imaging, whoever has imaged... some actual reasoning about the structure? 10:09:38 So there are situations where you have an 10:09:42 auditory signal followed by a visual signal. 10:09:51 But do you think, theoretically, that's what's going on: 10:09:54 why would hippocampus... 10:09:59 Okay, yeah, okay.
10:10:07 [Audience:] ...with that activating the striatum as well. So it's really complicated. 10:10:15 Right, it is complicated. 10:10:22 I'm sure it's very active, right? 10:10:25 No, that's a good point. I didn't go through the details of the analysis; 10:10:33 they basically want to follow this reward prediction error from the TD model, 10:10:41 and following that signal is how striatum shows up as important. 10:10:48 Okay. So that was a couple of studies related to the representation of this 10:10:57 value-based RL versus reward-free 10:11:03 structure learning in the human brain. But how about other aspects of the physiology? You probably all know about the link between dopamine responses and reward prediction 10:11:20 error: classical work from Schultz showing, basically, 10:11:32 that it's related to 10:11:36 the phasic dopamine responses in VTA, and that these can follow the reward prediction error that comes from, say, a Rescorla-Wagner-type model of reinforcement learning. 10:11:58 So at some point the monkeys were presented with the reward, and you see that there is a huge response in these dopamine neurons in VTA 10:12:12 in response to the reward, which was unpredicted: 10:12:15 it was a big surprise to the animals, and there was this huge response. 10:12:19 But then you make the conditioned stimulus predictive of the reward, and the dopamine response to the reward goes away, and it moves to the CS. 10:12:32 This is very classical, right? And if you then omit the reward, there is this negative prediction error: the neurons' firing dips below baseline. 10:12:44 Okay, cool. So these types of phasic dopamine responses can account for a lot of model-free, value-based decision making and policy making, 10:12:59 and it's based on some sort of cached value of certain actions. 10:13:04 But what is missing there is the identity of the stimuli, the identity of future events, right? 10:13:13 It just represents the value, as if in our mind we assigned a good value to the courtyard: 10:13:22 whenever we go there, something good will happen, 10:13:26 either cookies or cheese and wine. But we would like to know: 10:13:32 not just a value, but the identity of the stimulus, the sensory representation, 10:13:39 that is linked to that value. So, can the phasic dopamine responses carry some information about the identity, the sensory aspect, of the environment around us? For that, there is this nice work from Schoenbaum's lab, where they wanted to test exactly that, and they used the sensory 10:14:10 preconditioning paradigm. What does that mean? 10:14:12 It means that you first just expose animals, in this case rats, to a sequence of stimuli: first C, then X. C, X. 10:14:25 There's no reward; it's mere exposure. 10:14:28 X always follows C, and these were just two auditory inputs. 10:14:34 Then, in a later phase, you connect X to a reward: 10:14:43 now X is followed by the US. Now the question is: 10:14:52 because C was always followed by X before, and now X has become predictive of the US, 10:15:02 is C also predictive of the US? Right?
So this sensory preconditioning paradigm can show that this association between C and X is learned. 10:15:14 In this specific experiment, the rats are first exposed to 10:15:20 multiple pairs of auditory input: C and X, C and X. 10:15:25 They were also put in an arena where there was a food port, and it was measured that they really don't go to the food port, because there is no association between either of these sounds and the port. But then, after you explicitly link X to the food reward, they start 10:15:48 going for the reward, and when you probe with C, they show the same learned behavior. 10:15:54 So you use this learned behavior to infer that in the preconditioning phase they had learned the sensory association. 10:16:02 Okay. Now they wanted to ask: what's the role of the phasic dopamine response during this pure sensory association learning, where there's no reward involved, 10:16:15 and whether that transient phasic dopamine response is important for it. 10:16:19 The summary is that it is. How exactly they tested that: I'd need to explain an additional paradigm, which is blocking; if you want to know, tell me and I can go through the steps. But they did both: blocking the dopamine response 10:16:43 and stimulating the dopamine response, to show that it is necessary and sufficient for learning this sensory association. So that's very cool. 10:16:53 Now it seems that, in reality, it's not that the dopamine response is always linked to some sort of reward; 10:16:59 it can also play a role when we just want to learn sensory associations, which, again, can then be used for 10:17:08 model-based learning, right? 10:17:19 [Audience comment] No, I agree, I agree. So again, if we move one step up and abstract out the reward, we just talk about some sort of objective function: 10:17:35 internally, I would like to learn this structure, and I would like to minimize error or make predictions. You can formulate an internal reward that you are reducing, 10:17:49 and maybe dopamine is always doing that. But then, sometimes we really talk about reward as some monetary, scalar value given by the environment as an outcome of your action. 10:18:05 So whether for the organism there really is a distinction or not, we don't know. 10:18:27 [Audience, partially inaudible:] Likely there is. 10:18:30 Yes, yes... 10:18:47 During this phase, was dopamine responding to some reward? 10:18:57 But they were not really given a reward; 10:19:01 during this phase the animals are just moving around. There is no reward; 10:19:04 they are just being exposed to the sounds. 10:19:10 In this experiment it's very clear that there is no external reward, exactly, until the food-port phase. 10:19:17 Yeah, that's important. 10:19:22 [Audience:] In relation to this being necessary and sufficient for learning... 10:19:28 compared to a rewarded association, is there any sense of what the scale is? 10:19:34 Right, because, again, the scale of the dopamine response... 10:19:40 it might be important if you were to put all of that together. I see. 10:19:50 Yeah, in this paper they are not addressing that, as far as I remember from the work.
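[Editor's note: the phasic dopamine story above maps onto a standard TD(0) simulation. In this minimal sketch (trial structure and parameters invented for illustration), the prediction error at the reward shrinks over training while an error appears at the CS, as in the classical recordings.]

```python
import numpy as np

n_steps, n_trials = 8, 300   # time steps from CS onset to reward delivery
V = np.zeros(n_steps + 1)    # V[0] = CS onset ... V[n_steps] = terminal
alpha, gamma = 0.2, 1.0

for trial in range(n_trials):
    # The CS arrives unpredicted (the preceding interval carries no value),
    # so the dopamine-like burst at the CS is just the value it announces:
    rpe_cs = gamma * V[0]
    deltas = []
    for t in range(n_steps):
        r = 1.0 if t == n_steps - 1 else 0.0    # reward on the final step
        delta = r + gamma * V[t + 1] - V[t]     # TD(0) prediction error
        V[t] += alpha * delta
        deltas.append(delta)
    if trial in (0, n_trials - 1):
        print(f"trial {trial:3d}  RPE at CS: {rpe_cs:.2f}  "
              f"RPE at reward: {deltas[-1]:.2f}")
```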
But yeah, that's a very good point. 10:19:59 Okay, so... 10:20:06 Sorry. 10:20:14 Sure. Yes. 10:20:24 It is, it is, totally. I mean, there are some theories about the possible roles of those neurotransmitters in specific aspects of learning. 10:20:39 There's always some certainty associated with learning: 10:20:43 I'm learning something; what is my certainty, and how am I tracking uncertainty in the environment? Actually, we have a joint PhD student working in the lab of Andrew MacAskill, and she's specifically looking at the role of noradrenaline 10:21:00 versus acetylcholine in coding different types of uncertainty: expected uncertainty versus unexpected uncertainty. So there are all these things around learning. 10:21:16 [inaudible exchange] 10:21:37 And that's why I would love to talk to you more. 10:21:47 Okay, so, I'll try to go quickly. Ilia, do you know how long you need to go through those examples? 10:22:06 Okay. So I'll try to go quickly through a couple of paradigms that seem to give different results 10:22:15 in a statistical learning context. One famous effect in reinforcement learning tasks is blocking, 10:22:26 and one version of it is forward blocking, or Kamin blocking. 10:22:30 In one phase, X follows A, and X can be the reward, 10:22:37 so A becomes predictive of X. Then, later on, you compound A with B, 10:22:46 and now you want to see whether B becomes predictive of X, and the answer is no. 10:22:51 So it's basically cue competition in learning. 10:23:03 And 10:23:08 a model that can explain this beautifully is the classical Rescorla-Wagner model: 10:23:12 we have conditioned stimuli, 10:23:16 we want to predict the unconditioned stimulus, and we want to minimize the prediction error. 10:23:21 What happens is that A already has the full predictive power over X; 10:23:27 now that we add B, A is already predictive, so there is no error left, and there is no learning of B, no way for B to become predictive of X. 10:23:40 So that's forward blocking, and it can be explained by the classical Rescorla-Wagner model; a sketch follows below. 10:23:45 There's another phenomenon, which is backward blocking, or retrospective revaluation. 10:23:52 In this paradigm, A and B are first compounded, and AB is predictive of X. 10:23:59 Then there is a phase where A alone is predictive of X. 10:24:07 If after this phase you test B, B is blocked, 10:24:12 because in this training phase you gave all the predictive power to A, and B becomes non-predictive. 10:24:23 [Audience:] Just a random question; I haven't thought it through: thinking of statistical learning versions of forward blocking, would B not become predictive? 10:24:37 If you just presented A and B paired, independent of any outcome... 10:24:42 That's the next slide. So forward blocking can be explained by this model; 10:24:48 backward blocking cannot.
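[Editor's note: the forward-blocking sketch promised above. A minimal Rescorla-Wagner simulation, with arbitrary learning rate and trial counts: after A alone is trained, the compound phase leaves no shared error for B to absorb.]

```python
w = {"A": 0.0, "B": 0.0}   # associative strengths
beta, lam = 0.2, 1.0       # learning rate and asymptote (US present)

def rw_trial(cues):
    """One Rescorla-Wagner trial: all present cues share one common error."""
    v = sum(w[c] for c in cues)     # summed prediction of the US
    err = lam - v                   # common prediction error
    for c in cues:
        w[c] += beta * err

for _ in range(50):
    rw_trial(["A"])                 # phase 1: A -> X
for _ in range(50):
    rw_trial(["A", "B"])            # phase 2: AB -> X, but err is ~0 already
print(w)                            # w['A'] ~ 1.0, w['B'] ~ 0.0: B is blocked
```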
With the classical version, no; but there are variations. The famous one is the Van Hamme and Wasserman modification of the model, in which what changes is that the memory of the compounded stimulus can also be 10:25:16 re-weighted, meaning: A and B were predictive of X, and when one of them alone is presented, you assign a nonzero salience to the memory of what was compounded with A, and you negatively weight it, and that 10:25:36 produces the backward blocking. You can also explain this with a Kalman filter, with Bayesian notions of RL, yes. 10:25:50 [Audience question] I'm not... I'm sure people have looked at that, 10:25:53 but I'm not aware of it, sorry. So, 10:25:58 what happens if, as Lauren suggested, we embed 10:26:06 a test for this forward blocking and backward blocking in a statistical learning paradigm? People in Floris de Lange's lab did exactly that. 10:26:18 I won't go through the details of the experiment, 10:26:21 but again, they have different phases. For forward blocking: 10:26:26 stimuli appear, and the subjects are told to categorize each stimulus as either electronics or non-electronics. 10:26:38 And then, in phase 1, 10:26:43 A is predictive of X, and in phase 2 you pair A with another stimulus, and sometimes that pairing is random, and sometimes A is always predictive of B. And then you test them. 10:27:03 And what they show is that there's no forward blocking in this paradigm. 10:27:11 What about backward blocking? So what can we do with this result? 10:27:17 It's pretty interesting. And the funny thing is that this preprint came out last year, and I had read it a few months ago, and the title was that statistical learning is not error-driven. 10:27:33 They were only talking about forward blocking, showing that forward blocking doesn't happen in statistical learning. 10:27:42 But preparing this talk, I went back to the preprint, and I realized that about two weeks ago they uploaded a new version, whose title is now about forward and backward blocking in statistical learning, showing that if you now test subjects with the same paradigm for backward blocking, it 10:28:00 happens. So you cannot conclude that statistical learning is not error-driven, 10:28:05 but we also really don't know how to reconcile these two effects. 10:28:13 [Audience:] There's no reward involved? 10:28:20 No, they just categorize the visual input: whether it is an electronic or non-electronic device. 10:28:30 But then, in phase 2, they can use their knowledge of... 10:28:39 that's how they test the statistical learning part: they can use the knowledge of A in order to know what B is, and they use that. 10:28:51 I think they get feedback... actually, I don't remember; 10:28:57 I'd have to check. That's a good point; I don't know why I don't know that. 10:29:04 But I don't think they get feedback. 10:29:09 You can check; I'm pretty sure they don't, because in the intro they really go through the importance 10:29:15 of feedback and reward and so on.
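[Editor's note: the Kalman-filter account of backward blocking mentioned above, sketched minimally with arbitrary priors and noise terms. The compound phase induces a negative covariance between the weights of A and B, so later evidence that A alone predicts X revises B downward.]

```python
import numpy as np

w = np.zeros(2)          # posterior mean of the weights [w_A, w_B]
P = np.eye(2)            # posterior covariance (prior uncertainty)
obs_noise = 0.5

def kalman_trial(x, r):
    """One conditioning trial treated as a Kalman-filter update of
    the weights, with outcome model r ~ x . w + noise."""
    global w, P
    k = P @ x / (x @ P @ x + obs_noise)   # Kalman gain
    w = w + k * (r - x @ w)               # move the mean toward the outcome
    P = P - np.outer(k, x @ P)            # shrink and correlate uncertainty

for _ in range(30):
    kalman_trial(np.array([1.0, 1.0]), 1.0)   # phase 1: AB -> X
w_b_mid = w[1]
for _ in range(30):
    kalman_trial(np.array([1.0, 0.0]), 1.0)   # phase 2: A alone -> X
print(round(w_b_mid, 2), round(w[1], 2))      # w_B drops: backward blocking
```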
[Audience, partially inaudible:] As I say, it almost becomes a little bit of a non-adjacent scenario, where you can just map it on... 10:29:27 That's how I would map it: there is no reward; there's a stimulus, 10:29:37 and correct prediction of that is... 10:29:42 Yeah. 10:29:55 Yeah. Okay. So at this point I wanted Ilia to give us two more examples of learning paradigms that it seems we cannot really reconcile with reinforcement learning. 10:30:45 [Ilia:] ...error-driven learning. To me, reinforcement is in some sense a positive error, 10:30:50 and error-driven is a negative error. But anyway. So, as an example, and it's actually going to be relevant, because I have no idea how to view it in the context of this previous discussion: this is bird learning. These are experiments that were done by Sam Sober about a decade ago; a few 10:31:10 years ago we analyzed those experiments. Just a couple of things, 10:31:14 if you don't know the model system: males sing. At about two to three months of age 10:31:25 they are listening to their fathers, or whoever else is in the cage; it could be, you know, a speaker. And they start memorizing that as a song, or, for a different species of bird, they will start memorizing that as a song; they basically listen to the role model that is next to them, 10:31:43 memorize it, and somewhere inside the brain it is stored, at least that's our understanding of the system, for their entire lifetime. 10:31:50 And then they start babbling, just like children, and over the course of about two to three months this babbling becomes very structured, closely resembling the father's song. Then some species crystallize the song at that point, and some other species keep updating the song for the rest 10:32:11 of their life, so that if something changes with their motor plant, or if something changes and the female doesn't quite like it, they can make small adjustments to it, things like that. 10:32:24 The specific species that we work on is the Bengalese finch, which is one that is actually very plastic as an adult. 10:32:31 And the experiment goes the following way: you take a bird, and you surgically implant headphones on this bird. 10:32:42 The bird is in the cage; the bird lives there for a couple of days until it gets adjusted to the surgically implanted headphones. 10:32:51 There is a microphone in the cage; the microphone picks up whatever the bird is singing, filters out everything else, 10:33:00 and within about 10 ms, which seems to be not perceptible to a bird, 10:33:05 at least we think so, 10:33:07 it raises or lowers the pitch of what the bird is singing and plays it back to the bird. So the bird is singing, when it is 10:33:19 rehearsing, you know, practicing by itself with no female in the cage: 10:33:26 it sings at a certain pitch, and then suddenly we lower that pitch, 10:33:37 and so it hears something which is totally different, right? And so it tries to correct, and the usual story is that over about 10:33:49 7 days it would correct anywhere from 10:33:51 maybe 50 or 60 percent down to 0 percent 10:33:57 of this pitch shift. There are multiple other interesting questions, about generalization: 10:34:03 you can target individual syllables, and then it will start correcting some of the syllables and not the others.
10:34:11 And there is interesting stuff we are doing there to try to understand that, 10:34:13 but let's leave it aside for a second. So there is this interesting couple of observations. The first one: if I plot time on this axis, and let's say this is 7 days, 7 days after we have started the pitch-shift experiment, which is about two weeks 10:34:34 after the implantation happened, with a week in between where the bird was adjusting, figuring things out, and developing how it's going to sing with the headphones, and so on; and on 10:34:44 this axis I'm going to plot percent compensation. 10:34:50 Okay. So you can shift the pitch of the bird by, let's say, one semitone; or let's start with half a semitone. 10:35:05 At half a semitone, the correction is going to look like this, so maybe this is one. 10:35:12 There is also interesting stuff going on with the washout: there are multiple timescales, 10:35:21 there's washout, there's savings, and all of those things that some people in the room are going to like, 10:35:25 but let's leave that aside; it's not crucial to the main point. 10:35:30 Yes? 10:35:39 [Audience:] What is your...? No, this is what the bird compensates. 10:35:44 We always move the pitch by the same offset. So the bird, let's say, sings at a pitch of one; 10:35:52 we make it so that it believes it's singing at a pitch of one and a half. And then the bird says, oh, crap, 10:36:02 I'm singing too high, so I'm going to start singing lower, 10:36:04 and it goes maybe down to 0.7-ish, at which point 10:36:10 the bird is actually hearing 1.2. Whatever it's singing, we are shifting it by the same offset. 10:36:19 Just wait a second. So the experiment goes like this, 10:36:29 and this is what the bird does. So this is at half a semitone; at one semitone the bird is going to do this; at two semitones 10:36:41 the bird is going to do this; and at three semitones 10:36:45 the bird does not compensate at all. So there is this interesting issue here, immediately: if you were looking for any kind of error-driven 10:36:59 correction, you would expect a larger error to result in a larger compensation. 10:37:06 Here, not only are you compensating less fractionally, you are actually compensating less in absolute value; 10:37:12 here it's just zero compensation. So one can start thinking that the animal is maybe throwing away some songs 10:37:23 as: this is just wrong, I misheard, it's an outlier, things like that. 10:37:27 And so you can try to model this. People have tried to model this with simple, reinforcement-learning-like protocols, where the animal sings something, there is an error that it measures, 10:37:43 this error is fed back into what the animal is going to be producing, and at the next step it produces a slightly smaller error on average. Richard Hahnloser has done a lot of work on this type of models. 10:38:01 So you can explain this type of experiment at least partially, 10:38:05 not quite all of it, if what you say is that the animal at some point evaluates whether this error is even plausible 10:38:13 given what it knows, and it will reject some errors as just crazy: not something that could have arisen from me singing and producing a small error.
10:38:29 Then, for those instances that were not rejected, you use simple negative feedback through this sort of reinforcement, and you can produce almost this type of curves. And then the experimentalists do a different experiment. 10:38:41 What they do is, instead of shifting the pitch by one unit at a time and then keeping it fixed, 10:38:52 you start shifting the pitch by a tiny amount every day, or every few days. 10:38:57 Okay. And so here now I'm going to plot not percent compensation but total compensation, 10:39:05 and here I am going to put days, time, and this is going to be about 40 days. 10:39:11 The experiments take slightly longer. And what happens is: this is going to be the perturbation, 10:39:24 in the same units as the bird sings, and so it goes up to about 3 semitones. 10:39:32 And this is what the animal does. 10:39:43 So at this point it's a bit less than that. 10:39:47 There is no evidence for saturation; it keeps on going, as far as we know, but at 40 days the headphones fall off, 10:39:53 and by that point in the work we only have one or 2 birds left; 10:39:58 we started with 6. So there is no saturation, as far as we know. 10:40:03 It gets to about 40%, so that's about 40% compensation. 10:40:08 The difference is about as much as 2 semitones, which were completely or almost not corrected in the original experiment. 10:40:18 And suddenly, at the 3 semitone perturbation, the close to 2 semitone residual error is still driving learning. 10:40:26 And this now gets to be very difficult to explain with any kind of simple variations 10:40:34 of reinforcement learning techniques, reinforcement learning models. 10:40:41 And that's what we tried to do: to try to explain what goes on in this system. 10:40:47 The first thing that you realize, to try to explain this, is that the animal, as I mentioned yesterday, when it is not singing to the females, deliberately introduces error into the signal. 10:41:04 Okay. So when it sings to females, the variance of the pitch is very small; when it sings to itself, 10:41:17 the standard deviation is about a factor of 2 larger, the variance therefore a factor of 4 larger. 10:41:22 So it looks like this error that is being produced by the animal is not really an error, but a deliberate injection of noise, of trials, into the system. 10:41:37 In fact, there is a part of the brain, LMAN, which is considered in this field as basically a random number generator, right, 10:41:45 which injects noise into the system. My personal feeling is that 10:41:48 this is not quite correct, but that's sort of the general lore in the field.
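Coming back to the ramp experiment for a moment, it is worth checking what the toy rejection learner sketched above does there. In this version the pull to baseline is strengthened (again, invented numbers) so that the learner plateaus near the observed 40%; the price is that its residual error then crosses its own rejection threshold, acceptance collapses, and learning stalls, whereas the birds reportedly keep adapting:

```python
import numpy as np

def ramp(total_shift=3.0, days=40, trials_per_day=125, lr=0.002,
         leak=0.003, sigma=0.5, reject_at=3.0, seed=0):
    """The same toy rejection learner, driven by a slowly ramped shift."""
    rng = np.random.default_rng(seed)
    mean = 0.0
    for day in range(1, days + 1):
        shift = total_shift * day / days          # a small increment every day
        accepted = 0
        for _ in range(trials_per_day):
            error = mean + shift + sigma * rng.standard_normal()
            if abs(error) < reject_at * sigma:    # plausibility check
                mean -= lr * error
                accepted += 1
            mean -= leak * mean                   # pull back to baseline
        if day % 10 == 0:
            print(f"day {day:2}: shift {shift:.2f}, compensation {-mean:.2f}, "
                  f"residual {shift + mean:.2f}, accepted {accepted / trials_per_day:.0%}")

ramp()
```

The point of the sketch is the bind: parameter settings that reproduce the partial compensation cannot sustain learning at a 2 semitone residual error, and settings that sustain learning never produce that residual in the first place. Some different ingredient seems to be needed.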
10:41:54 And so one can ask what it is that the animal actually does. If you are producing this noise, maybe it's reasonable to say that what the animal is trying to do is not to hit a particular target; rather, it has some kind 10:42:10 of probability distribution over which particular motor commands may result in the target that it is interested in producing. 10:42:19 And maybe when I'm singing to a female 10:42:21 I'm picking the highest mode, the most likely value of the motor command. 10:42:28 But when I am not singing to a female, and I am doing just simple Bayesian inference, I'm going to sample from the probability distribution of things that could have resulted in a reasonable behavior. 10:42:45 I'm going to sing it. I'm going to see what I produce. 10:42:47 There's going to be some error signal, and I'm going to update, sort of do a Bayesian update, right? 10:42:53 Yes, you? 10:43:06 Sorry, I'm not sure I understand. You said that for the males, when they're singing to themselves, there's a much larger variance? 10:43:17 Yeah. [Across all species, regardless of how plastic?] I don't know if it's true across all species, but it's true 10:43:24 across the 2 different types of finches where most of these experiments are done: one of them is very plastic, the Bengalese finch, 10:43:32 and the other one is not as plastic as an adult, which is the zebra finch. 10:43:44 The errors are smaller still there. And the zebra finch is also plastic, 10:43:49 it's just less plastic, right, not not-plastic at all; it is on a scale. 10:43:58 For the Bengalese finches, the amount of error that they produce when singing to themselves is, depending on the specific individual, about one half to one semitone for the standard deviation. 10:44:14 [Question about the correlation time of these fluctuations.] 10:44:21 Essentially 0 for all practical purposes; each next syllable is... I mean, it may be 3 or 4 10:44:31 renditions. It's also kind of complicated: how do you even measure correlation time, right? 10:44:40 Because they don't sing on the clock; they sing and they rest and they do something else, right? 10:44:43 So are you going to look at time in units of time, or in 10:44:48 units of renditions, or time since you last ate, and how many eating bouts have come in between? There are all sorts of interesting things that come in. 10:44:58 It turns out that overnight there is something that happens, so waiting overnight versus waiting during the day for the same amount of time creates very different dynamics, and I think we can now explain some of that, but that's a different story. 10:45:13 But roughly speaking, the correlation time of these fluctuations is much smaller than one day, and the typical dynamics that we're talking about here is on the scale of days, not on the scale of individual renditions, right? So anyway, 10:45:34 how can one try to model this? I'm skipping through everything very, very quickly, but the idea is that what we're thinking about is that the animal is not really doing error correction, 10:45:48 but rather: here is a probability distribution over what I could be singing, 10:45:52 I'm going to pull something at random from that probability, 10:45:56 I'm going to sing it, I'm going to see what I have produced, and this will create for me a standard sort of likelihood term of how likely it is that that command would actually result in something that I wanted to do. 10:46:08 And then you do this Bayesian filtering.
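Here is a minimal grid-based sketch of that sample-and-update loop, with everything Gaussian for the moment and all numbers invented: the belief over motor commands is sampled to produce each rendition, and the heard outcome reweights the belief. With Gaussians, the posterior simply slides over to the fully compensating command, which is exactly why the tails discussed next matter:

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-4, 4, 801)              # candidate motor commands (semitones)
log_p = -0.5 * (grid / 0.7) ** 2            # log prior: command 0 should hit the target
shift, noise = 1.0, 0.3                     # invented perturbation and sensory noise

def sample_command():
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    return rng.choice(grid, p=p)

for _ in range(300):                        # one pass = one rendition
    cmd = sample_command()                  # sample from the belief and "sing" it
    heard = cmd + shift + noise * rng.standard_normal()
    # A command x would have been heard near heard + (x - cmd), so it hits
    # the target (pitch 0) when x is near cmd - heard: reweight accordingly.
    log_p += -0.5 * ((grid - (cmd - heard)) / noise) ** 2

print("belief peak:", grid[np.argmax(log_p)])   # ends up near -shift
```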
10:46:13 And the interesting part is why this matters: if your probability distributions and everything involved were just Gaussian, it would make no difference; 10:46:25 whether you are tracking the mean or tracking the whole distribution, the inference would result in the same thing. 10:46:35 But it turns out that these animals have extremely long tails 10:46:39 in the probability distributions of what they sing. So if you were to plot, in log units, the probability of a pitch, 10:46:52 and this is going to be log probability, and this is pitch, 10:46:57 a Gaussian would look like this, right, and the animal is singing something like this. 10:47:06 So we have pitches that are produced that are 6, 7 standard deviations outside of what a normal distribution would predict. 10:47:18 Even if I calculate the standard deviation of the actual signal, 10:47:21 there are still pitches produced, with real probability, 6, 7 standard deviations outside, right? 10:47:26 And so this creates an interesting dynamics in this system. 10:47:33 When you are dealing with just simple Gaussian probabilistic inference, you can think about it pictorially in the following way; I don't even have to write anything. 10:47:48 You will have a Gaussian distribution for your prior, what it is that you have just sung, and you're going to get a Gaussian distribution which comes from the likelihood; the product of Gaussians is a Gaussian that sits somewhere in the middle between the 10:48:04 two, right? And so your Gaussian, with time, is going to slightly shift, and with linear Gaussian learning all of these curves would have to be on top of each other: the larger the error, the more shift you're going to produce, and in the regime 10:48:24 of linear learning, at twice as big an error you adapt twice as fast, right? 10:48:29 And so all these curves are going to be on top of each other. 10:48:33 However, if I am dealing with distributions which have long tails, if I have a distribution which is, let's say, a Cauchy distribution, and I have my prior, which is Cauchy, and a likelihood, which is maybe also Cauchy: 10:48:47 if they happen to be close to each other, the product is still going to be somewhere in the middle, and you're going to nicely, linearly adjust, 10:48:55 because basically near the peak everything looks like a Gaussian, 10:49:01 right, and so the same structure is going to work. And that's what happens when you are in this region, for a small error, right, 10:49:08 when the error is comparable to the widths of these individual distributions, which we estimate from data, and so on and so forth. 10:49:15 However, if you have 2 Cauchys, one that looks like this for the prior and the other one for the likelihood, and they are outside of their own widths, and you multiply them to produce a posterior, 10:49:34 your overall distribution is going to look like this, okay? 10:49:39 And so you will start developing 2 humps in this picture. 10:49:46 And so, if your prior was somewhere here, and the likelihood is small... 10:49:54 now, I drew them as 2 equal-size bumps, 10:49:56 but let's suppose that this one is much smaller than that; then the thing that you are going to develop is going to look like this. 10:50:06 You are going to basically not move, and you are going to change the shape, the left side becoming skewed; 10:50:13 the distribution is going to become skewed, and maybe with time you will even develop some kind of bimodal structure in what you might be singing. 10:50:24 So, long story short: for this type of distributions, this is very well borne out in the data; even though the animal is not correcting, it's actually developing some tail on the left side. There's almost no correction, 10:50:35 but the animal is developing something here. And for this distribution, what happens is that you eventually develop this bimodal structure, 10:50:45 and what really happens is that some birds sing almost fully corrected, or close to that, 10:50:50 and some birds sing not corrected at all; each bird just happens to end up in one of those cases.
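The pictorial argument is easy to verify numerically. A sketch with arbitrary, made-up widths and separations: the Gaussian-times-Gaussian posterior always peaks halfway between prior and likelihood, while the Cauchy-times-Cauchy posterior splits into two modes once the separation exceeds the widths:

```python
import numpy as np
from scipy.stats import cauchy, norm

x = np.linspace(-10, 10, 2001)

for sep in (1.0, 6.0):                  # prior-likelihood separation, arbitrary units
    gauss = norm.pdf(x, 0, 1) * norm.pdf(x, sep, 1)      # Gaussian prior x likelihood
    heavy = cauchy.pdf(x, 0, 1) * cauchy.pdf(x, sep, 1)  # Cauchy prior x likelihood
    peak = x[np.argmax(gauss)]
    interior = heavy[1:-1]
    n_modes = int(np.sum((interior > heavy[:-2]) & (interior > heavy[2:])))
    print(f"separation {sep}: Gaussian posterior peaks at {peak:.2f} "
          f"(always halfway); Cauchy posterior has {n_modes} mode(s)")
```

For equal-width Cauchys the split happens once the separation exceeds twice the width; with unequal widths you get the skewed, barely-moving posterior described above instead.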
10:50:59 The interesting part there is that the entire shape of the distribution of what the animal is singing is changing over time, right? 10:51:12 So in standard models based on Kalman filters, on linear predictive models of various types, reinforcement models, etc., 10:51:21 usually you are going to take the variance as a given. 10:51:25 Here, as the experiment progresses, especially in this case, the variance of what the bird sings increases as well, and that's why the animal is still correcting: because when you are sitting here, the variance of your song is now so big that it sometimes catches 10:51:44 even the true perturbation, and so it's not rejected, in some sense, right? 10:51:50 Because the variance has grown, you end up in a situation that looks closer to this rather than closer to this. 10:51:58 And so the model, once we developed it, has 17, 18 parameters. 10:52:03 That's a lot of parameters, right? Because these distributions you have to describe in terms of... 10:52:08 it's not means and variances, because they are not Gaussians, so you have to describe them in terms of some kind of shape parameters; we were describing them as 10:52:18 alpha-stable distributions, because hopefully what happens is the animal learns from day to day, 10:52:25 and so the shape gets to some kind of limit, a non-Gaussian version of 10:52:31 the central-limit situation; those are called alpha-stable distributions, of different shapes. 10:52:38 Yes? 10:52:42 [Inaudible audience question about the variance in the model.] 10:53:14 That has not been done; it has not been done, right. So... 10:53:23 yeah, just to finish the previous thought, and then I'll come back to this, right? 10:53:28 So, even though I said the model has 17 or 18 parameters: 10:53:37 there are 7 days here, and there is also washout that I didn't describe, so 15 total days at different perturbations, and at each one there is a probability distribution of the songs that the bird produced; and there are 40 days plus washout here; and so overall we're looking at like 100 10:53:54 and 50 different probability distributions. And fitting 150 probability distributions with the same 17 parameters is kind of impressive, and the fits are very good, right, within statistics; 10:54:08 you know, the agreement between the experiments and the model is there, right? 10:54:13 So all of the things that you are asking about are very difficult to do, because of how difficult the experiments are: they have 6 birds in each one of those lines, and after 40 days the headphones fall off. 10:54:30 It's a very slow experiment. So now what we are trying to do is to step away from the birds here. 10:54:36 First of all, the birds can also learn much faster if, instead of letting them play on their own, 10:54:45 you punish them: every time they sing something wrong, you zap them with an electric shock, or you just make a really loud noise in the cage; they don't like that. 10:54:56 Then what usually takes about 7 to 15 days happens in a few hours. 10:55:01 And it's not clear whether these are the same pathways, 10:55:05 and that's something that we are trying to figure out before we try to do faster learning.
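As a side note on the alpha-stable parametrization mentioned above: these distributions are available off the shelf, so a sketch of what tails at 6 to 7 standard deviations look like, with invented shape parameters and synthetic data rather than the fitted model, could be:

```python
import numpy as np
from scipy.stats import levy_stable, norm

# Synthetic "pitches" drawn from an alpha-stable law; alpha < 2 gives the
# heavy tails, while alpha = 2 would recover a Gaussian. All values invented.
alpha, beta, scale = 1.6, 0.0, 0.5
pitches = levy_stable.rvs(alpha, beta, loc=0.0, scale=scale,
                          size=20000, random_state=0)

sd = pitches.std()   # the sample SD exists even though the true variance does not
frac = np.mean(np.abs(pitches) > 4 * sd)
print(f"fraction beyond 4 sample-SDs: {frac:.4f}; "
      f"a Gaussian would give ~{2 * norm.sf(4):.1e}")
```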
10:55:10 But another thing that we are playing with: we're trying to develop a very similar model for things like gait adaptation in humans. 10:55:20 And there, shockingly, you can generate as much data as you want, because I have access to 100 psychology students who are required to participate in psychology experiments as part of their Psych 101 class, right? 10:55:34 And so we can put them on a split treadmill, where the difference in the speed of the 2 belts is equivalent to this perturbation. 10:55:46 I can track an individual person, see what their distribution of asynchrony between the steps is, and then try to develop a similar model. And making 10,000 steps 10:55:59 is not that hard, right? 10,000 total renditions of a song is the entire bird's lifetime; 10:56:04 10,000 steps is one day for us, right, in an experiment, and I can take 100 students to do it. 10:56:13 So I don't think we're going to be able to do the things that you are asking in a bird, but working on humans turns out to be much faster and much easier, right? 10:56:22 Okay, so that's it. 10:56:26 [Inaudible audience discussion.] 10:57:33 Yeah, yeah, we actually did both; in the appendices we did all of that stuff, right? 10:57:46 [Inaudible.] 10:57:52 We come back at 11:30.