
Would you believe it if we told you you don't actually learn from your mistakes?
Learning doesn't come from the mistake itself, but from the result differing from what you expected.
The first person who looked into the role that expectation plays in learning was Ivan Pavlov. By studying the amount of saliva produced by a dog, Pavlov was able to infer that the dogs were learning what to expect from certain situations.
Pavlov demonstrated this learning in four stages, as seen below. Initially, Pavlov would give the dogs meat, and the response to the meal was increased production of saliva. Pavlov would also create a sound with a tuning fork for the dogs, which produced no response. Next, Pavlov tried to condition the dogs by sounding the tuning fork and then giving them food; the dogs salivated, but only because of the food. Eventually, however, the dogs began to salivate after only hearing the tuning fork, without any food being given to them.
The dogs were conditioned to know that the tuning fork meant food was coming in the near future. The dogs began to learn from their expectations.
Reinforcement learning is driven by prediction error. When the actual outcome following a decision differs from what was expected, the brain (in the basal ganglia) calculates a prediction error. Put another way, a prediction error arises whenever the value of the current state differs from the value of the preceding state.
Prediction errors can be positive or negative. If you expect something to happen but things turn out better than expected, a positive prediction error is calculated. When something unexpectedly bad happens, a negative prediction error occurs. However, when exactly what you expect happens, there is no prediction error.
Learning from these prediction errors occurs in a two-step process:
1) The brain calculates the prediction error.
2) The brain updates the previous value of the stimulus.
A study by Schultz in 2007 found that there are specific neurons that produce internal reward signals. These reward signals then influence the brain activity that controls actions, decisions and choices. However, when the stimulus is paired with a fully expected outcome, no learning can occur! This indicates that in order to learn, there must be a difference between what is expected and what happens.
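The two-step process above can be sketched in a few lines of Python. This is only an illustration, not a model taken from Schultz's study: the function names, the reward size of 1.0 and the learning rate of 0.2 are all assumptions made for the example. The update rule is the simple "move the expectation a fraction of the way toward the outcome" rule commonly used to describe conditioning.

```python
def prediction_error(outcome: float, expected: float) -> float:
    """Step 1: how much better (+) or worse (-) the outcome was than expected."""
    return outcome - expected


def update_value(expected: float, error: float, learning_rate: float = 0.2) -> float:
    """Step 2: nudge the stored value of the stimulus toward the outcome.

    The learning rate of 0.2 is an arbitrary choice for illustration.
    """
    return expected + learning_rate * error


expected = 0.0  # the stimulus starts with no learned value
reward = 1.0    # hypothetical reward size
errors = []

for trial in range(30):
    error = prediction_error(reward, expected)
    expected = update_value(expected, error)
    errors.append(error)

print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}")
```

In this sketch the error shrinks on every trial as the expectation catches up with the reward. Once the outcome is fully expected the error is zero, so the update in step 2 changes nothing, which matches the point that no learning occurs when the outcome is fully expected.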

Dopamine is the currency of prediction error. Dopamine is the neurotransmitter responsible for feelings of reward or pleasure. When the brain calculates a positive prediction error, or more simply, when things are better than expected, there is an increase in dopamine release. When there is a negative prediction error because things are worse than expected, there is a decrease in the release of dopamine. Dopamine release scales with the amount of reward or consequence given. For example, the more gold stars given per math calculation done correctly, the higher the dopamine release.
However, dopamine firing adjusts as learning occurs. As the subject begins to expect a certain outcome from a certain scenario, the prediction error becomes smaller and smaller. Dopamine is sensitive to prediction errors, not to the reward itself. Schultz's study showed this by looking at dopamine release at the time of the stimulus and at the time of the reward. As seen below, when people do not expect any reward and get one, dopamine increases at the time the reward is given. When someone has learned that a reward will follow a certain stimulus, and the reward is then given as expected, there is a slight peak of dopamine release following the stimulus; this is the recognition that a reward is to come. However, when the reward is actually given, there is no increase in dopamine release because of the lack of prediction error; everything occurred exactly as expected. Finally, when a reward is expected but then not given, the initial spike of recognition occurs, but when no reward follows, dopamine release decreases to below normal levels, leading to feelings of negativity.
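The three situations described above can be written out as a small Python sketch. It is a deliberately simplified illustration and not Schultz's actual analysis: the function name `dopamine_signals`, the 0-to-1 value scale and the assumption that the stimulus itself arrives unpredicted are all choices made for this example. Only the signs of the numbers are meaningful, not their sizes.

```python
from typing import Tuple


def dopamine_signals(cue_value: float, reward: float) -> Tuple[float, float]:
    """Return the change from baseline at (stimulus time, reward time).

    cue_value is how much reward the subject has learned to expect
    after the stimulus; 0.0 means nothing has been learned yet.
    """
    # The stimulus arrives without warning, so its learned value is
    # itself a surprise relative to a baseline expectation of zero.
    at_stimulus = cue_value - 0.0
    # At reward time the signal is the ordinary prediction error.
    at_reward = reward - cue_value
    return at_stimulus, at_reward


unexpected_reward = dopamine_signals(cue_value=0.0, reward=1.0)
expected_reward = dopamine_signals(cue_value=1.0, reward=1.0)
omitted_reward = dopamine_signals(cue_value=1.0, reward=0.0)

print("unexpected reward:", unexpected_reward)  # rise at reward time only
print("expected reward:  ", expected_reward)    # rise at stimulus time only
print("omitted reward:   ", omitted_reward)     # rise at stimulus, dip at reward time
```

The negative value at reward time in the last case corresponds to dopamine release falling below normal levels when an expected reward does not arrive.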
This shows that over the course of learning, the dopamine response shifts to the stimulus that precedes the reward. Schultz was able to demonstrate this in behavioural experiments on learning. He found that the orbitofrontal cortex is able to distinguish between reward (positive prediction error) and punishment (negative prediction error). In decision-making moments, the dorsolateral prefrontal cortex carries movement planning signals to the motor planning areas of the brain, all the while coding the expected reward for the movement that is currently being planned.
Learning, however, occurs at different speeds for different people and different tasks. The problem with reinforcement learning is that although the subject learns a little bit each time, the process takes a very long time. Trials upon trials may need to be completed, each one updating the prediction that a subject has for a situation. Though reinforcement learning is highly esteemed as the best way to learn something because it creates the largest differential in dopamine release, its time consumption leaves much to be desired for learning a task quickly. Because of this, other types of learning are used in educational environments to ensure that specific tasks or skills are learned at a faster pace.
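To get a feel for why learning a little bit on each trial can take so long, here is a rough Python sketch. It assumes a simple update rule in which each trial closes a fixed fraction of the gap between expectation and outcome. The function name `trials_to_learn`, the learning rates and the 5% tolerance are hypothetical values chosen for illustration, not measurements from any study.

```python
def trials_to_learn(learning_rate: float, reward: float = 1.0,
                    tolerance: float = 0.05) -> int:
    """Count trials until the prediction error is within the tolerance."""
    expected = 0.0
    trials = 0
    while abs(reward - expected) > tolerance:
        # Each trial closes only a fraction of the remaining gap.
        expected += learning_rate * (reward - expected)
        trials += 1
    return trials


for rate in (0.5, 0.1, 0.01):
    print(f"learning rate {rate}: {trials_to_learn(rate)} trials")
```

Under these assumptions, the smaller the step taken on each trial, the more trials are needed before the outcome is almost fully expected.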