Holroyd, C. B., & Coles, M. G. H. (2002). The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109(4), 679-709.
Our ability to learn from the consequences of our actions makes us remarkably adaptable animals. But what are the neural substrates of reinforcement learning? This article takes a stab at that question.
Researchers have long inferred the existence of a generic, high-level error-processing system in the brain. When human participants commit errors in a wide variety of psychological tasks, a negative deflection appears in the EEG, termed the error-related negativity (ERN), which appears to be generated in the anterior cingulate cortex (ACC). Meanwhile, other researchers have argued that the mesencephalic dopamine system conveys reinforcement learning signals to the basal ganglia and frontal cortex, where they are used to develop adaptive behavioral programs. This article proposes a hypothesis that unifies the two: when human subjects commit errors, the dopamine system conveys a negative reinforcement learning signal to the frontal cortex, where it generates the ERN by disinhibiting the dendrites of motor neurons in the ACC (hence the negative potential observed).
To dig a bit deeper... first, the mesencephalic dopamine system. This is a small collection of nuclei, including the substantia nigra pars compacta and the ventral tegmental area (VTA), that projects diffusely to the basal ganglia and frontal cortex. Phasic activity in this system appears to act as a reinforcer, solidifying behavior. Early in learning, when rewards are not yet predicted, delivery of a reward elicits a phasic burst in dopamine neurons; more generally, a reward that is better than predicted elicits a positive dopamine signal. Conversely, when (i) an expected reward is not delivered, (ii) a reward is worse than predicted, or (iii) a punishment is administered instead, mesencephalic dopamine neurons decrease their firing rate, falling below baseline. Interestingly, with practice on the task, presentation of the reward no longer elicits the phasic dopaminergic response; instead, the conditioned stimulus that predicts delivery of the reward does. The phasic dopaminergic activity is thus said to propagate “back in time” from the reward to the conditioned stimulus with learning. In short, the mesencephalic dopamine system can be understood to produce reward prediction error signals that other parts of the brain can use for reinforcement learning.
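This “backing up” of the phasic signal is exactly what temporal-difference (TD) learning models do, and that is the family of models the authors build on. Here is a minimal TD(0) sketch of a single conditioning trial (my own toy illustration with made-up parameters, not the authors' simulation) in which the largest prediction error migrates from the time of the reward to the time of the conditioned stimulus as training proceeds:

```python
import numpy as np

# Toy tapped-delay-line TD(0) model of a single conditioning trial, in the
# spirit of the temporal-difference account the paper builds on (a sketch
# with assumed parameters, not the authors' simulation).  The conditioned
# stimulus (CS) appears at step 0 and a reward of 1.0 arrives at the final
# step; the inter-trial baseline is unpredictable and has value 0.

T = 10            # time steps from CS onset to reward (assumed)
alpha = 0.3       # learning rate (assumed)

V = np.zeros(T)   # learned reward prediction at each post-CS time step

for trial in range(1, 201):
    deltas = np.zeros(T + 1)
    # Error at CS onset: the CS arrives out of a zero-value baseline,
    # so the surprise equals the CS's current learned value.
    deltas[0] = V[0]
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0           # reward only at the last step
        v_next = V[t + 1] if t + 1 < T else 0.0  # prediction at the next step
        deltas[t + 1] = r + v_next - V[t]        # TD prediction error
        V[t] += alpha * deltas[t + 1]            # update the prediction

    if trial in (1, 15, 50, 200):
        peak = int(np.argmax(deltas))
        where = "CS onset" if peak == 0 else ("reward" if peak == T else f"step {peak}")
        print(f"trial {trial:3d}: largest phasic signal at {where} "
              f"(magnitude {deltas[peak]:+.2f})")
```

Early on the biggest prediction error sits at the reward; with training it drifts back through the delay line and ends up at the CS, just as the phasic dopamine response does.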
Now the ERN. The ERN is a negative deflection in the EEG observed when a participant commits an error, essentially the brain’s “Oh shit!” signal. Its amplitude increases with incentive (e.g., when money is on the line) and with the degree of error (i.e., responses that are very wrong as opposed to only slightly off). The ERN can be elicited either (i) by presentation of negative feedback to the participant or (ii) by the participant’s own detection of the error at the time of the response.
The theory this paper offers is that the ACC, which generates the ERN and receives input from numerous semi-independent command structures, is responsible for detecting and resolving the conflicts that arise when these multiple motor controllers propose competing responses. Its job is to identify which of its inputs is best suited to carrying out the task at hand, serving as a motor control filter that transforms multiple intentions into a unitary action. The authors contend that the ACC is “trained” to choose the right controller by the mesencephalic dopamine system’s reinforcement learning signals, and that the ERN essentially reflects the transmission of this reinforcement learning signal to the ACC. The mesencephalic dopamine system thus plays the role of an adaptive critic, assigning “goodness” or “badness” to observed response outcomes and communicating its appraisal to the ACC to bias future response selection.
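In reinforcement learning terms this is an actor-critic arrangement: a critic that evaluates outcomes against its expectations and broadcasts a prediction error, and an actor that uses the same signal to improve response selection. Here is a bare-bones sketch of that reading (a hypothetical toy with an invented task and parameters, not the authors' actual model), in which the critic stands in for the dopamine system, the actor for the ACC’s control filter, and the negative prediction error on an error trial for the ERN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal actor-critic reduction of the proposal (a hypothetical toy, not
# the authors' model): the critic plays the dopamine system, learning what
# outcome to expect after each stimulus; the actor plays the ACC, learning
# which of two competing responses to let through.  The negative prediction
# error on an error trial stands in for the ERN.

n_stimuli, n_responses = 2, 2
actor_w = np.zeros((n_stimuli, n_responses))  # ACC's preference for each response
critic_v = np.zeros(n_stimuli)                # expected outcome given the stimulus
alpha = 0.2                                   # learning rate (assumed)
correct = {0: 0, 1: 1}                        # stimulus-response mapping to be learned

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

error_trials = []
for trial in range(300):
    s = int(rng.integers(n_stimuli))                          # present a stimulus
    a = int(rng.choice(n_responses, p=softmax(actor_w[s])))   # filter picks a response
    r = 1.0 if a == correct[s] else -1.0                      # outcome of that response

    delta = r - critic_v[s]       # dopamine-like prediction error
    critic_v[s] += alpha * delta  # critic refines its expectation
    actor_w[s, a] += alpha * delta  # ...and the same signal trains the filter

    if r < 0:                     # error trial: negative delta ~ ERN
        error_trials.append((trial, -delta))

print("simulated ERN on the first three errors:",
      ", ".join(f"{amp:.2f}" for _, amp in error_trials[:3]))
print("simulated ERN on the last three errors: ",
      ", ".join(f"{amp:.2f}" for _, amp in error_trials[-3:]))
```

One property of this toy worth noticing: once the mapping is well learned, an error violates a strong expectation of reward, so the negative prediction error, and with it the simulated ERN, is larger than when errors were still common.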
The article supports this hypothesis with data from a probabilistic learning task, together with computer simulations that capture the observed behavioral results.
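To see why a probabilistic task is a natural testbed, here is one more tiny sketch (again my own illustration with invented numbers, not the paper’s simulation) showing how the reliability of the feedback shapes the critic’s expectation, and with it the size of the simulated feedback ERN when negative feedback does arrive:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy stand-in for a probabilistic feedback condition (my own sketch with
# invented parameters, not the paper's simulation): the response is followed
# by positive feedback with probability p and negative feedback otherwise.
# The critic's learned expectation, and hence the size of the negative
# prediction error (the simulated feedback ERN), depends on p.

alpha = 0.1
for p in (1.0, 0.8, 0.5):
    v = 0.0          # critic's expected feedback in this condition
    late_erns = []   # simulated feedback ERNs, late in training
    for trial in range(2000):
        r = 1.0 if rng.random() < p else -1.0   # probabilistic feedback
        delta = r - v                           # prediction error at feedback time
        v += alpha * delta
        if trial >= 1500 and r < 0:
            late_erns.append(-delta)
    ern = f"{np.mean(late_erns):.2f}" if late_erns else "n/a (no negative feedback)"
    print(f"p(positive feedback) = {p:.1f}:  expectation ~ {v:+.2f},  "
          f"mean simulated feedback ERN = {ern}")
```

The less reliable the mapping, the weaker the learned expectation of reward and the smaller the surprise when negative feedback arrives, which is exactly the kind of expectancy manipulation such a task makes possible.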