The International Journal of Comparative Psychology is sponsored by the International Society for Comparative Psychology. It is a peer-reviewed open-access digital journal that publishes studies on the evolution and development of behavior in all animal species. It accepts research articles and reviews, letters and audiovisual submissions.
Volume 27, Issue 2, 2014
Persistent memory retention of reward events and proactive interference in reward series learning by mice
This study examines acquisition of a single alternating series of reward quantities in mice. Four male ICR mice, trained in a straight runway, showed differential responses to items in a 3-0-3-0-3-0-3 series, constructed from a varying number of 0.045 g food pellets under inter-trial intervals (ITIs) of 30 s (Experiment 1) or 20 min (Experiment 2), by running more slowly to nonrewards than to rewards. Although mice showed reliable item anticipation under 20 min ITIs, nonreward anticipation became poorer in later serial positions than in earlier positions. It is possible that gradual deterioration of nonreward anticipation in a series is caused by proactive interference from previous item memories, since nonreward anticipation improved when the target item was separated by a long 120 min interval from prior items that were a potential source of proactive interference (Experiment 3). In Experiment 4, mice learned to respond differentially to the second item of 5-0 and 0-5 series with an ITI of 180 min. These results suggest that mice can discriminate reward magnitudes by forming item associations between adjacent items and retain information about a previous item for a long interval, and that proactive interference occurs among item memories in a series.
Rats exposed to a downshift in the concentration of a sucrose solution from 32% to 4% exhibit a transient suppression of consummatory behavior relative to an unshifted control always exposed to 4% sucrose. One explanation of this effect, known as consummatory successive negative contrast (cSNC), explains consummatory suppression as arising from an emotional state of frustration that redirects behavior away from the source of the devalued solution. A preliminary selective breeding protocol consisting of three experiments was performed. Experiment 1 reports results from 5 generations of selected breeding for either high (H) or low (L) recovery rates from cSNC. A control line of randomly (R) mated rats was included. cSNC was reduced in H rats, but L and R rats did not differ across generations. H rats also provided no evidence of behavioral activation in acquisition or increased persistence in extinction after partial reinforcement, rather than continuous reinforcement. L and R rats, by contrast, showed both of these effects. H rats were also significantly smaller in body size than R rats, but did not differ in terms of water intake, sucrose sensitivity, open-field activity, or responding to sucrose solutions before the downshift. In Experiment 2, H infants from the sixth selected generation showed increased bandwidth in vocalizations induced by mother-infant separation relative to L and R rats. Experiment 3 showed that H rats failed to show increased response to incentive downshift after treatment with the nonselective opioid antagonist naloxone, as done by L and R rats. The results, if replicated, may provide support for the interpretation of a significant role of frustration in cSNC.
Special Issue Introduction
One rule about the control of variability is that reducing the expectation of reward increases variation in the form of rewarded actions. This is a rule about how food-getting knowledge is gathered, something we know almost nothing about. Almost all instrumental learning experiments are about how food-getting knowledge is used. Not only do we know almost nothing about how such knowledge is gathered, the question is almost never studied. Similar gaps exist in the study of human learning and economics. In human learning, how the environment controls curiosity is never studied. Likewise, economic theories are almost entirely about how people use economically valuable knowledge. How the environment controls creation of that knowledge is a mystery that textbooks and researchers ignore.
Volition has been debated for thousands of years: what is it, how is it possible for biophysical beings to behave in a voluntary manner, indeed, does volition exist? Evolution of volition has rarely been part of the discussion. In this paper, I argue that operant-conditioning studies provide evidence for evolved volition. Three attributes are common to operant and voluntary behaviors. One is that responses are goal-directed, purposeful, some say rational, or controlled by reinforcing consequences. A second is that the responses vary – from random-like to repetitive – with predictability (or unpredictability) depending upon contexts and consequences. A third attribute is that responses appear to be self-generated or, in operant terms, emitted. These attributes are found in many species, simple to complex, but species also differ in details. Taken together, the evidence supports an evolutionary basis of volition.
What effects reinforcement is assumed to have and what data are collected depend on what behavioral variability means. It has extremely different meanings in molecular, molar, and unified behavior analyses. In molecular analyses the term relates reinforcement and moment-to-moment behaving of an individual organism, as when hand shaping creates new complex patterns extended in time or as when cumulative records show complex patterns. Molecular behavioral variability is easy to see, as in these two examples, but is hard to describe quantitatively. Behavioral variability in the context of molar analyses requires first aggregating behaviors, then counting them or finding their cumulative durations, and finally quantitatively summarizing the aggregate by a statistic, usually an average rate of occurrence of, or an average time allocated to, the aggregated behaviors. The statistic can also be a measure of variability, like the U statistic, rather than of central tendency. Molar behavioral variability can also be quantitatively defined as the variability of a statistic describing some property of an aggregate as a function of time, individuals, or, most commonly, experimental parameters. Some molar accounts interpret the aggregate statistic itself (average rate, time allocation, or variability) as an operant response. Quantitative theories account for over 90 percent of this kind of variability in thousands of molar analyses. Molar variability, however, seldom describes or explains molecular variability, and a common molar interpretation of free-operant behaving is that molecular behavior varies only randomly over time with a constant probability. There is little, if any, evidence for this interpretation and a considerable literature that suggests it is incorrect. A unified analysis combines automated shaping of molecular, quantitative patterns of behaviors, a molar aggregate of those patterns, and one or more statistics descriptive of the aggregate.
A unified analysis involves both kinds of quantitative behavioral variability: moment-to-moment variability of shaped patterns resembling target patterns, and molar variability of a statistic defined over an aggregate of such shaped patterns, such as the variability of the average rate of, or time allocated to, a shaped pattern. Only simulation theories seem sufficiently powerful to produce a general and unified theory to account for both moment-to-moment behaving and statistics that describe molar aggregates.
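To make the idea of a molar variability statistic concrete, the following is a minimal sketch of one such measure. It assumes that the "U statistic" mentioned in the abstract is the normalized-entropy U-value widely used in operant variability research, U = -Σ p_i log2(p_i) / log2(n); the abstract itself does not give a formula, and the function name `u_value` and its arguments are illustrative only.

```python
import math
from collections import Counter


def u_value(responses, n_options):
    """Normalized entropy of an aggregate of responses.

    Assumed definition (not stated in the abstract):
        U = -sum(p_i * log2(p_i)) / log2(n_options)
    where p_i is the relative frequency of response option i.
    U is 0 when a single option is always emitted and 1 when all
    n_options are emitted equally often.
    """
    if n_options < 2:
        raise ValueError("n_options must be at least 2")
    total = len(responses)
    if total == 0:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    # Options that never occur contribute nothing to the entropy sum.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n_options)


# Aggregate first, then summarize: a fully repetitive record versus an
# evenly alternating record on two levers (hypothetical data).
repetitive = ["L"] * 8
alternating = ["L", "R"] * 4
print(u_value(repetitive, 2), u_value(alternating, 2))
```

Note that the statistic is computed over the aggregate only: the alternating record is perfectly predictable from moment to moment yet receives the maximum value, which illustrates the abstract's point that molar variability seldom describes molecular variability.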
An n-armed bandit task was used to investigate the trade-off between exploratory (choosing lesser-known options) and exploitive (choosing options with the greatest known probability of reinforcement) human choice in a trial-and-error learning problem. A different probability of reinforcement was assigned to each of eight response options using random ratios (RRs), and participants chose by clicking buttons in a circular display on a computer screen using a computer mouse. To differentially increase exploration, relative frequency thresholds were randomly assigned to each participant and acted as task constraints limiting the proportion of total responses that could be attributed to any response option. The potential benefit of increased exploration in non-stationary environments was investigated by changing payoff probabilities so that the leanest options became the richest or the richest options became the leanest. On average, forcing participants to explore at moderate to high levels always resulted in their earning less reinforcement, even when the payoffs changed. This outcome may be due to humans’ natural level of exploration in our task being sufficiently high to create sensitivity to environmental dynamics.
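The task structure described above can be sketched as a small simulation. This is a hedged illustration, not the study's procedure: the participants were humans, so the epsilon-greedy chooser below is a stand-in assumption, and the function name `run_bandit`, the payoff values, and the exact way the relative frequency threshold is enforced (an option is unavailable while its share of past responses exceeds `max_share`) are all hypothetical. The payoff reversal used in the study is not modeled.

```python
import random


def run_bandit(payoffs, n_trials, max_share, epsilon=0.1, seed=0):
    """Simulate a constrained n-armed bandit with random-ratio payoffs.

    payoffs   : probability of reinforcement for each option (RR schedule)
    max_share : assumed form of the relative frequency threshold; an option
                may be chosen on trial t only if it has received at most
                max_share * t of the t responses made so far
    epsilon   : exploration rate of the stand-in epsilon-greedy chooser
    Returns (total reinforcers earned, list of response counts per option).
    """
    n = len(payoffs)
    if not 0 < max_share <= 1 or max_share * n < 1:
        raise ValueError("max_share must lie in [1/n, 1]")
    rng = random.Random(seed)
    counts = [0] * n
    wins = [0] * n
    earned = 0
    for t in range(n_trials):
        # Options still under their threshold; never empty when max_share >= 1/n.
        allowed = [i for i in range(n) if counts[i] <= max_share * t]
        if rng.random() < epsilon:
            choice = rng.choice(allowed)  # explore among available options
        else:
            # Exploit: highest observed payoff rate; untried options are
            # treated optimistically so each gets sampled at least once.
            choice = max(
                allowed,
                key=lambda i: wins[i] / counts[i] if counts[i] else 1.0,
            )
        counts[choice] += 1
        if rng.random() < payoffs[choice]:
            wins[choice] += 1
            earned += 1
    return earned, counts


# Eight options with hypothetical payoff probabilities.
payoffs = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6]
free_earned, free_counts = run_bandit(payoffs, 1000, max_share=1.0)
forced_earned, forced_counts = run_bandit(payoffs, 1000, max_share=0.25)
print(free_earned, free_counts)
print(forced_earned, forced_counts)
```

Setting `max_share=1.0` leaves choice unconstrained, while lower values force responses to spread across options, which is the manipulation the abstract describes as differentially increasing exploration.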
Use of self-organizing maps for exploring coordination variability in the transition between walking and running
This study investigated multi-dimensional coordination instability and variability in the transitions between walking and running for a 26-year-old female runner using self-organizing maps (SOMs) in three experimental procedures. We found different multi-dimensional coordination patterns for walking and running using the output from SOMs as stride trajectories on U-matrices and attractor diagrams. In transient conditions, the participant showed multi-stability, or instability, in the transition region for decreasing but not for increasing speeds. She also clearly showed increased multi-dimensional coordination variability around the transition region only for decreasing speeds and only in transient conditions. These findings may not generalize across runners, nor were they conclusive enough to support variability as a facilitator of the change from running to walking. Self-organizing maps provide us with a tool to study multi-dimensional coordination (and coordination variability) and to reduce its complexity to relatively simple map outputs, including basins of attraction and attractor landscapes.
Pigeons can learn structured sequences of cued responses and perform them quickly, even when random variability is later introduced into the originally learned sequence, making some cue locations unpredictable. In order to determine if initial learning shows the same tolerance of spatial variability as steady-state performance, naïve pigeons were trained on random distortions around a structured sequence without having seen the original sequence itself. Learning was possible, but accommodated less variability than did performance of the same sequence previously learned in an undistorted context. Analysis of results indicated that performance of a randomly distorted sequence is best when birds are initially trained with little or no variability, and randomness is later introduced in a gradual fashion.
The goal of this study was to quantify the inter-individual and intra-individual variability of manual (digits) skill in adult macaque monkeys, over a motor learning phase and, later on, when motor skills were consolidated. The hypothesis is that several attributes of the stable manual dexterity performance can be predicted from learning characteristics. The behavioral data were collected from 20 adult Macaca fascicularis, derived from their dominant hand, defined as the hand exhibiting a better performance than the other. Two manual dexterity tasks were tested: (i) the modified Brinkman board task, consisting of the retrieval of food pellets placed in 50 slots in a board, using the precision grip (opposition of the thumb and index finger); (ii) the reach and grasp drawer task, in which the grip force and the load force were continuously monitored while the monkey opened a drawer against a resistance, before grasping a pellet inside the drawer. The hypothesis was verified for the performance of manual dexterity after consolidation, correlated with the initial score before learning. Motor habit, reflected by the temporal order of sequential movements executed in the modified Brinkman board task, was established very early during the learning phase. As mostly expected, motor learning led to an optimization of manual dexterity parameters, such as score, contact time, as well as a decrease in intra-individual variability. Overall, the data demonstrate the substantial inter-individual variability of manual dexterity in non-human primates, to be considered for further pre-clinical applications based on this animal model.
- 1 supplemental PDF
There has been a recent surge in the experimental investigation of the control of behavioral variability. Currently, it is understood that variability in behavior is predictably modulated by reinforcement parameters (e.g., the probability of reward delivery and reward magnitude). In two experiments, we investigated how spatial proximity between response and reward locations impacts the production of behavioral variability in both response rate and lever press duration. Rats were trained to lever press on two levers in a standard operant chamber that only differed from one another in their proximity to a food niche (i.e., Near vs. Far); a second experimental factor, the probability of reward, was signaled by an auditory cue. In Experiment 1, trials with a high-probability stimulus terminated with reward on 100% of trials, while trials with a low-probability stimulus terminated with reward 25% of the time. We conducted a similar procedure in Experiment 2, but reduced the likelihood of reward on low-probability trials to 10%; additionally, we collected data in a post-acquisition extinction test. Overall, reduced proximity and probability increased variation of response rate, whereas only the probability factor affected lever press duration. Proximity also interacted with probability to influence variation in response rate. These findings extend the factors modulating behavioral variability to include the spatial proximity between a response and reward.
From a stimulus-response (S-R) point of view, or even with an intermediate step involving cognition (S-O-R), the existence of behavioral variability in organisms, even under tightly controlled experimental conditions, suggests that 1) the relevant inputs to the system have not been fully characterized, 2) even the most minute difference in system inputs can produce vastly variable behavioral output, or 3) behavior is fundamentally variable. Any of these possibilities leads to the conclusion that precise behavioral prediction, at any given moment, is virtually impossible. One can, however, re-conceptualize the challenge of understanding behavior such that it involves not what the organism will do from moment to moment, but what characterizes the system that governs the organism's behavior. In this paper, I outline a closed-loop cybernetic approach to understanding behavior, for which behavioral variability is actually a requirement. Findings are presented from a series of experiments across species, and using computer simulations, that support a cybernetic interpretation of behavior. I argue that behavioral variability provides adaptive advantages to organisms – regardless of whether that variability is produced by noise, or is actively generated by nervous systems. Finally, I discuss some ideas from embodied cognition that impose constraints on the variability of behavior.
This paper provides a summary of a 1969 report (Pryor, Haag, & O’Reilly) of the spontaneous emergence of innovative behavior of a dolphin, a replication of this event through training in another dolphin, and the effect this work has had on current animal training technology. It also provides a review of laboratory-based research in support of some of the procedures found effective in modern animal training in developing innovative behavior, specifically use of the conditioned reinforcer to mark a behavior, differential reinforcement of variability, and intentional use of positive reinforcement procedures. The authors describe specific processes for establishing innovative skills, practical applications presently in use with animals, consequent human and animal welfare benefits, and suggestions for further research.