International Comparative Psychology Effect of qualitatively varied reinforcement on response rates using substitutable consequences

In order to test the effects of qualitatively varied reinforcement on response rates, 3 experiments were conducted. The goal of the first experiment was to assess the level of substitutability between 2 reinforcers. Eight female Wistar rats kept on a diet consisting solely of turnip and millet seeds were exposed to a concurrent FR5 FR5 and then to a FR4 FR8 program. By the end of the experiment, there was a shift in consumption, albeit to a small degree. During the second experiment, 8 female Wistar rats were exposed to a 3-component variable interval program: 1 during which only millet seeds were available, 1 in which only turnip seeds were available, and a third component in which both kinds of seeds were delivered randomly. By the end of the second experiment, the highest response rates were recorded during the component in which only millet seeds were available. Finally, a third experiment was implemented in order to assess whether the particular way in which the substitutable consequences are delivered (i.e., randomly or simultaneously) has an effect on response rates. The program for this experiment consisted of a VI 60-s schedule with 2 components. During one of the components, a mixture of millet and turnip seeds was delivered when subjects responded after the interval had elapsed, while, during the second component, either millet or turnip seeds were delivered randomly. By the end of the experiment, no differences between components were found. Results are discussed in terms of their implications for the study of reinforcement.

There is considerable interest in the variables that affect the reinforcing value of an alternative. Thus, different dimensions of reinforcement, such as probability of occurrence (Eckerman, 1969), delay (Mazur, 1997), or magnitude (Lowe et al., 1974), have been manipulated in order to promote faster learning and greater resistance to extinction. However, while most studies on the subject focus on a single dimension of the same reinforcer, there exists a line of research which seeks to elucidate the effect of delivering two or more consequences against the effects of reinforcing behavior with a single consequence; such is the focus of the present work.
The earliest study in which two different reinforcers were delivered within a single alternative was conducted by Wunderlich (1961). In this study, 60 rats were divided into four groups according to the consequences they would receive at the end of a T-maze. The first group received only food on each trial (F group), a second group received only water (W group), and a third group received water on half of the trials and food on the other half (F/W group). Finally, a fourth group received water and food on each trial (F+W group). Subjects in Groups F/W and F+W ran significantly faster than those that received a single reinforcer. They also exhibited greater resistance to extinction. However, the only two groups that exhibited significant spontaneous recovery were Groups F and F+W.
Steinman (1968a, 1968b) suggested that an alternative that delivers qualitatively varied reinforcement (QVR) may have a greater reinforcing value than an alternative that delivers a single reinforcer. Steinman (1968a) conducted an experiment that sought to replicate Wunderlich's (1961) findings using a variable interval (VI) schedule of reinforcement; 12 Long-Evans rats were exposed to a multiple three-component VI 45-s schedule, with each component signaled by a different tone (T1, T2, and T3). In the presence of T1, responses were reinforced with food pellets; in the presence of T2, responses were reinforced with a solution of water and sucrose (sucrose from this point on); and T3 signaled the presence of either pellets or the water-sucrose solution, delivered randomly. Steinman took response rates as a measurement of preference. By the end of this experiment, rats responded at a higher rate during the component in which both food and sucrose were available, followed by the sucrose component and the food pellet component, respectively.
Steinman (1968b) stated that his previous work did not take into account the different reinforcing strengths of the reinforcers available. Thus, he devised a procedure that attempted to test QVR when reinforcers had similar reinforcing strengths. Steinman trained 12 experimentally naïve Long-Evans rats. Subjects were trained to press a lever, which on a VI 45-s schedule resulted in the onset of a 0.5-s light, which was in turn accompanied by the presentation of a food pellet.
During the first phase of the experiment, a tone (T1) was presented six times for 5-min periods within each session. Pressing the lever during T1 was reinforced with food pellets. At the end of a 5-min period, the tone stopped, and no further reinforcers were delivered; when a 5-min period ended, a 15-s time out was implemented in which no lights were on, and responses did not lead to any kind of reinforcement. Phase 1 continued until response rates appeared to be asymptotic during time out periods for 3 consecutive sessions. During Phase 2, subjects were introduced to a second tone (T2), which signaled the presence of the water-sucrose solution. During this phase, 5-min periods were randomly followed by periods of T2. Time outs were presented right after each 5-min period. As in Phase 1, sessions ended after the presentation of six 5-min periods. When subjects reached stability, the concentration of sucrose was decreased until response rates during both tones completely overlapped. Finally, during Phase 3, a third tone (T3) was introduced. While T3 was on, responses were reinforced by the presentation of either the sucrose solution or food pellets. These reinforcers were delivered randomly, and, during this phase, sessions consisted of nine 5-min periods that could present one of the three tones. By the end of the experiment, higher response rates were registered during the presence of T3, the tone associated with QVR. Steinman (1968b) concluded that stimuli associated with QVR may elicit higher response rates, even if reinforcers had a similar reinforcing strength when delivered individually.
Roca et al. (2011) conducted two experiments using a procedure similar to those of Steinman (1968a, 1968b). Roca et al. (2011) attempted to answer two different questions. First, given that the subjects in Steinman's (1968a, 1968b) studies were exposed to constant reinforcement, with both pellets and the sucrose solution, before being exposed to the varied component, the results may have reflected that previous exposure rather than an effect of delivering QVR. Second, the study asked whether the effect reported by Steinman (1968a, 1968b) could be replicated if subjects were exposed to QVR and single reinforcers on different days, rather than presenting the three alternatives during the same session. For the first experiment, a group of 4 rats was exposed to a multiple VI 60-s schedule with two components. Each component delivered either food pellets or a solution of water and condensed milk. After stable responding was observed, a QVR component (delivering either food pellets or the solution) was added to the schedule. A second group of 3 rats was exposed to the three components from the beginning of the experiment. For most subjects and in both groups, the highest response rates corresponded to the component in which only the condensed milk was available, followed by the qualitatively varied reinforcement component and, lastly, the component in which only food pellets were available.
In the second experiment, QVR and constant reinforcement were presented in sessions that took place on different days. The highest response rates occurred in sessions in which only the condensed milk solution was available; the second highest rates were registered in the QVR sessions, and the lowest response rates were registered during sessions in which only food pellets were available.
Applied studies on QVR also report mixed results; while some report greater reinforcing value for the QVR alternative (e.g., Bowman et al., 1997; Egel, 1981), some report greater reinforcing value for the constant reinforcement alternative (e.g., Koehler et al., 2005), whereas others do not report significant differences between constant and varied reinforcement (e.g., Najdowski et al., 2005).
Studies like those by Roca et al. (2011) and Steinman (1968a, 1968b), which employed similar procedures but yielded different results, appear puzzling and call into question the effectiveness of QVR for acquiring and maintaining an operant response. In this regard, Roca et al. (2011) suggested that the particular interaction between consequences may be able to account for the mixed results reported in the relevant literature. Green and Fisher (2000) suggested that concepts borrowed from microeconomics provide a useful framework to account for phenomena related to decision making and how organisms assign value to an alternative. For instance, algorithms taken from microeconomics to quantify the particular relation between two goods have been proposed as a useful tool to shed light on the role of variability in reinforcement value (Green & Freed, 1993). The cross elasticity of demand is traditionally used in microeconomics to calculate the particular relation between two goods (Parkin, 2006). In this measure, exy represents a quantitative value corresponding to the particular relation between two goods, namely the degree of substitutability; ΔQ represents the total consumption of a good (or reinforcer) under a specific budget; ΔP represents the total price that has to be paid in order to obtain ΔQ under the same budget. That is, cross elasticity of demand is measured by taking the percentage change in total consumption of a good as a result of a percentage change in the cost of a second good. When two goods are consumed in a rigid fashion (i.e., an increase in the cost of a good x results in the decreased consumption of both goods), said goods are considered to be complementary goods, and exy takes negative values. In cases in which an increase in the cost of a good x results in an increase in the consumption of a good y, x and y are said to be substitute goods, and the values of exy are positive. Finally, when an increase in the cost of x has no effect on the consumption of y, x and y are considered to be independent, and exy equals 0.
It is worth noting that substitutability and complementarity relations do not constitute a rigid dichotomy. Rather, substitutability is a continuum where two goods can be substitutes to a higher or lower degree. Thus, if an increase in the cost of a good causes an increase in the consumption of another good, both are considered substitutes even if that increase appears to be small, as long as the value of exy is greater than 0.
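The classification described above can be illustrated with a minimal sketch. It assumes the simple (not midpoint) percentage-change form of cross elasticity, that is, the percentage change in consumption of good y divided by the percentage change in the cost of good x. The function names, the tolerance, and the numbers in the example are hypothetical and are not taken from the experiments reported here.

```python
def cross_price_elasticity(q_y_before, q_y_after, p_x_before, p_x_after):
    """Percentage change in consumption of y over percentage change in cost of x."""
    pct_change_q = (q_y_after - q_y_before) / q_y_before
    pct_change_p = (p_x_after - p_x_before) / p_x_before
    return pct_change_q / pct_change_p


def classify(e_xy, tol=1e-9):
    """Map the sign of the elasticity onto the taxonomy used in the text."""
    if e_xy > tol:
        return "substitutes"
    if e_xy < -tol:
        return "complements"
    return "independent"


# Hypothetical example: the cost of x rises from 5 to 8 responses while
# consumption of y rises from 4 to 6 deliveries.
e_xy = cross_price_elasticity(4, 6, 5, 8)
label = classify(e_xy)
```

Because only the sign of the value decides the category, a small positive value still counts as substitutability, which is the sense in which the relation is treated as a continuum.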
The calorie-filled condensed milk used by Roca et al. (2011) could have acted as a substitute good for food, while the food pellets and water used by Steinman (1968a, 1968b) could have acted as complementary goods, providing each other with added reinforcing value. The particular relation between two different goods could account for the mixed results reported in the relevant literature. The goal of this study was to assess the effect of qualitatively varied reinforcement when the two goods delivered are substitutable. In order to achieve this goal, three experiments were conducted. The purpose of the first experiment was to identify the extent to which two types of seeds (i.e., millet and turnip) are substitutes for each other. Experiment 1 followed a procedure similar to the income-compensated paradigm described by Green et al. (1993). Experiment 2 sought to contrast the value of QVR against that of reinforcers delivered individually by employing a procedure similar to the one described by Roca et al. (2011). Finally, a third experiment was implemented in order to control for the way QVR was delivered. All experimental procedures were approved by the local ethical committee of the Centre for Studies and Investigations in Behaviour from the University of Guadalajara for animal experiments, and they followed governmental guidelines.

Experiment 1

Subjects
Six experimentally naïve female Wistar rats (Rattus norvegicus), approximately three months old at the beginning of the experiment, served as subjects. All subjects were inbred in the Center of Behavior Studies at the University of Guadalajara. Each subject was housed individually. Fifteen days prior to the beginning of the experiment, commercial food pellets were removed from subjects' individual cages, and subjects were then fed exclusively with a mixture consisting of 50% millet and 50% turnip seeds. These seeds were selected due to their size uniformity and the fact that they do not tend to break apart in the food dispenser. The seed mixture was replaced each day until the beginning of the experiment. The amount of seeds subjects were fed was decreased gradually until all subjects reached 80% of their ad libitum weight.

Materials
Sessions took place inside experimental chambers (ENV-022MD) manufactured by MedAssociates®. Each chamber was 19 cm long, 23 cm high, and 23 cm wide. Lateral walls and ceiling were made of transparent plexiglass, while front and back walls were made of solid aluminum. Experimental chambers were placed inside sound-proof enclosures equipped with fans, which were activated during experimental sessions. Each experimental chamber was equipped with two MedAssociates® food dispensers (ENV-203M-45), one for each kind of reinforcer. Each food dispenser was equipped with a tube; both tubes converged into a funnel in order to deliver both kinds of consequences into a single receptacle. The food receptacle was located at the center of the front wall. Two levers were installed, one at each side of the food receptacle; levers were 2 cm away from the receptacle and 4 cm above the floor. Colored light bulbs were installed above each lever; one of the lights was green, while the other one was white. Lights served as discriminative stimuli. Levers were calibrated so that the minimum force needed to activate them was 0.14 N. A white noise generator was located on the back wall (8 cm above the floor) and remained active during experimental sessions. Events that took place during experimental sessions were recorded using a MedAssociates® interface.

Procedure
Before the beginning of the experiment, all subjects were magazine trained under a concurrent Fixed Ratio (FR)-Fixed Interval (FI) program. The FR program was divided into 5-min components in which only one of the levers was made available. During the first sessions, subjects were required to press the lever just once for the delivery of either millet or turnip seeds. Once the subjects emitted at least 100 responses per session, the criterion for delivery of reinforcers was increased to 3 responses per delivery. Finally, once subjects emitted at least 100 responses per session during the FR-3 program, the cost for each reinforcer was increased to 5 responses per delivery. The FI program delivered either millet or turnip seeds. The time for delivery was progressively increased from 30 s to 45 s and, finally, 60 s. The interval was increased according to the subjects' performance in the FR program. Magazine training sessions lasted for 30 min, and each component was presented three times. Once subjects emitted at least 100 responses during the FR-5 component, magazine training was concluded.

Table 1
Note. Each column represents a different phase. During Phase 1 subjects had an Unrestricted Income (UI), during Phase 2 subjects had a Restricted Income (RI) and, finally, during Phase 3 subjects had a Compensated Income (CI). During all phases, subjects had concurrent (conc) access to both kinds of seeds. The first row from the top shows how many sessions subjects spent in each phase, while the last row represents the number of lever presses subjects would emit for a single reinforcer.
The experiment consisted of three phases (see Table 1). Throughout the experiment, subjects were kept in a closed economy (i.e., food was only available during experimental sessions). During Phase 1, subjects were exposed to a concurrent FR-5 FR-5 schedule of reinforcement, in which both levers were available simultaneously until a lever was pressed 5 times. At this point, both levers retracted until the reinforcers were delivered. Responses were reinforced with the delivery of either 0.225 g of millet or 0.225 g of turnip seeds. Phase 1 was implemented in order to establish the total number of reinforcers subjects would be able to attain during Phase 2. During Phase 1, there was no limit on the number of responses subjects could emit, and sessions lasted two hours.
The goal of Phase 2 was to determine the consumption of millet and turnip seeds under a restricted income. Thus, an upper limit of 200 responses, which granted access to 40 reinforcers per session, was established. Additionally, in order to control for lateral bias (Stephens, 2008), the position of each alternative was switched after a first block of 5 sessions. Ten more sessions were conducted with the positions of the alternatives interchanged.
Phase 3 was implemented in order to calculate the degree of substitutability between millet and turnip seeds. Given the fact that all subjects preferred millet seeds over turnip seeds, the cost of this alternative was raised from 5 to 8 responses for each delivery, while the cost of turnip seeds was lowered from 5 to 4 responses per delivery. In order to ensure that subjects would be able to discriminate the alternatives, the cost of the most preferred alternative was set at twice that of the other. Thus, during this phase, a concurrent FR-8 FR-4 schedule of reinforcement was implemented. The new income for each subject was the result of the mean consumption of each alternative for the last three sessions of each block times the new cost for each alternative. For instance, if a subject consumed 40 deliveries of millet seeds and 15 of turnip seeds on average, its income for Phase 3 would be calculated by multiplying 40 (the highest choice) times 0.8 and 15 (the lowest) times 1.6, and then adding the results together. Sessions ended once subjects reached the total number of responses programmed, thus ensuring that all subjects could afford the same number of reinforcers as in Phase 2.
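The income rule stated above (mean consumption of each alternative times its new cost, summed over alternatives) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the example assumes that the "new cost" is the Phase 3 ratio requirement in responses (8 for millet, 4 for turnip). The worked example in the text uses the multipliers 0.8 and 1.6 instead; because the costs are passed as a parameter, either set of values can be supplied.

```python
def compensated_income(mean_consumption, new_costs):
    """Sum of (mean deliveries of each alternative) x (new cost of that alternative)."""
    return sum(mean_consumption[name] * new_costs[name] for name in mean_consumption)


# Deliveries from the example in the text.
mean_consumption = {"millet": 40, "turnip": 15}

# Assumption: costs expressed as the FR requirement of each alternative in Phase 3.
new_costs = {"millet": 8, "turnip": 4}

income = compensated_income(mean_consumption, new_costs)
print(income)  # 380 responses under these assumed costs
```

Computing the income this way keeps the previously consumed bundle affordable at the new costs, which matches the stated aim that subjects could afford the same number of reinforcers as before.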

Results
In the conditions where both kinds of seeds had the same cost, all subjects exhibited a preference for millet seeds. However, there was a small shift in consumption when the cost of millet seeds was doubled with respect to that of turnip seeds. The consumption of millet and turnip seeds across the three phases was analyzed comparing the average consumption during the last three sessions of each block for Phases 2 (Restricted income) and 3 (Compensated Income) (see Figure 1).
At the end of Phase 1, the relative preference for millet seeds was 0.88 (on average, subjects obtained 45.25 deliveries of millet seeds against 5.92 deliveries of turnip seeds). A similar relative preference for millet seeds was recorded during Phase 2, averaging a 0.9 probability of choosing millet seeds. During Phase 3, both the number of millet deliveries and the probability of choosing millet seeds decreased with respect to Phase 2. By the end of Phase 3, subjects obtained on average 33.53 deliveries of millet seeds and 9.81 deliveries of turnip seeds, and the probability that subjects would choose millet seeds decreased to 0.77.
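The relative preference values reported above are consistent with dividing the deliveries of millet seeds by the total deliveries of both kinds of seeds. A minimal sketch of that arithmetic, assuming this is how the proportion was obtained (the function name is hypothetical):

```python
def relative_preference(preferred, other):
    """Proportion of all deliveries that came from the preferred alternative."""
    return preferred / (preferred + other)


phase_1 = relative_preference(45.25, 5.92)   # mean deliveries at the end of Phase 1
phase_3 = relative_preference(33.53, 9.81)   # mean deliveries at the end of Phase 3

print(round(phase_1, 2), round(phase_3, 2))  # 0.88 0.77
```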

Figure 1
Average Consumption of the Different Reinforcers During all Three Phases of Experiment 1
Note. The y-axis shows the reinforcers obtained, and the x-axis represents the three phases. Black bars represent average consumption of millet seeds, and the striped bars represent average consumption of turnip seeds. Each bar includes the standard deviation of the mean.
For all subjects, consumption shifted when the number of responses required to obtain millet seeds was increased from 5 to 8 and the cost for turnip seeds was decreased from 5 to 4 responses per delivery. It is worth noting that although there was a clear shift in consumption, said shift was small. If millet and turnip seeds were perfect substitutes, a complete reversal in consumption ought to be expected once the most preferred reinforcer cost twice as much as the alternative. When the cost of millet seeds was increased, its consumption decreased while that of turnip seeds increased; thus, results show that these kinds of seeds are substitute goods, albeit to a low degree.

Experiment 2
The goal of Experiment 1 was to assess the relation between millet and turnip seeds within the taxonomy used in microeconomics. Because the results showed that millet and turnip seeds are substitute goods (albeit to a small degree), the goal of Experiment 2 was to evaluate the reinforcing value of an alternative using a multiple schedule of reinforcement with three components. During one of the components, lever pressing was maintained by the delivery of millet seeds; during a second component, lever pressing was reinforced by the delivery of turnip seeds; and, finally, during the QVR component, responses were reinforced by the presentation of either millet or turnip seeds, which were delivered randomly.

Subjects
Eight experimentally naive female Wistar rats were used as subjects. Subjects were fed exclusively with a mixture of 50% turnip and 50% millet seeds for 15 days prior to the beginning of the experiment. The supply of food was gradually decreased until rats reached 80% of their ad libitum weight. Subjects were approximately 3 months old at the beginning of the experiment.

Materials
Experiment 2 was conducted in the same experimental chambers described for Experiment 1. Again, there were two food dispensers that converged into a funnel at the center of the front wall. The food receptacle was located at the center of the front wall, and a single lever was installed on the front wall 2 cm to the left of the receptacle and 4 cm above the grid floor. Three light bulbs of different colors (red, white, and green) were installed and served as discriminative stimuli. Two lights were located on the left and right of the front wall; the third light was located in the middle of the back wall. Every light bulb was installed 12 cm above the grid floor. The light color associated with each component was counterbalanced between subjects. A white noise generator was located on the back wall, 8 cm above the floor, and remained active during the experimental sessions.

Procedure
Subjects were exposed to a magazine training period similar to that described for Experiment 1, with the sole exception that each box was equipped with a single lever. Since there were no distinct components for the two kinds of seeds, millet or turnip seeds were delivered randomly when the FR was completed. Magazine training ended once subjects emitted at least 100 responses within 30 min. After each training session, subjects were fed with a mixture of millet and turnip seeds, and, in order to ensure animals would press the lever during experimental sessions, their food supply was gradually decreased until they reached 80% of their ad libitum weight. Subjects were exposed to a three-component multiple VI 60-s program. One of the components delivered 0.225 g of millet seeds, the second component delivered 0.225 g of turnip seeds, and the third component delivered a mixture of 0.225 g of turnip and millet seeds in the same proportion. Each component lasted for 120 s; components were programmed to appear in a random fashion. Sessions ended once each component had been presented at least 10 times. All subjects completed 40 sessions of 60 min each.
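The session structure described above can be sketched as a simple sequence generator. This is a hedged illustration, not the program used in the experiment: it assumes each component is presented exactly 10 times in a shuffled order, and it draws VI intervals from an exponential distribution with a 60-s mean, since the text does not say how intervals were generated. All names are hypothetical.

```python
import random

COMPONENTS = ("millet", "turnip", "qvr")
COMPONENT_DURATION_S = 120
PRESENTATIONS_PER_COMPONENT = 10


def build_session(rng):
    """Return a randomly ordered list with each component repeated 10 times (assumed)."""
    order = [c for c in COMPONENTS for _ in range(PRESENTATIONS_PER_COMPONENT)]
    rng.shuffle(order)
    return order


def vi_intervals(mean_s, n, rng):
    """Draw n inter-reinforcement intervals; the exponential form is an assumption."""
    return [rng.expovariate(1.0 / mean_s) for _ in range(n)]


rng = random.Random(0)
session = build_session(rng)
session_length_s = len(session) * COMPONENT_DURATION_S  # 30 components x 120 s
intervals = vi_intervals(60, 5, rng)
```

Under these assumptions the session lasts 3,600 s, which agrees with the 60-min sessions reported in the text.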

Results
By the end of the experiment, the highest response rates were registered during the component in which only millet seeds were available (M = 0.5217), followed by those registered during the component in which both kinds of seeds were available (M = 0.5200) and the component in which only turnip seeds were available (M = 0.3814), respectively. Figure 2 illustrates the mean response rates of the 40 sessions for all subjects during the three different components. Means were compared using a one-way repeated-measures ANOVA, F(2, 78) = 51.94, p < 0.01. Bonferroni post hoc tests did not show statistically significant differences between the response rates of the QVR component and the millet seeds component (p = 1); however, they did reveal statistically significant differences between the QVR component and the turnip seeds component (p = 0.028) and between the millet seeds component and the turnip seeds component (p = 0.028). These results appear to be consistent with the results of Experiment 1, in which subjects developed a preference for millet seeds over turnip seeds. They also appear consistent with the findings reported by Roca et al. (2011), in that the stimuli associated with the delivery of QVR did not elicit higher response rates than stimuli associated with the presentation of a single reinforcer.

Figure 2
Mean response rate for E2
Note. The y-axis represents the mean response rate for all 8 subjects across 40 sessions. The x-axis represents each component according to the reinforcers subjects received. Each box represents the mean, range, and standard deviation of the mean per component.

Experiment 3
Because the results of Experiment 2 showed no added reinforcing value for the QVR component, a third experiment was conducted in order to test whether the way in which the consequences are delivered in the qualitatively varied reinforcement alternative has an effect on response rates. For the purposes of this experiment, QVR was delivered in two different ways: 1) as a mix of both consequences in the same proportion, as was done in Experiment 2, or 2) presenting each kind of seed independently in a randomized order, using a program similar to the one described by Roca et al. (2011).

Subjects
For the purpose of this experiment, 8 experimentally naive female Wistar rats, approximately 3 months old at the beginning of the experiment, were used. Subjects were fed exclusively with a diet consisting of a mixture of millet and turnip seeds for 15 days prior to the beginning of the experiment, and they were kept at 80% of their ad libitum weight by restricting the amount of food they received before the experiment. Subjects were exposed to the same magazine training as the one described for Experiment 2.

Materials
The same experimental chambers used in the previous experiments were used with a very similar setting, with the sole exception that the general light placed on the back wall was removed. The two lights in the frontal wall served as signals for the two different components.

Procedure
Experimental sessions consisted of a multiple VI 60-s schedule of reinforcement with two components, each of which lasted for 120 s. Responses in one component led to the delivery of a mix of 0.225 g of millet and turnip seeds in the same proportion (millet/turnip component), while responses in the other component were reinforced by either 0.225 g of millet seeds or 0.225 g of turnip seeds (millet or turnip component). Each component was presented 15 times in each experimental session. Components were signaled by two different lights (white or green) placed on the right or left side of the front wall, and lights were counterbalanced to signal the different components. Components were presented in a randomized order without any restrictions. A total of 16 sessions were conducted. Each session lasted for 1 hour.
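The two ways of delivering QVR can be contrasted in a short sketch. It rests on two assumptions that the text does not state outright: that "the same proportion" means equal parts by weight within a 0.225-g delivery, and that millet and turnip are equally likely in the millet or turnip component. The function and component labels are hypothetical.

```python
import random

REINFORCER_G = 0.225  # weight of a single delivery, as reported in the text


def deliver(component, rng):
    """Return the grams of each kind of seed given on one reinforcer delivery."""
    if component == "millet/turnip":
        # Mixed delivery: both kinds at once, assumed to be equal parts.
        half = REINFORCER_G / 2
        return {"millet": half, "turnip": half}
    if component == "millet or turnip":
        # Random delivery: a single kind per reinforcer, assumed p = 0.5 each.
        seed = rng.choice(("millet", "turnip"))
        return {seed: REINFORCER_G}
    raise ValueError(f"unknown component: {component}")


rng = random.Random(1)
mixed = deliver("millet/turnip", rng)
single = deliver("millet or turnip", rng)
```

In both cases the weight of a delivery is the same; only the composition of each individual delivery differs, which is the variable the experiment isolates.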

Results
After 16 sessions, the mean response rates for the component in which both kinds of seeds were delivered simultaneously (M= 0.321) were higher than those recorded during the component in which the seeds were delivered randomly (M= 0.315). Figure 3 shows the mean rate of responding for all subjects and for both components during the 16 experimental sessions.
Although the mean response rate for the component in which both kinds of seeds were delivered simultaneously was slightly higher than that recorded during the component in which seeds were delivered randomly, a one-way ANOVA revealed no statistically significant differences between the response rates recorded during the two components, F(1, 15) = 0.54, p = 0.475. Thus, the distinct ways in which the varied consequences alternative was presented appeared to have no effect on response rates.

Figure 3
Mean responses for E3
Note. The y-axis represents the mean response rate for all 8 subjects across all sessions, and the x-axis represents each component according to the reinforcers that subjects received. Each box represents the mean, range, and standard deviation of the mean per component.

General Discussion
Behavior analysts have long held an interest in discovering new strategies that promote the faster learning of new behaviors, as well as in making those behaviors more resistant to extinction. The present work seeks to contribute to the understanding of reinforcement by considering not only the use of varied consequences but also the particular relation between the different consequences delivered by a single alternative. Roca et al. (2011) found no added reinforcing value by delivering QVR; rather, subjects responded with higher rates during the presence of the stimulus associated with a single reinforcer. The authors attributed this effect to the high-calorie condensed milk acting as a substitute for food. By this logic, when a subject is exposed to a set of reinforcers that act as substitute goods, no effect of QVR ought to be expected; however, if there is a higher preference for a reinforcer A, this preference ought to be expected to remain, even when another alternative provides both A and a second reinforcer B. Conversely, two complementary goods ought to provide each other with extra reinforcing value, and a stimulus associated with those complementary reinforcers should elicit higher response rates than stimuli associated with a single reinforcer. The goal of the first experiment was to identify the particular relation between millet and turnip seeds. The results showed that turnip and millet seeds are substitute goods, albeit to a small degree.
The second experiment attempted to test whether a component that delivered QVR would have more reinforcing value than components that deliver a single reinforcer. By the end of the experiment, there were no statistically significant differences between the response rates recorded during the QVR component and those recorded during the component in which millet seeds were available, while response rates during the component in which turnip seeds were available were the lowest. Steinman (1968a, 1968b) asserted that an alternative that provided several reinforcers should be expected to have more reinforcing value; that is, stimuli associated with QVR ought to elicit higher response rates than those associated with a single reinforcer. The results of the second experiment of this series do not lend support to that claim. There are at least three possible ways to account for the results of Experiment 2: 1) the specific way in which reinforcers were delivered, 2) salience of the reinforcers, and 3) interaction between reinforcers.
In Steinman's (1968a, 1968b) studies, qualitatively varied reinforcers were delivered randomly during the same component, as opposed to the QVR of Experiment 2, in which millet and turnip seeds were delivered as a mix. Thus, the different results could be attributed to a program-induced bias instead of the type of reinforcers or their particular interaction. Experiment 3 of the present series was implemented in order to rule out this possibility. During this experiment, response rates for a component in which QVR was delivered randomly did not differ from those recorded during a component in which turnip and millet seeds were delivered as a mix. In addition to this result, it is worth noting that Roca et al. (2011) performed two experiments: one in which QVR was delivered during a component within the same session as single reinforcers, and one in which QVR was delivered on a different day, in a subsequent experimental session. In both experiments, response rates were higher when a single reinforcer was available. Taking these findings into account, there is no reason to suggest that the way in which QVR is delivered has an effect on its reinforcing value.
A second possibility is that the inherent properties of reinforcers, such as palatability, size, consistency, or color, may make it harder for subjects to discriminate between two different reinforcers. In Steinman's (1968a, 1968b) experiments, as well as in Roca et al.'s (2011) experiments, subjects received a liquid solution and solid food as reinforcers, whereas, during Experiment 1, subjects consistently exhibited a preference for millet seeds even when their cost was twice as high as that of turnip seeds. As expected, during Experiment 2, subjects responded at the lowest rates when turnip seeds were available. It is possible that it is harder for subjects to discriminate between a component that delivers a highly preferred reinforcer (such as millet seeds) along with a low-preference reinforcer (such as turnip seeds) and a component in which only the highly preferred reinforcer is available than it is to discriminate between a component that delivers the highly preferred reinforcer and a component in which that reinforcer is not available at all. In order to control for this possible effect, it would be necessary to deliver sets of qualitatively varied reinforcers that share the same relationship (e.g., two substitute goods or two independent goods) but differ in salience.
Finally, a third possibility is that the way reinforcers interact with each other determines whether the QVR alternative will be endowed with extra reinforcing value. Roca et al. (2011) hypothesized that the high-calorie condensed milk solution used in their experiments may have acted as a substitute for food, making the alternative in which condensed milk was available the highest valued. By that logic, when reinforcers are substitute goods to a high degree, no extra reinforcing value ought to be expected for the QVR alternative. Two reinforcers that function as substitute goods to a small degree may provide each other with extra reinforcing value, and, if the reinforcers act as complementary goods, they should provide each other with extra reinforcing value.
The results of the present series seem to fit the second scenario: Two reinforcers that act as substitute goods provide each other with enough reinforcing value to rival that of a highly preferred reinforcer but not enough to surpass it. Thus, the results of the present experimental series do seem to lend support to the notion that it is the particular relation between reinforcers that determines the value of an alternative in which several consequences are available.
However, due to the lack of research on the subject of QVR, making claims about the effect of substitutability on reinforcing value would be premature. At this point, all that can be asserted with confidence is that merely delivering two or more consequences for the same operant behavior does not necessarily translate to greater reinforcing power. In order to elucidate the extent to which the value of a reinforcer is determined by the type of interactions it has with other reinforcers, more manipulations are needed, such as introducing different sets of reinforcers that encompass different values of the substitutability spectrum. The results of the present series only allow for drawing conclusions about QVR when reinforcers are substitute goods to a small degree. Future research ought to focus on reinforcers that are complements with respect to one another, as well as reinforcers that rank higher on the substitutability scale.