Choice Behavior

Will work for food?

August 5, 2010

Joyce Kesling, CDBC

“Choice responding refers to the manner in which individuals allocate their time or responding among available response options” (Fisher & Mazur, 1997).

Everyday life presents choices, yet many of us give little thought to how those choices influence our present and future behavior.  Understanding how those choices are derived may be important in solving behavior problems and in training situations.  A choice made between behavioral responses is greatly influenced by previous reinforcement history and one’s personal preferences.

Choice Behavior and Matching Law

Choice behavior may be described as an animal’s “…ability to choose between alternative courses of action” and is considered a “behavior imperative” enabling focused attention on “…moment-to-moment demands presented by the environment” (Lindsay, 2000).  This choice behavior is influenced by previous reinforcement histories and the animal’s previous choices selected in generalized contexts.  These generalized patterns tend to be “highly correlated or matched” to the relevancy from other alternative sources of reinforcement made available as choices in the environment (Lindsay, 2000).

This notion is confirmed by Guthrie, who proposed that “a combination of stimuli which has accompanied a movement will on its recurrence tend to be followed by the movement” (1935/1960, p. 23, as cited in Lindsay, 2000).  Lindsay clarifies this statement by saying, “behavior occurring in some given situation will tend to recur under the same or similar circumstances in the future” (Lindsay, 2000).

Lindsay (2000) proposed that “classical and instrumental learning activities are always functionally integrated” even though, for both practical and experimental purposes, they are frequently treated separately.  According to Lindsay (2000), Pavlov believed that all learning would eventually be understood in terms of a simple S-R reflex mechanism.  This idea fell short of Pavlov’s expectations until research by Rescorla and associates found “Pavlovian associative linkages and structures embedded in every major facet of instrumental conditioning” (Lindsay, 2000).  According to Rescorla (1987), “these encoded Pavlovian structures include S-R relations, predictive stimulus-outcome relations, and Pavlovian-like response-outcome expectations” (Lindsay, 2000).

To appreciate how choice behavior and matching apply to training animals, it helps to briefly review the theoretical perspectives proposed by Thorndike, Guthrie, Tolman, and B. F. Skinner, which have influenced modern scientific studies of animal behavior.

Edward L. Thorndike

Thorndike is known for his experiments with cats, in which he discovered that they learned to escape from an enclosure through a process called “trial and error.”  He described this process as stamping in and stamping out: successful behavior was stamped in, and unsuccessful behavior, including frustration, was stamped out.  He concluded that a “…response was directly connected or bonded to the associated stimulus complex.”  Thorndike held that all “learning is connecting” and is not dependent on “reasoning” or any “specialized instinct” but rather “…entirely on the selective stamping in or stamping out of relevant S-R connecting” (Lindsay, 2000).

Based on his findings, Thorndike developed three basic laws of learning:

  1. Law of effect – all S-R connections are strengthened or weakened depending on the hedonic quality of their consequences.  Rewards satisfy and therefore strengthen behavior, while punishers or annoyances weaken it.
  2. Law of exercise – a response is strengthened by continued training and use and is weakened when training is discontinued.
  3. Law of readiness – describes “conduction units.”  Thorndike worded this law in a rather peculiar way; Hilgard and Bower (1975) later suggested that the units refer to an objective action tendency or preparation for action, describing the motivational value and satisfaction level of any particular action chosen.

According to Lindsay (2000), the “readiness to act is affected by an animal’s mental set or attitude,” which is determined by its motivations and by whether it considers the necessity to act an annoyance or a satisfying event.  Lindsay (2000) says this law “anticipates in several details Premack’s theory of reinforcement reversibility,” under which what “under one set of motivational conditions is reinforcing may be punitive under another,” and further says whether “a particular activity is annoying or satisfying is relative to the animal’s varying motivation state.”

Thorndike is best remembered for his “emphasis on reward over punishment,” which can be considered an important contribution to modern training practices (Lindsay, 2000).

Edwin R. Guthrie

Guthrie is best known for his book The Psychology of Learning and its application to dog behavior and training.  He also believed all learning was acquired through simple S-R associations.  His most notable contribution is the understanding that the effect of reinforcing and aversive events depends on the context in which they occur, saying “just as satisfiers do not always ‘stamp in’ a connection, so annoyers do not…always ‘stamp out.’”  With this understanding, we can predict which stimuli may be maintaining any given behavior at any time (Lindsay, 2000).

Lindsay (2000) explains that “…the hedonic value…of the reinforcing event is not intrinsically significant to the effect it has on behavior,” but the emotional excitement it generates, whether rewarding or punishing, does affect learning, because learning is accelerated by excitement.  The significance of reinforcers is therefore determined by how they affect behavior.  Thus, trainers should focus their attention not on the feeling the animal gains from aversive and rewarding events but on how these events affect its subsequent behavior.

Guthrie’s system treats learning as adaptive and functional, which fits with what many modern trainers suggest to companion dog guardians: train new behaviors and incorporate them into everyday interactions, allowing for generalization and effortless learning for the dog.  Through incorporation of these habits, the dog learns adaptive ways to cope with its environment, and once established, these habits persist through continued and refined training (Lindsay, 2000).

Guthrie described many techniques for interfering with behavior, including response substitution, negative adaptation, response prevention, and response fatigue or negative practice, all of which describe common methods used by modern dog trainers.  Lindsay (2000) described Guthrie’s three basic methods for breaking unwanted behavior as follows:

  1. Control the situation, preventing the antecedent responsible for the behavior
  2. Fatigue the response or keep the stimulus under threshold
  3. Substitute undesirable behavior with an incompatible behavior – “if the cue or signal is present and other behavior prevails, the cue loses its attachment to the obnoxious response and becomes an actual conditioner of the inhibiting action”

Guthrie’s “response fatigue or negative practice” is the same process Kellie Snider used at her seminar in Dallas, Texas, where negative reinforcement was coupled with fatigue to change the reinforcement value of an aggressive dog’s response to approaching strangers or dogs.  It seems Guthrie had a huge impact on many of the methods used by today’s dog trainers, including “[in] vivo exposure and response prevention, counter-conditioning, systematic desensitization, negative practice, and overcorrection” (Lindsay, 2000).

Edward C. Tolman

I think Tolman is best remembered for introducing the concept of studying behavior “…in the context of the subject’s intended purpose,” which meant evaluating behavior not as a “molecular relationship” (the best alternative at the moment) but rather as a “molar relationship” (making something as good as it can be over time) (Lindsay, 2000).

He proposed studying the purpose of behavior using “hypothetical constructs” derived from direct observation, and he rejected empathy and introspection as means of drawing conclusions (Lindsay, 2000).  In the study of purposive behavior, he proposed three experimental variables, which “co-interact” to arrive at the significance of a behavior, as outlined below (Lindsay, 2000).

  1. Independent variables – controlled aspects for experiments, particularly the stimulus conditions and motivational state.
  2. Dependent variables – included measuring changes in behavior while the subject was under the influence of the controlled conditions.
  3. Intervening variables – use of abstract constructs to explain observed S-R relationships.

Tolman’s introduction of intervening variables into the scientific study of behavior helped clarify presumptions about behavior and preferences among stimuli.  He also placed greater emphasis on “stimulus or sign learning” than on the “response habit” proposed by Thorndike.  He suggested animals use “cognitive maps” or “sign-gestalts” (signs, significates…behavior routes) to form significant relationships related to the satisfaction of their appetitive needs.  Lindsay (2000) suggests his “…signs correspond to the classical conception of the conditioned stimulus and significates to the unconditioned stimuli.”

Several experiments support Tolman’s “cognitive interpretation of learning” that suggests animals form expectations regarding their environment and form goals based on these expectations.  According to Lindsay (2000), Tolman’s theories make several distinctions saying, “learning is independent of performance, but performance is not independent of learning.”  Lindsay illustrates this point saying, “motivational levels strongly impact performance…generating goal directed tensions demanding satisfaction” summing up that performance is directed by current motivational states and previous learning experiences.  These motivational desires (appetite, fear and aversion) dictate what animals pay attention to and learn from within their environment.

According to Lindsay (2000), it is by this learning process that “…dogs are ever-forming predictive interpretations and expectancies about the occurrence of important stimulus events-a process that is both purposive and cognitive.”

B. F. Skinner

Skinner’s main contributions and controversial positions are summed up in the following outline.

  1. Placed greater emphasis on reward rather than punishment to change behavior.
  2. Invented the Skinner box, used to study and control behavior events.
  3. Used the Skinner box to measure and manipulate behavior using schedules of reinforcement.
  4. Rejected most forms of scientific theorizing.
  5. Excluded physiological descriptions and theories, including “mentalistic and hedonic interpretations” such as expectancies and pleasures.
  6. Rejected “conceptual accounts” and the use of “intervening variables.”

His biggest contribution may be his “…system of operant and respondent conditioning,” consisting of “two sets of binary laws”: type S laws, which regulate respondent learning (“inductive generalities…from Pavlov”), and type R laws, which govern operant learning and are similar to Thorndike’s law of effect (Lindsay, 2000).

The following outlines the basic laws according to Skinner’s system of learning (Lindsay, 2000).

  1. Type S – law of conditioning says, “the approximately simultaneous presentation of two stimuli, one of which belongs to a reflex existing at the moment at some strength, may produce an increase in the strength of a third reflex composed of the response of the reinforcing reflex and the other stimulus.”
  2. Type S – law of extinction says, “if the reflex strengthened through conditioning of Type S is elicited without presentation of the reinforcing stimulus, its strength decreases.”
  3. Type R – law of conditioning says, “if the occurrence of an operant is followed by presentation of a reinforcing stimulus, the strength is increased.”
  4. Type R – law of extinction says, “if the occurrence of an operant already strengthened through conditioning is not followed by the reinforcing stimulus, the strength is decreased.”

According to Lindsay (2000) these laws are “little more than a reiteration of Pavlov and Thorndike” but says Skinner could be better remembered for the “creative and productive ways that he applied them to the study of behavior.”

How is behavior increased?

Lindsay (2000) says, “there are two ways in which the probability/frequency of behavior is affected by the consequences it produces”: reinforcement and punishment.  If we use a reinforcing stimulus, we can expect to see an increase in the behavior following reinforcement; if we use punishment, we can expect to see a decrease in the behavior following presentation of the punishing stimulus.

Additionally, behavior is strengthened through positive reinforcement by “producing or prolonging” a desirable consequence, and through negative reinforcement by terminating, reducing, or avoiding an undesirable consequence (Lindsay, 2000).

It is important to recognize that both positive and negative reinforcement “increase the future probability/frequency of the behavior they follow” (Lindsay, 2000).

How is behavior decreased?

Finally, we can effectively “punish or weaken” behavior using negative and positive punishment.  In negative punishment, removing the opportunity for a reinforcing consequence weakens the behavior that preceded it.  In positive punishment, applying a “previously escaped or avoided consequence” weakens the behavior that preceded it.  Both positive and negative punishment can effectively decrease the “future probability/frequency of the behavior they follow” (Lindsay, 2000).

Gaining knowledge through reinforcement history

These reinforcement contingencies apply to the everyday interactions between an animal and its environment.  They provide a historical and predictive source of information that enables the animal to control and manipulate future events and outcomes.  It is the positively and negatively reinforced and punished consequences of its behavior that help form an animal’s future behavior.

The ability to control and manipulate one’s environment assists the learning process.  According to Lindsay (2000), two “…motivations drive instrumental learning…maximization of positive outcomes and minimization of aversive ones,” and these correspond to “positive and negative reinforcement.”  It is these two consequences that have the most influence over acquiring and maintaining goal-directed behavior.

It is important that dog trainers and owners realize how the predictability of both reinforcing and punitive consequences may affect the learning process.  In training, it is imperative to provide clear links among the preceding antecedents, the behavior, and its consequences; otherwise the subject may be unable to link its behavior with these rewarding or punitive consequences.  This would create a very unstable relationship, which can lead the subject to learned laziness or, even worse, learned helplessness.

Positive and negative events

When animals satisfy physiological and psychological needs, they find this positively reinforcing, and the simple associations made when they experience these positive outcomes set the stage for learning to become a more rewarding activity.  In other words, these positive events help create an internal reward system based on behavior because of the direct influence received from these positive outcomes or consequences.  This holds true when trainers use food as reinforcement for desirable behavior.  If the dog is sufficiently motivated by food presented for a specific behavior, he will be more likely to produce that behavior in the future based on this positive training outcome.

On the other hand, negative events have the potential to reinforce dogs when they are able to adjust their behavior to terminate or avoid aversive stimuli.  Dogs experience this type of reinforcement not only from the natural environment but during training exercises as well.  Negative reinforcement is not always what many may perceive as some type of negative event or behavioral outcome; it simply describes how the animal’s behavior is affected by stimuli.  The stimulus may be as simple as direct sun overheating the dog.  To be relieved from the overheating effects of the sun, the dog has to move to a shady spot, and the move into shade may be looked at as negatively reinforced.

These positive and negative events shape the future behavior of the animal by confirming or disconfirming its expectations with repeated exposure (Lindsay, 2000).

Behavioral incentives

According to Lindsay (2000), dogs derive the incentives for their behavior from two sources: intrinsic and extrinsic.  The intrinsic incentives are contained within the behavior itself and include both positive and negative stimuli.  We may all be familiar with the enjoyment dogs receive from playing with balls or chasing squirrels.  These activities are enjoyable, are thus considered positive incentives, and do not require any other reinforcement.  The negative incentives may include growling, snapping, or running away from sources the dog views as aversive; because these behaviors themselves provide the dog relief, no other reinforcement is necessary.  These incentives are maintained through the dog’s environment and are determined by how he perceives them.

Extrinsic incentives also include both positive and negative incentives but are derived from sources other than the behavior itself.  These positive and aversive events may include going to the dog park versus going to the veterinarian’s office.  The trainer manipulates these types of incentives by providing contingencies for target behavior.  As a trainer, I can change the dog’s perception of such incentives, including veterinary visits, by changing the reinforcement value so that the dog perceives the visit as positively reinforcing rather than as something to avoid, which is how most dogs perceive it.

Motivational factors affecting behavior

As I have previously discussed, an expected behavioral outcome is derived from the animal’s past reinforcement history, its current motivation, and its willingness to act.  The success of any trained target behavior will depend on these factors and on how effectively the trainer manipulates the dog’s environment so that these expectancies and motivations are used to advantage.

Learning and control

Our history of events, coupled with the type of reinforcement we have received, influences how we perceive future events; these perceptions can be referred to as one’s expectancies.  These expectancies are formed from past events and the reinforcement we may or may not have received, and they can be viewed in terms of “emotional arousal” such as hope/satisfaction, relief, disappointment/frustration, and fear/anxiety (Lindsay, 2000).

According to Lindsay (2000), animals receiving continuous reinforcement “tend to generate expectancies” associated with a “degree of certainty,” which he termed “elation,” while intermittent reinforcement “tend[s] to generate expectancies based on probability,” or hope.  When conditions of intense hope are present, the subject may turn to “irrational realms of superstition and compulsivity.”  Hope is a “controlling motivational factor” in games of chance such as gambling, which is associated not so much with winning as with avoiding losing.  Lindsay (2000) compares gamblers to “laboratory animals working under intermittent reinforcement,” saying these “…individuals…are probably little interested in actually winning money,” because they actually lose more than they win, but are “…motivated to experience the sheer pleasure and elation of winning and avoiding the painful disappointment of losing.”

Lindsay (2000) says these same effects can be seen in a subject’s behavior under ratio schedules and under schedules controlled by duration or interval contingencies.  A good example is the one Lindsay (2000) uses in describing training a dog to sit/stay.  When training duration, we usually begin by reinforcing at specific set times, such as after 3 seconds, and then slowly increase the duration while rewarding the dog for extending his sit/stay.  However, if the trainer raises the criterion too much or too fast, the requirement exceeds the dog’s prior learning expectancies, creating disappointment, and the dog may break the behavior.

Based on these assumptions, one could say, “behavior based on expectancies of certainty is vulnerable to disappointment, but behavior based on expectancies of hope is more persistent and motivationally immunized against the adverse influences of disappointment” (Lindsay, 2000).

Choice Behavior

As previously stated, choice behavior is apparent in all species, and according to Domjan (2003), “understanding the mechanisms of choice is fundamental to the understanding of behavior because choices organisms make determine the occurrence of individual responses.”

Life can provide single options, but in most cases individuals are presented with more than one alternative.  The simplest case offers two choices, each with its own schedule of reinforcement.  Under laboratory conditions, this two-choice arrangement is called a concurrent schedule of reinforcement.  It is set up to determine how a subject distributes his choices between the two alternatives and how the different reinforcement schedules influence those choices.

One of the most common ways to study this behavior uses pigeons, offering a simple two-choice alternative on a concurrent VI 60-sec VI 60-sec schedule.  There are no constraints on the pigeon’s ability to choose either alternative.  The result observed was that the pigeons responded equally on each alternative; thus “by responding equally often on each side of a concurrent VI 60-sec VI 60-sec schedule, the pigeon will also earn reinforcers equally often on each side” (Domjan, 2003).  This is measured as a relative rate: the rate of reinforcement for response A divided by the total rate of reinforcement for A and B combined.  The relative rate of responding is calculated the same way from the response rates.  On this type of schedule, reinforcement was earned equally on either side.
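
The relative-rate calculation described above can be sketched in a few lines of Python.  This is a minimal illustration rather than code from Domjan (2003); the function name `relative_rate` and the session counts are assumptions chosen for the example.

```python
def relative_rate(count_a: float, count_b: float) -> float:
    """Return alternative A's share of the total: A / (A + B).

    The same formula applies to responses and to reinforcers.
    """
    total = count_a + count_b
    if total == 0:
        raise ValueError("at least one response or reinforcer is required")
    return count_a / total


# Hypothetical session on a concurrent VI 60-sec VI 60-sec schedule:
# the pigeon earns the same number of reinforcers on each key.
reinforcers_a, reinforcers_b = 30, 30
print(relative_rate(reinforcers_a, reinforcers_b))  # 0.5
```

A value of 0.5 means neither side is favored; a value above 0.5 means more responding, or more reinforcement, on alternative A.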

Matching Law

In my previous example using a concurrent schedule with two alternatives, Domjan (2003) says the “relative rate of responding is equal to the relative rate of reinforcement,” but will this change if a subject is offered two choices on two different schedules of reinforcement?

Herrnstein (1967) pointed out that in a previous paper (1958) “the relative frequency of responding” to two alternative choices [key responses] “may be controlled within narrow limits by adjustments in an independent variable.”  He says these earlier experiments might be considered a “study of differential reinforcement” and the studies outlined in his 1967 paper “a study of strength of response.”

Herrnstein’s (1967) experiment used three pigeons and a conventional experimental chamber with two response keys.  The preliminary training phase consisted of “two sessions of 60 reinforcements each,” with a peck to either key being reinforced only when the immediately preceding reinforcement was for a peck to the other key.  The pigeons rapidly learned an almost perfect alternation of responding between the two keys.

During the actual experiment, the pigeons were reinforced for “responding to either key” on a variable-interval schedule, with each key’s schedule programmed independently of the other.  This meant that at any given moment reinforcement could be available on neither key, on one key or the other, or on both keys, and a “response to one key had no effect on the programmer that scheduled reinforcements on the other” (Herrnstein, 1967).

Herrnstein (1967) used an independent variable, which was the “mean time interval between reinforcement on each key” and “held constant at 1.5 minutes.”  The pigeons were penalized for switching between keys: for 1.5 seconds after a switch, no reinforcement was available.  The experiment required “at least two consecutive pecks on a given key…before reinforcement” with the first peck beginning a session and the second completing it.  The 1.5-second penalty was referred to as a “change over delay…or COD.”

The response rates were calculated using the same mathematical formula I discussed earlier.  It was observed during the experiment that the “number of times a pigeon changed keys depended on the difference in frequency of reinforcement on the two keys” and the “frequency of alternations between keys clearly decreases as the two keys are associated with increasingly different relative frequencies of reinforcement.”

It was noted the COD “markedly reduces the frequency of alternations between keys” and “unequal reinforcement frequencies on the two keys reduce alternation only when the COD (1.5”) is present.”  It seems also the COD plays a role “in the production of the relation…namely, the tendency of the relative frequency of responding to match the relative frequency of reinforcement” (Herrnstein, 1967).

Domjan (2003) summarized Herrnstein’s (1967) work stating, “pigeons distributed their responses in a highly predictable fashion.  The results…indicate that the relative rate of responding on a given alternative was always very nearly equal to the relative rate of reinforcement earned on that alternative.”

Experimental studies of the matching law have shown that choice behavior is not haphazard but is based on “the orderly function of rates of reinforcement,” says Domjan (2003).  What is not completely understood is the “precise characterization of the function,” and in spite of all the research, “relative rates of responding do not always exactly match rates of reinforcement.”

When choice behavior does not match, researchers can add two parameters, “sensitivity” and “bias,” that help explain what is termed under-matching and over-matching.  According to Domjan (2003), “choices are more likely to exhibit reduced sensitivity to relative reinforcement rates than they are to exhibit enhanced sensitivity to reinforcement rates,” with under-matching found more often.  Several variables can influence sensitivity, including the species, the amount of effort or difficulty involved in switching, and how the schedule alternatives are constructed (Domjan, 2003).  Typically, the more difficult it is to switch from one alternative to the other, the more sensitive the subject becomes to the reinforcement available from the two options.

“Response bias” comes into play when the two alternatives differ, for example when they offer different kinds of reinforcement.  Additionally, researchers have found that the relative rate of responding can be influenced by the amount of each reinforcer, the relative delay of reinforcement, and the palatability of the reinforcer; these can be considered “aspects of its general value,” with larger, more palatable, and more immediate reinforcers having more value (Domjan, 2003).
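
The sensitivity and bias parameters described above are commonly combined in a power-function form often called the generalized matching law, in which the ratio of responses equals bias times the ratio of reinforcement rates raised to the sensitivity.  The sketch below assumes that form; the equation is not quoted in the text above, and the function name, parameter names, and sample rates are hypothetical.

```python
def predicted_response_ratio(rate_a: float, rate_b: float,
                             sensitivity: float = 1.0,
                             bias: float = 1.0) -> float:
    """Predicted responses on A per response on B.

    Assumed form: B_A / B_B = bias * (r_A / r_B) ** sensitivity
    sensitivity = 1 and bias = 1 give strict matching.
    """
    if rate_a <= 0 or rate_b <= 0:
        raise ValueError("reinforcement rates must be positive")
    return bias * (rate_a / rate_b) ** sensitivity


# Hypothetical rates: A pays off twice as often as B.
strict = predicted_response_ratio(40, 20)                   # strict matching
under = predicted_response_ratio(40, 20, sensitivity=0.8)   # under-matching
over = predicted_response_ratio(40, 20, sensitivity=1.2)    # over-matching
biased = predicted_response_ratio(40, 20, bias=1.5)         # preference for A

print(strict, biased)        # 2.0 3.0
print(under < strict < over) # True
```

With sensitivity below 1 the subject favors the richer alternative less than strict matching predicts (under-matching); with sensitivity above 1 it favors it more (over-matching).  A bias other than 1 shifts preference toward one alternative regardless of the reinforcement rates.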

To address single-response situations, Herrnstein says “even single-response situations can involve a choice”: the choice between pressing a lever for reinforcement and performing some species-typical behavior such as grooming, sniffing, walking around, or just pecking the floor (Domjan, 2003).  The subject receives not only “explicit reinforcement” for its operant choice but also “intrinsic rewards” for any other activities it may engage in, so “total reinforcement includes the programmed extrinsic rewards as well as the unprogrammed sources of reinforcement,” permitting the application of matching law to single-response reinforcement schedules (Domjan, 2003).

Summing this up, Domjan (2003) says, “according to this law, the tendency to make a particular response depends not only on the rate of reinforcement for that response but also on the rates of reinforcement available for alternative activities.”  So when applying choice behavior to behavior modification, one should consider not only the rewards available for the target response but also how the individual can obtain rewards in other ways.  An accurate assessment of an individual’s behavior should consider the full range of available choices and sources of reinforcement.
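
The dependence of a single response on other sources of reinforcement is often written as a hyperbola: response rate equals k times R divided by (R plus Re), where R is the reinforcement for the target response and Re is the reinforcement from everything else.  The sketch below assumes that commonly cited form, which is not quoted in the text above; the function name, the constant `k`, and the sample values are hypothetical.

```python
def single_response_rate(k: float, reinforcement: float,
                         other_reinforcement: float) -> float:
    """Predicted rate of the target response.

    Assumed form: B = k * R / (R + Re)
    k  - the most responding the subject could produce
    R  - reinforcement rate for the target response
    Re - reinforcement rate from all other activities
    """
    total = reinforcement + other_reinforcement
    if total <= 0:
        raise ValueError("total reinforcement must be positive")
    return k * reinforcement / total


# Same payoff for the target response, different competing payoffs.
print(single_response_rate(100, 30, 30))  # 50.0
print(single_response_rate(100, 30, 90))  # 25.0
print(single_response_rate(100, 30, 10))  # 75.0
```

Holding the target reinforcement constant, richer alternatives pull responding away from the target response and poorer alternatives push responding toward it.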

Additionally, matching law “suggests novel techniques for decreasing undesired responses and increasing desired responses”: one can decrease a response by providing more “free” reinforcement or more reinforcement for other activities, and increase a response by withdrawing reinforcement from other alternatives (Domjan, 2003).

“Matching law describes how organisms distribute their responses in a choice situation but does not explain what mechanisms are responsible for this response distribution” and is a “descriptive law of nature rather than a mechanistic law” (Domjan, 2003).

Molar theories of matching try to explain the accumulated sum of responses and the overall distribution of responses and reinforcers in choice behavior.  Molecular theories focus on the level of individual responses and consider the matching relation a net result of individual choices (Domjan, 2003).

Molecular maximizing holds that “organisms always choose whichever response alternative is most likely to be reinforced at the time.”  According to Domjan (2003), Shimp (1966, 1969) suggested that “when two schedules (A and B) are in effect simultaneously, the subject switches from schedule A to schedule B as the probability of reinforcement for schedule B increases” and explained the matching relation as “prudent switching when the probability for reinforcement on the alternative response key becomes greater than the probability of reinforcement on the current response key.”
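
The switching rule described above can be sketched as a simple comparison of momentary probabilities.  The sketch assumes that the intervals on each VI schedule are exponentially distributed, so the chance that a reinforcer is waiting on a key grows with the time since that key was last pecked; this assumption, the function names, and the schedule values are illustrative and are not taken from Shimp or Domjan.

```python
import math


def setup_probability(seconds_since_last_response: float,
                      mean_interval: float) -> float:
    """Chance that a VI reinforcer is waiting (assumes exponential intervals)."""
    return 1.0 - math.exp(-seconds_since_last_response / mean_interval)


def momentary_best(seconds_since_a: float, seconds_since_b: float,
                   mean_a: float, mean_b: float) -> str:
    """Pick the key most likely to pay off right now."""
    p_a = setup_probability(seconds_since_a, mean_a)
    p_b = setup_probability(seconds_since_b, mean_b)
    return "A" if p_a >= p_b else "B"


# Hypothetical concurrent VI 30-sec (key A) VI 90-sec (key B) schedule.
# The subject has been pecking A and last pecked it 1 second ago.
print(momentary_best(1, 2, 30, 90))   # A
print(momentary_best(1, 10, 30, 90))  # B
```

The longer the subject stays away from key B, the more likely a reinforcer is waiting there, until switching becomes the better momentary choice.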

Molar maximizing suggests subjects “distribute their responses among various alternatives in order to maximize the amount of reinforcement they earn over the long run” and is utilized when using concurrent schedules using ratio components to explain choice responding (Domjan, 2003).

There are two problems with using molar maximizing to predict choice behavior.  The first occurs when the subject is tested on concurrent VI-VI schedules that offer nearly the same reinforcement value as long as the subject occasionally samples both alternatives.  Here, “molar maximizing cannot explain why choice behavior is distributed so close to the matching relation on concurrent VI-VI schedules and not in other equally effective ways” (Heyman, 1983, as cited in Domjan, 2003).

The second occurs when subjects are tested on a concurrent variable-ratio (VR) and variable-interval (VI) schedule.  A VR schedule reinforces the subject after a varying number of responses, while a VI schedule offers rewards after unpredictable amounts of time.  To maximize reinforcement, subjects should direct most of their responses to the VR schedule, with only an occasional response to the VI, but studies have not revealed the strong preference that molar maximizing predicts.  Researchers have found that “…human participants also respond much more on the VI alternative than is prudent if they are trying to maximize their rate of reinforcement” (Domjan, 2003).  This may be explained by the fact that subjects have to work harder on a variable ratio schedule than on a variable interval schedule, yet still have the potential for the same amount of reward.

Melioration is a third way of explaining choice behavior.  It describes choice not as selecting the best alternative but rather as an accumulation of choices that may be better over the long haul.  Melioration does not suggest these choices are the best choices at any given time; rather, they are goal-directed choices the subject makes at a specific time that may have long-term effects.

One important difference lies in how researchers calculate the rates of responding and reinforcement.  Molar theories use “overall rates of responding and reinforcement…calculated over the entire duration of an experimental session,” whereas melioration uses the local rate, which is “calculated only over the time period that a subject devotes to a particular choice alternative” (Domjan, 2003).  With two choice alternatives, A and B, the local rate of responding on A is the number of responses on A divided by the amount of time spent responding on A.
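As a rough illustration of the difference between the two calculations, the sketch below computes an overall and a local rate of responding for one alternative.  The session length, response count, and time allocation are made-up numbers, and the function names are hypothetical; none of them come from Domjan (2003).

```python
def overall_rate(responses, session_minutes):
    """Responses per minute, averaged over the entire session."""
    return responses / session_minutes


def local_rate(responses, minutes_on_alternative):
    """Responses per minute, counting only time spent on that alternative."""
    return responses / minutes_on_alternative


# Hypothetical session: 60 minutes in total, 20 of them spent on alternative A,
# during which the subject made 75 responses on A.
session_minutes = 60.0
minutes_on_a = 20.0
responses_on_a = 75

overall_a = overall_rate(responses_on_a, session_minutes)  # 1.25 per minute
local_a = local_rate(responses_on_a, minutes_on_a)         # 3.75 per minute

print(f"Overall rate on A: {overall_a:.2f} responses/min")
print(f"Local rate on A:   {local_a:.2f} responses/min")
```

The same 75 responses look three times as frequent under the local calculation, because the 40 minutes spent away from A are left out of the denominator.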

Melioration suggests subjects will alternate their response choices in order to improve their local rate of reinforcement and will continue this alternation of choices until they are receiving the same local rate of reinforcement from both alternatives (Domjan, 2003).
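The idea of shifting toward the better local rate can be sketched in a few lines.  This is a simplified illustration, not a model from the literature: the function names and session numbers are hypothetical, and the sketch simply compares two sessions rather than simulating how reinforcement changes as the subject reallocates its time.

```python
def local_reinforcement_rate(reinforcers, minutes_on_alternative):
    """Reinforcers earned per minute actually spent on one alternative."""
    return reinforcers / minutes_on_alternative


def preferred_shift(reinf_a, minutes_a, reinf_b, minutes_b):
    """Return the alternative with the higher local reinforcement rate,
    or "none" when the two local rates are equal."""
    rate_a = local_reinforcement_rate(reinf_a, minutes_a)
    rate_b = local_reinforcement_rate(reinf_b, minutes_b)
    if rate_a > rate_b:
        return "A"
    if rate_b > rate_a:
        return "B"
    return "none"


# Hypothetical session 1: time split evenly, but A pays off more per minute
# (30 / 40 = 0.75 versus 10 / 40 = 0.25), so the subject should shift toward A.
shift_1 = preferred_shift(30, 40.0, 10, 40.0)

# Hypothetical session 2: more time on A has equalized the local rates
# (30 / 60 = 0.5 and 10 / 20 = 0.5), so there is no reason to shift further.
shift_2 = preferred_shift(30, 60.0, 10, 20.0)

# At that point the share of time on A equals the share of reinforcers from A.
time_share_a = 60.0 / (60.0 + 20.0)
reinforcer_share_a = 30 / (30 + 10)

print(shift_1, shift_2, time_share_a, reinforcer_share_a)
```

In the second session the subject spends 75% of its time on A and earns 75% of its reinforcers there, which is the matching relation.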

The matching law probably offers the best explanation for choice behavior.  It views choice not as a single event or internal process but as an observable occurrence over time.  It states that subjects respond to the various choice alternatives in the same proportions as the reinforcement value from each choice, and it differs from rational choice theory in how it predicts individuals will use self-control.
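In its simplest form (Herrnstein, 1961), the matching relation says the proportion of responses on alternative A equals the proportion of reinforcers obtained from A.  The sketch below applies that proportion to invented numbers; the function name, reinforcer counts, and response counts are assumptions chosen only for illustration.

```python
def matching_prediction(reinforcers_a, reinforcers_b):
    """Predicted proportion of responses on A under strict matching:
    reinforcers from A divided by reinforcers from A and B combined."""
    total = reinforcers_a + reinforcers_b
    if total == 0:
        raise ValueError("at least one reinforcer is needed to compute a proportion")
    return reinforcers_a / total


# Hypothetical session: 30 reinforcers earned on A and 10 on B.
predicted_share_a = matching_prediction(30, 10)  # 0.75

# Hypothetical observed behavior: 720 responses on A and 280 on B.
observed_share_a = 720 / (720 + 280)             # 0.72

deviation = observed_share_a - predicted_share_a

print(f"Predicted share of responses on A: {predicted_share_a:.2f}")
print(f"Observed share of responses on A:  {observed_share_a:.2f}")
print(f"Deviation from matching:           {deviation:+.2f}")
```

A deviation near zero means the subject's responding is close to what strict matching predicts.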

What are Concurrent-Chain Schedules?

Life is full of choices, and scientists working to determine how subjects make them have concluded that the varying number of responses required for reinforcement on a variable ratio schedule is preferred over the fixed number of responses required on a fixed ratio schedule.  To study such preferences, scientists use concurrent-chain schedules, which create choice with commitment, to help explain subjects’ choices.  These studies have shown “that subjects do prefer the variable ratio alternative” even if it requires more work.  This suggests the preference arises not because variety is the “spice of life” but because a VR schedule occasionally provides reinforcement with relatively little effort.


Self-control is often equated with “choosing a large delayed reward over an immediate small reward,” and control over choice is easier when a “tempting alternative” is more difficult to obtain, says Domjan (2003).  To study this choice behavior, scientists use concurrent-chain schedules offering two different options for responding.  When the scientists added a “sufficient delay” before reinforcement was available from either choice, the subjects (pigeons) showed self-control by selecting the large delayed reward.  However, in a “direct choice procedure” the pigeons selected the small immediate reward over the large reward that included a short delay before reinforcement.

What this demonstrates is that subjects will “shift in favor of the delayed large reward” when they are required to wait longer for reinforcement from either choice.  However, if the delay is short, the subjects will choose the immediate smaller reward.  The general principle used to explain self-control behavior is that “…the value of a reinforcer is reduced by how long you have to wait,” and this reduction is calculated mathematically using what is called a “value discounting function,” according to Domjan (2003).
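The shift in preference can be reproduced with a simple discounting calculation.  The sketch below assumes a hyperbolic form, value = magnitude / (1 + K × delay), which is a commonly used discounting function; the text above does not give the formula, so the form, the discounting parameter K, and the reward sizes and delays are all assumptions chosen for illustration.

```python
def discounted_value(magnitude, delay, k=1.0):
    """Assumed hyperbolic discounting: value shrinks as the delay grows."""
    return magnitude / (1.0 + k * delay)


def preferred_reward(added_delay, k=1.0):
    """Compare a small reward (size 2, no delay of its own) with a large
    reward (size 5, delayed by 4 time units) after adding the same extra
    delay to both options."""
    small = discounted_value(2.0, 0.0 + added_delay, k)
    large = discounted_value(5.0, 4.0 + added_delay, k)
    return "small" if small > large else "large"


# Direct choice: small is worth 2 / 1 = 2.0 and large is worth 5 / 5 = 1.0,
# so the small immediate reward wins.
direct_choice = preferred_reward(0.0)

# Both options pushed back by 10 time units: small is worth 2 / 11 (about 0.18)
# and large is worth 5 / 15 (about 0.33), so the large delayed reward wins.
delayed_choice = preferred_reward(10.0)

print(direct_choice, delayed_choice)
```

With these particular numbers the preference reverses once the added delay exceeds about 1.7 time units, which mirrors the pattern described for the pigeons.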

The discounting function is used to explain how choice behavior is affected by delay of reinforcement.  Studies on drug addiction have indicated that addicts “show steeper reward discount functions than other individuals,” which indicates a general lack of “control and impulsivity” (Domjan, 2003).  It is unclear whether these individuals’ choices were driven by the drug itself or whether being under the influence caused them to engage in behavior seeking immediate rewards.

Studies on self-control have concluded that “training people with delayed reward appears to have generalized effects in increasing their tolerance for delayed reward” (Domjan, 2003).  In conclusion, “more than one reinforcement schedule may be available” at any given time, and the “pattern of instrumental behavior,” as well as “choices between various response alternatives, is strongly determined by the schedule of reinforcement that is in effect,” which explains how reinforcement controls behavior in a variety of contexts (Domjan, 2003).


De La Piedad, X., Field, D. & Rachlin, H.  The influence of prior choices on current choice.  JEAB:  85, 3-21.

Domjan, M.  (2003).  The Principles of Learning and Behavior (Fifth ed.).  CA:  Wadsworth/Thomson.

Fisher, W. W. & Mazur, J. E.  (1997).  Basic and applied research on choice responding.  JABA:  30, 387-410.

Herrnstein, R. J.  (1961).  Relative and absolute strength of response as a function of frequency of reinforcement.  JEAB:  4, 267-272.

Lau, B. & Glimcher, P. W.  (2005).  Dynamic response-by-response models of matching behavior in rhesus monkeys.  JEAB:  84, 555-579.

Lindsay, S. R.  (2000).  Handbook of Applied Dog Behavior and Training (Vol. 1).  Iowa:  Iowa SP.

Responsible Dog & Cat

Training and Behavior Solutions

Joyce D. Kesling, CDBC


The greatness of a nation and its moral progress can be judged by the way its animals are treated.  Mahatma Gandhi 1869 – 1948

Copyright Responsible Dog 2005 – 2010
