Learning: what does this mean for professional dog trainers? How is learning applied in dog training and behavior modification?
“Our quality of life is dominated by our actions and the actions of others…any systematic effort to understand behavior must include consideration of what we learn and how we learn it.” Michael Domjan.
I think before one can discuss what learning means, one should understand what enables one to learn. Behavior is the result of learning, but how does one learn, how is one equipped to learn, and what constraints, if any, are there on learning?
Individuals come with a certain amount of genetic and biological material that governs their ability to function both behaviorally and physically. According to Lindsay (2000), “genes…do not impact directly on behavior, just as behavior…does not impact genes.” However, “genes exercise an indirect influence…by regulating the operation of biochemical mechanisms” that affect the expression of behavior (Lindsay, 2000).
One’s genetic material is not altered by experience and behavior, rather every individual is equipped with a certain amount of inherited “genotypic characteristics” that allow one to function and adapt to their environment.
One’s behavior gradually develops as one interacts with the environment, ultimately expressing one’s unique visible characteristics or traits. However, one’s potential is limited “…to the extent that an animal is genetically prepared to learn” (Lindsay, 2000). Therefore, Lindsay says, “…biology and genetics define the limits of how and what an animal learns…experience dictates the moment-to-moment direction” in which one’s behavior will be shaped.
What is learning?
According to Domjan (2003), there is no “universally accepted” definition of learning. However, he says, “learning is an enduring change in the mechanisms of behavior involving specific stimuli and/or responses that results from prior experience with those or similar stimuli and responses.”
Evidence that learning has taken place is indicated by a change from preexisting behavior to a newer response or by suppression of previously learned responses. An increase or decrease in behavior may also indicate learning has taken place. It may be just as important for one to learn to inhibit behavior as to learn new behavior. Learning is relatively permanent; however, its expression can be influenced by one’s current motivational state and fatigue.
Learning is necessary because it gives individuals the means to adapt to their ever-changing social and physical environments. It is necessary for survival, allowing individuals to learn from previous experiences, adjust their behavior according to present circumstances and form reliable predictions to guide future decisions.
One might want to think of learning as a sequence using four stages, according to Reid (1996). The first stage is acquisition, which indicates the incorporation of new knowledge or learned behavior. The second stage is fluency indicating an understanding of new knowledge or a learned behavior and the third stage is generalization, when one can understand and apply this new knowledge or behavior to a variety of contexts. The final stage is maintenance and usually means incorporating new knowledge or behavior into one’s repertoire and may need some type of training or exercises to maintain the level of response.
The influence of motivation on learning
Motivation can be defined as the driving force behind the behavior of humans and animals, including lower organisms. It is further defined as an “internal state or condition,” activated and directed by an individual’s goals and biological needs.
Motivation may be influenced by emotions previously learned through emotionally positive experiences and negative experiences associated with avoidance. All learned responses are the result of some type of motivation.
The most basic forms of motivation are those associated with physiological needs and commonly with survival, including hunger, thirst and avoidance of pain. Biological needs are a secondary form of motivation; they are not associated with survival but can exercise considerable influence over one’s motivational state. These biological motivations can include seeking out sexual partners, parenting one’s offspring and aggression. These motivations seem to be contextually important.
Motivation can have positive effects in directing behavior toward specific goals, leading to increased effort and energy, persistence and increased initiation of activities, enhancing cognitive skills while determining what consequences are reinforcing resulting in improved performance.
When motivation is applied to dog training, one must “consider the motivational state of the animal…learning contingencies” and any competing motivations, according to Reid (1996).
Domjan (2003) says, “…motivated behavior…involves systematically organized sequences of actions,” and ethologists refer to these organized sequences as appetitive and consummatory behavior. Consummatory behavior is highly stereotyped, has specific eliciting or releasing stimuli and brings a “species-typical response sequence to completion.” Appetitive behavior is less stereotyped; it occurs early in a behavior sequence and takes a variety of forms depending on the context, enabling contact with the stimuli responsible for releasing consummatory behavior.
According to Domjan (2003), “…consummatory responses tend to be species-typical modal action patterns” contrasted with appetitive that are “more variable depending on the environment.”
Understanding how natural behavioral sequences occur is becoming increasingly necessary in applying classical and operant conditioning effectively to modify behavior. According to Lindsay (2000), in spite of all the strides made in the study of animal behavior, the scientific community ignored “instinct,” concentrating most of its efforts on laboratory analysis of learning and conditioning, with rats and pigeons as models and arbitrary sets of behaviors such as “maze learning, key pecking, lever pressing, and…various other simple behaviors.”
Lindsay (2000) said, “…instinctual mechanisms and species-typical action patterns should not be overlooked” when analyzing behavior and motivation. He says, “…instincts preserve genetic information” linking to an animal’s biological past, saying “nature is conservative…under natural circumstances many biological constraints and pressures are maintained from generation to generation” during their interaction with their environment. This process has produced an organized system of behavior, shaped by interaction with the environment during the course of evolution, and even though this behavior is not “encoded in an animal’s genome,” one’s genetics does provide some instruction for the expression of these species-typical behaviors (Lindsay, 2000).
Why is this so important to understand? Understanding the interaction of an animal’s biological and physiological mechanisms and motivations may help explain why. Animals, including humans, come equipped with “physiological mechanisms that are directly or indirectly influenced by the action of reflexes” (Lindsay, 2000).
What are reflexes?
Reflexes are involuntary actions or movements that respond to stimuli. A reflexive action is controlled through complex communication and coordination between nerve cells known as neurons. These neurons are found throughout the nervous system, which comprises the central nervous system (the brain and spinal cord) and the peripheral nervous system. Neurons may be sensory neurons, motor neurons or interneurons, and each type serves a specific purpose. The sensory or afferent neurons send messages to the brain and spinal cord, and the motor or efferent neurons send messages away from the brain and spinal cord, controlling muscles and glands. The interneurons relay messages between other nerve cells, chiefly within the brain and spinal cord.
According to Lindsay (2000), “much of a dog’s behavior is under the reflexive control of involuntary mechanisms.” Some of this reflexive behavior has been described as innate or instinctual by early ethologists who referred to these types of behavior as fixed action patterns.
However, some scientists have disputed that these types of reflexes are fixed, saying “…the threshold for eliciting such activities varies as a function of circumstances” and “the same stimulus can have widely different effects, depending on the physiological state of the animal and its recent actions,” and they refer to these types of behavior as modal action patterns (Domjan, 2003).
According to Domjan (2003), “the stimulus responsible for a modal action pattern can be more difficult to isolate if the response occurs in the course of complex social interaction.” He adds that “…a sign or releasing stimulus is sufficient for eliciting a modal action pattern,” but the sign stimulus necessary to elicit the MAP might be “controlled by several stimulus features in an additive fashion” and may not be the “most effective stimulus for eliciting an MAP,” which may be one that is not likely to occur under natural conditions.
However, there are examples of fixed action patterns that are not dependent on learning for their appearance. An example, Lindsay (2000) says, is the female dog’s “practice of averting her tail to one side before intromission” and the male dog’s “clasping and thrusting” action in response. These behaviors are considered “hardwired” and are attributed by ethologists to “innate releasing mechanisms,” or IRMs.
The reflexive behavior of dogs has been extensively researched and documented by Fox (1964) and Sherrington (1906), according to Lindsay (2000). Sherrington’s research on dogs established that what seemed to be voluntary behavior was actually under involuntary reflexive control.
Sherrington divided reflexive behavior into two broad categories called tonic and phasic. Phasic reflexes occur quickly with brief responses, and tonic reflexes are associated with sustained “adjustments and equilibrating efforts over flexor/extensor dominance” (Lindsay, 2000). These oppositional reflexes are a reaction to opposing force or pressure. Dogs tend to “react reflexively by responding in an opposing direction to the direction of the force applied,” which serves to maintain their physical equilibrium or to sustain their own course of action against the direction in which they are being pushed or pulled.
The stimulus intensity necessary to elicit a reflexive action is referred to as threshold. A threshold can be high or low, depending on the strength of the stimulus necessary to elicit the reflexive response. The ability to alter response thresholds is an important part of behavior modification, especially when related to emotional arousal associated with reflexive responses.
The time or interval between the stimulus presentation and the beginning of reflexive action is known as latency. Latency is dependent on stimulus intensity and willingness of the animal to respond. Irradiation is the “tendency of an especially strong stimulus to elicit a generalized reaction extending to surrounding or associated neural systems” and reciprocal inhibition is “the tendency of elicited muscle actions to inhibit the actions of an opposite type” (Lindsay, 2000).
Muscle reflexes can be stimulated using three different actions: flexion, extension, or tonic, a combination of both. When one stimulates one group of muscles, the opposing group of muscles is naturally inhibited. This concept was adopted by Wolpe (1958) and termed reciprocal inhibition, and it became associated with the effect of counter-conditioning and systematic desensitization procedures.
According to Wolpe, “if a response inhibitory to anxiety can be made to occur in the presence of anxiety-evoking stimuli, it will weaken the connection between these stimuli and the anxiety responses.” According to Lindsay, he argued “relaxation/appetite” and “anxiety/fear” are likewise restricted; in other words, one cannot be relaxed and anxious, or have an appetite and be fearful, at the same time. Another example is “tonic equilibrium,” which is comparable to those situations where opposite emotional choices are unrealized and the animal therefore remains in a state of conflict with no known alternatives.
When a repeated stimulus produces a weakened reflexive response, the cause may be fatigue or habituation; habituation is the most basic form of learning.
Reflexes control some of our most basic biological functions, occurring automatically with sufficient and significant stimulus. It would be virtually impossible to control one’s biological processes such as increased heart rate when confronted with a fearful situation or salivation in the presence of food.
Pavlov’s research found that, in spite of reflexive constraints, behavioral and physiological actions could come under the influence of neutral stimuli through a conditioning process known as classical or Pavlovian conditioning.
Habituation and Sensitization
Habituation refers to a decrease in responding produced by repeated presentation of a stimulus. Sensitization is the opposite and refers to an increase in responding. Both types of change result from previous experience. Both occur in many situations, but always require repeated exposure to a stimulus. They are necessary in designing control procedures in classical conditioning and have a role in operant conditioning (Domjan, 2003).
Habituation is a non-associative form of learning and is considered the simplest. Habituation is a process that allows individuals to filter out the large amounts of sensory stimuli that continually bombard them, so that they can focus their attention on more relevant information.
The dual-process theory of habituation and sensitization will be discussed in more detail later, but for the purposes of this paper this theory proposes that both habituation and sensitization processes are not “mutually exclusive” and that both may be activated at the same time with the underlying outcome dependent on the strength of either process. In other words, both processes compete to control behavior (Domjan, 2003).
The theory suggests habituation and sensitization processes “occur in different parts of the nervous system” according to Domjan (2003). The theory suggests habituation processes occur in the “S-R system,” consisting of the “shortest neural path” connecting the sense organs activated by the eliciting stimulus and muscles related to the response and defined as the “reflex arc” (Domjan, 2003). The continuous presentation of a sensory stimulus activates the S-R system causing the habituation effect.
Habituation is one of several ways to change a dog’s behavior. According to Burch and Bailey (1999), habituation is sometimes called “adaptation” because the subject animal begins to adapt, reacting less to a given stimulus after being repeatedly exposed to it. Using the reflexive startle response as an example, a gun dog might show a decrease in startle response after repeated exposure to gunfire. However, habituation can be temporary if exposure to the eliciting stimulus is not maintained at sufficient levels.
Sensitization is the opposite of habituation in that it produces increases in responsiveness rather than decreases. However, like habituation, sensitization does not usually have lasting effects. According to Domjan (2003), “in all response systems the duration of sensitization effects is determined by the intensity of the sensitizing stimulus”: the more intense the stimulus, the greater the responsiveness and the longer the sensitizing effects will persist.
The dual-process theory suggests sensitization processes take place in the “state system” (Domjan, 2003). This system consists of other parts of the nervous system that maintain arousal levels, and it is activated by arousing stimuli and emotional experiences. Drugs “such as stimulants or depressants” are able to alter the state system, changing levels of response. The highly emotional state of fear is controlled by the state system.
Working with a sensitized dog in training might include relaxation training, establishing a hierarchy of stimulus presentations that begins with the least intrusive and works up, and using counter-conditioning with a reinforcer for accepting the previously feared stimulus. Given the short duration associated with the sensitization process, a fearful dog should continue to be exposed to the previously feared stimulus to maintain the newly established positive response.
Habituation and sensitization involve respondent (reflexive) behavior; they are related to biology and reflexes and are sometimes overlapped by operant learning. When working with dogs and fear issues, we are commonly working with respondent conditioning, because fear responses are reflexive, changing one’s biological condition as reflected by increased heart rate and respiration. However, these responses may also be due to some type of operant learning experience that produced an avoidance behavior associated with fear and anxiety.
Finally, both habituation and sensitization processes are limited in their application. They involve only responses already in the animal’s behavioral repertoire, do not include “learning new responses or responses to new stimuli” and usually involve just one type of stimulus, according to Domjan (2003).
Classical conditioning, respondent conditioning or Pavlovian conditioning (take your pick)
Classical conditioning is about establishing associations between stimuli that affect an individual’s behavior accordingly. Establishing associations within one’s environment allows individuals to predict events and future outcomes based on previous learning experiences; it is simply a “cause and effect” relationship, according to Domjan (2003). The relationships learned are usually those associated with safety, danger, food preferences, and emotional reactions, most often fear and pleasure.
Who was Ivan P. Pavlov?
Ivan Pavlov was a Russian physiologist who, while investigating dogs’ salivary response to the presence of food, accidentally discovered classical conditioning. What he discovered was that the presentation of food was not necessary to elicit a salivary response; other, irrelevant stimuli in the environment could also stimulate the same response.
In his experiments and method of control, Pavlov used two types of stimuli: a light or tone (neutral) that previously did not elicit a salivary response, and a food source that did elicit a salivary response. The tone or light was referred to as the “conditioned stimulus” because its ability to elicit salivation was dependent on repeated pairings with the presentation of food. The salivation elicited by the tone or light became the “conditional response.” The food or sour taste was termed the “unconditional stimulus,” and the salivation it elicited the “unconditional response” (Domjan, 2003).
In other words, “stimuli and responses whose properties” were not dependent on previous learning were termed “unconditional,” and “stimuli and responses whose properties” were associated with training were termed “conditioned” (Domjan, 2003).
These terms are commonly referred to in abbreviated forms: CS for conditioned stimulus, CR for conditioned response, US for unconditioned stimulus and UR for unconditioned response.
When we think of Pavlovian or classical conditioning, we are usually referring to “emotional reactivity” and reflexive behavior. Watson and Rayner (1920) did a considerable amount of research in conditioning emotional reactions. One classic study was “little Albert,” who was conditioned to fear a white rat after hearing a loud sound associated with the presence of the white rat. After just five conditioning trials, associating a loud sound with the rat produced a strong fear response in Albert. Additionally, the fear generalized to similar stimuli such as “other furry things,” which included a Santa Claus mask, a rabbit, a fur coat, a dog and cotton wool (Domjan, 2003).
Further research continued on other species, such as rats, rather than humans and usually consisted of pairing an electrical shock with a tone or light. The electrical shock was considered an aversive because it caused a startle reflex, which is associated with fear and a “species-typical defense response” of freezing. Freezing is a typical response in many mammals in anticipation of a feared or aversive stimulus.
When conducting these types of experiments, researchers were more interested in determining how much the fear response affected the ongoing activities of the tested animal, which in most cases was a rat. The measurement procedure became known as the “conditioned emotional response,” or CER (Domjan, 2003).
Using this procedure, one first establishes a baseline of activity, such as lever pressing to obtain food. The number of responses is recorded over a predetermined amount of time. Once the baseline is established, the classical conditioning part of the experiment is introduced. During this segment, a CS (conditioned stimulus), perhaps a tone or light, is introduced, followed by a brief electrical shock, the US (unconditioned stimulus). During this phase of the experiment, called acquisition, feeding behavior is usually interrupted. To determine the “suppression ratio,” one compares the number of responses during the presentation of the CS with the baseline number of responses recorded before the presentation of the CS (Domjan, 2003).
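The comparison described above can be sketched in a few lines of Python. This is only an illustration: the text does not spell out the formula, so the sketch assumes the commonly used form in which responding during the CS is divided by the sum of responding during the CS and responding during an equally long period just before the CS. The function name and the sample counts are hypothetical.

```python
def suppression_ratio(cs_responses: int, pre_cs_responses: int) -> float:
    """Return CS responding as a share of CS plus pre-CS responding.

    Assumed formula (not stated in the text):
        ratio = cs / (cs + pre_cs)
    0.0 means responding stopped completely during the CS;
    0.5 means the CS did not disturb responding at all.
    """
    total = cs_responses + pre_cs_responses
    if total == 0:
        raise ValueError("no responses were recorded in either period")
    return cs_responses / total


# Hypothetical counts: 30 lever presses before the CS, 10 during the CS.
print(suppression_ratio(10, 30))  # 0.25
```

Under this assumed formula, smaller values indicate stronger conditioned fear, because the animal presses the lever less while the CS is on.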
Further studies utilized an eye blink (defensive response), a reflexive response similar to the automatic knee-jerk. What these experiments determined was “classical conditioning requires…pairing of a CS and a US” and “initial learning may not be directly observable” (Domjan, 2003).
Two aspects of classical or associative learning are sign tracking and autoshaping. Experiments in sign tracking have shown evidence of “compelling attraction to classically conditioned signals” indicating food with the marked characteristic that the conditioned stimulus must be “localized” so that the subject can approach and track the stimulus (Domjan, 2003). Further experiments illustrated the conditioned stimulus required “proper modality and configuration” and “time spent” in the “experimental context relative to the duration of each CS presentation” (Domjan, 2003).
Taste aversion, a common side effect of classical conditioning, can easily be learned when eating a food item is followed by illness. These effects have been extensively studied, and the studies indicate “food aversion learning” is independent of one’s rational thought processes and can contradict one’s conclusions regarding the illness (Domjan, 2003).
Taste aversion is produced by pairing a CS (taste) with a US (radiation or an aversive chemical compound) and follows all the standard laws of learning, but it has two unique features. One, taste aversion can be acquired in a single pairing, and two, it can occur even though the illness may be delayed by several hours. Scientists suspect this “long-delay learning” may be a result of an evolutionary process that enables humans and some animals to learn to avoid poisonous foods with delayed effects (Domjan, 2003).
All of the previous examples demonstrate how a CS (conditioned stimulus) can be paired with a US (unconditioned stimulus) often enough that it comes to elicit a conditioned response. In addition to the pairing itself, timing is critical in determining the effect achieved through this conditioning process. There are five common conditioning procedures, as follows.
- Short-delayed conditioning – using a slight delay between the presentation of the CS and the US
- Trace conditioning – similar to short-delayed conditioning, but the US is presented a short time after the CS has ended
- Long-delayed conditioning – similar to the two procedures above, except the delay may be as long as 5-10 minutes
- Simultaneous conditioning – most obvious, the presentation of the CS and US occur simultaneously
- Backward conditioning – during this procedure, the US is presented just prior to the CS, which is different from all the other procedures.
The scope of this paper will not include a discussion on the effectiveness of these different procedures, but I will include what one expects to learn from these procedures indicated as follows.
- Magnitude – how strong is the response – how much salivation could one measure for example
- Vigor – how often does the CS elicit the conditioned response, measuring the probability or number of responses
- Latency – measures the duration (time) between the presentation of the CS and the conditioned response
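The three measures listed above can be computed from a simple record of test trials. The sketch below is a hedged illustration only: the `Trial` record, its field names and the sample values are hypothetical, and averaging over responding trials is one reasonable choice rather than a procedure taken from the text.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Trial:
    """One CS presentation (hypothetical record layout)."""
    responded: bool             # did the CS elicit the conditioned response?
    magnitude: float            # e.g. drops of saliva; 0.0 when no response
    latency_s: Optional[float]  # seconds from CS onset to response; None if none


def summarize(trials: list[Trial]) -> dict:
    """Return the magnitude, probability and latency of conditioned responding."""
    if not trials:
        raise ValueError("at least one trial is required")
    responding = [t for t in trials if t.responded]
    return {
        # Magnitude: average response strength on trials with a response.
        "magnitude": mean(t.magnitude for t in responding) if responding else 0.0,
        # Vigor/probability: share of CS presentations that elicited a response.
        "probability": len(responding) / len(trials),
        # Latency: average delay between CS onset and the response.
        "latency_s": mean(t.latency_s for t in responding) if responding else None,
    }


# Hypothetical session of four CS presentations.
session = [
    Trial(True, 4.0, 2.0),
    Trial(False, 0.0, None),
    Trial(True, 6.0, 1.0),
    Trial(True, 5.0, 1.5),
]
print(summarize(session))
```

As conditioning proceeds, one would typically expect magnitude and probability to rise and latency to fall.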
All of the previous discussion has focused on how these procedures can be used to measure learning and to predict when an event or US (unconditioned stimulus) will occur. Another type of conditioning procedure determines how subjects can “learn to predict the absence of an unconditioned stimulus” and is referred to as conditioned inhibition (Domjan, 2003).
The reason this type of procedure might be useful is that research has shown exposure to “unpredictable aversive stimulation” can be “highly aversive” for animals, causing various “physiological symptoms” such as “stress” and “stomach ulcers.” The results indicate that, if one must be exposed to these aversive types of stimuli, “predictable or signaled aversive stimuli are preferable to unpredictable aversive stimulation” (Domjan, 2003).
The reasoning is if a subject is able to predict aversive events or stimuli, this also might indicate an ability to predict the absence of an aversive event or stimulus. Stress reduction techniques have been introduced to create the absence of aversive stimulation and are referred to as “conditioned inhibitory stimuli” (Domjan, 2003).
Inhibitory conditioning procedures
Conditioned inhibition is different from the previous type of conditioning in that “the absence of a US” is a significant event only if the US occurs at some time during the aversive situation. What this means is that conditioned inhibition develops only when the US is presented in an excitatory context, allowing one to use “inhibitory conditioning and inhibitory control” of their behavior (Domjan, 2003).
In order to condition such a response Pavlov provided for two conditioned stimuli and two different conditioning trials. The procedure is outlined as follows.
- Excitatory conditioning trial – includes the US (unconditioned stimulus), which is paired with a conditioned stimulus designated as CS+, a tone for example. The CS+ indicates the aversive stimulus is coming and provides the excitatory context that allows conditioned inhibition to develop.
- Inhibitory conditioning trial – during this type of trial the CS+ (tone) is presented simultaneously with the CS- (a light for example) and the US (unconditioned stimulus-aversive) is not presented, allowing the CS- to become a conditioned inhibitor.
During a normal training process these two types of trials are offered alternately (back and forth), and after sufficient presentations the CS- “gradually acquires inhibitory properties” (Domjan, 2003).
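The two trial types can be laid out as a simple schedule. The following is a minimal sketch under stated assumptions: strict one-for-one alternation is a simplification of "back and forth," and the function name and dictionary layout are hypothetical rather than taken from any source.

```python
def pavlov_inhibition_schedule(n_pairs: int) -> list[dict]:
    """Build an alternating list of excitatory and inhibitory trials.

    Excitatory trial: CS+ alone, followed by the US.
    Inhibitory trial: CS+ together with CS-, and the US is omitted.
    """
    trials = []
    for _ in range(n_pairs):
        trials.append({"type": "excitatory", "stimuli": ("CS+",), "us": True})
        trials.append({"type": "inhibitory", "stimuli": ("CS+", "CS-"), "us": False})
    return trials


# Print a short schedule of two excitatory and two inhibitory trials.
for trial in pavlov_inhibition_schedule(2):
    outcome = "US delivered" if trial["us"] else "US omitted"
    print(trial["type"], "+".join(trial["stimuli"]), outcome)
```

The point of the layout is that the US is never delivered on a trial that includes the CS-, which is what allows the CS- to signal the absence of the US.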
What this offers in the form of learning is that the CS- allows a subject the ability to inhibit a normal response to what otherwise is known to indicate potential danger.
One last procedure for conditioning inhibition presents the US periodically on its own, but not following the CS-, and does not use the normal CS+ as the excitatory stimulus. The result is the “CS signals a reduction in the probability that the US will occur” (Domjan, 2003).
The predictive value this offers a subject is that in the presence of the CS- the US is not likely to occur. However, exactly when the US will occur cannot be reliably predicted. By comparison, in Pavlov’s procedure the “US always occurs at the end of the CS+,” but “does not occur when the CS- is presented” at the same time as the CS+. Since that procedure fixes the exact timing of the US, it also allows one to predict exactly when the US will not occur. This creates a systematic way for one to predict the absence of the US (Domjan, 2003).
How is conditioned inhibition measured?
First, conditioned excitatory stimuli elicit responses not previously learned. These conditioned excitatory stimuli elicit new responses such as “salivation, approach or eye blinking,” depending on what the unconditioned stimulus was (Domjan, 2003).
Measuring conditioned inhibition is possible when we are working with physiological and behavioral response systems that operate in opposing directions. For instance, heart rate can increase or decrease, just as a behavioral response such as approach can increase or decrease, but in both cases the result is a change in behavior in one direction or the other. Therefore, conditioned inhibition shows up as a change in behavior in the direction opposite to that produced by conditioned excitation (Domjan, 2003).
This type of measurement is limited, since it can only be used with response systems that operate in opposing directions and only measures the “net effects of excitation and inhibition.” Thus approach to the CS will occur only if its excitatory properties are greater than its inhibitory properties, and withdrawal from the CS will occur if its inhibitory properties are greater than its excitatory properties (Domjan, 2003).
For the purposes of this paper, I am not including the various testing methods for these procedures. Rather, I conclude that classical conditioning is more complex than habituation and sensitization. Classical conditioning involves learned associations that connect stimuli with one’s responses, and its procedures also address how quickly these associations form. These types of procedures are usually associated with biological and physiological response systems and allow individuals to make future predictions about their environment.
Instrumental Learning or Operant Conditioning
This type of conditioning is based on the presentation of stimuli dependent on the prior occurrence of a designated response. In other words, consequences are a direct result of one’s own behavior; because the subject has some control over these consequences, the behavior is goal-directed (Domjan, 2003).
Origins of research
Thorndike was the first contributor to the study of instrumental learning. By designing puzzle boxes, he was able to conduct experiments that measured how quickly a subject learned to escape the box to receive a food pellet. Using chickens, dogs and cats, he determined that with continued practice an individual subject escaped the box in shorter and shorter times compared with its previous trials.
Thorndike’s meticulous work advanced the study of animal intelligence, adhering to a “strict avoidance of anthropomorphic interpretations” of the subjects’ observed behavior. He did not credit the study subjects with any insight into solving the task, but rather concluded the responses reflected learned behavior and an S-R (stimulus-response) relationship. This stimulus-response mechanism, according to Thorndike, reflected behavior associated with a confined animal, and successful attempts at escape merely reflected learning that associated stimuli inside the box with the animal’s normal escape behavior (Domjan, 2003).
Based on his research, Thorndike proposed the “law of effect,” which states “if a response in the presence of a stimulus is followed by a satisfying event, the association between the stimulus (S) and the response (R) is strengthened” and “if the response is followed by an annoying event, the S-R association is weakened” (Domjan, 2003).
It was further emphasized, “according to the law of effect, animals learn an association between the response and the stimuli present at the time of the response.” The consequence is not included in the association; it merely serves to weaken or strengthen the relationship between the stimulus and response. Therefore the “law of effect involves S-R learning” (Domjan, 2003).
Discrete-Trial procedures – Thorndike
- Maze learning – the design idea was based on a rat’s natural environment
- Allowed scientists to quantify and measure running speeds – typically expecting increased speeds after repeated trials
- Latency could be measured by determining the time between the start and finish for an individual trial – typically expecting results to become shorter as training continues.
Free-Operant procedures – B.F. Skinner
- This procedure contrasts with discrete-trial procedures because the subject is free to repeat the response immediately and as often as it likes, allowing for continuous training.
- Allowed the researcher to study the subject’s ongoing natural activity
- Defined behavior by dividing it into measurable units
- Skinner box – the subject presses a lever and a food pellet is subsequently delivered
Skinner defined the “lever press” by the effect it had on the subject’s environment: a “sufficient depression” need only be enough to close the micro-switch that released the food pellet. He also felt that the particular “muscles” used to press the lever were not critical. It did not matter whether the subject used its right paw, left paw or even its tail, as long as the movement produced the same operant effect.
Magazine training and shaping a response
The initial phase of this training procedure is interesting because it involves classical conditioning: the sound of the food-delivery device is repeatedly paired with the delivery of food. After sufficient pairings the sound produces a “sign tracking” response, and this phase is termed magazine training (Domjan, 2003).
The next phase is training the operant response, which is known as shaping. The subject is first given a food pellet for anything close to the desired goal behavior, in this case pressing the lever. Once an approximation is performed reliably, the pellet is withheld for that response and delivered only for the next, closer step in the behavior sequence. Shaping is defined as “two complementary tactics…reinforcement of successive approximations to the required response and nonreinforcement of earlier response forms” (Domjan, 2003).
The result is that the subject will usually perform all the initial pieces of the entire chain of behavior, even though not all of them may be necessary to perform the final behavior. The expected outcome is that the desired behavior becomes stronger and takes less effort to complete.
Instrumental conditioning does not always train entirely new behavior. Often it takes existing behavior, such as sitting or lying down, and constructs a “new behavioral unit” from it. Instrumental conditioning is also used to teach new behavior or novel responses through a process of training (Domjan, 2003).
Two characteristics are evident in shaping procedures: shaping “takes advantage of the inherent variability of behavior,” and “shaping can produce new response forms, forms never before performed by the organism” (Domjan, 2003).
Shaped responses are performed in a continuous training cycle, allowing a subject to respond frequently. Skinner therefore used the “rate of occurrence” of a given behavior as the measure of response probability (Domjan, 2003).
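The two tactics of shaping described above, reinforcing successive approximations and withholding reinforcement for earlier forms, can be sketched as a small simulation. Everything numeric here is an assumption made for illustration: responses are reduced to a single number, variability is drawn from a normal distribution, and the function name, step size and learning rate are hypothetical rather than anything taken from Skinner or Domjan.

```python
import random


def shape_response(target=10.0, step=1.0, spread=1.0, learning_rate=0.5,
                   max_trials=5000, seed=0):
    """Simulate shaping toward a target response (toy model, assumed values)."""
    rng = random.Random(seed)
    typical = 0.0      # the subject's current typical response
    criterion = step   # first approximation that earns a pellet
    pellets = 0
    for trial in range(1, max_trials + 1):
        # Inherent variability: each response scatters around the typical one.
        response = rng.gauss(typical, spread)
        if response < criterion:
            continue   # earlier response forms are no longer reinforced
        pellets += 1
        # Reinforcement pulls the typical response toward what was rewarded.
        typical += learning_rate * (response - typical)
        if criterion >= target:
            return {"trials": trial, "pellets": pellets,
                    "typical": typical, "met_target": True}
        criterion = min(target, criterion + step)  # ask for a closer approximation
    return {"trials": max_trials, "pellets": pellets,
            "typical": typical, "met_target": False}


result = shape_response()
```

The sketch depends on variability: a response that starts near zero only ever meets the next criterion because behavior scatters from trial to trial, which is the characteristic of shaping that Domjan points to.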
Four Basic Types of Instrumental Conditioning
Every form of instrumental conditioning includes a response and a consequence or outcome. According to Domjan (2003), “whether the result of a conditioning procedure is an increase or a decrease in the rate of responding depends both on response-outcome contingency and on the nature of the outcome.”
There are four quadrants of instrumental conditioning, which differ “in what type of stimulus (appetitive or aversive) is controlled by the instrumental response and whether the response produces or eliminates the stimulus” (Domjan, 2003).
- Positive reinforcement – the response produces an appetitive stimulus, resulting in reinforcement, that is, an increase in response rate.
- Negative reinforcement – the response removes or prevents an aversive stimulus, resulting in an increase in behavior.
- Positive punishment – the response produces an aversive stimulus, resulting in suppression of behavior, that is, a decrease in response rate.
- Negative punishment – the response leads to the withdrawal of an appetitive stimulus, resulting in a decrease in behavior.
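The four quadrants above amount to a two-by-two lookup: the type of stimulus, and whether the response produces or removes it. The following is a minimal sketch of that lookup; the function names and the string labels are hypothetical choices made for this illustration.

```python
def classify_procedure(stimulus, response_effect):
    """Name the quadrant for a stimulus type and the effect of the response.

    stimulus: "appetitive" or "aversive"
    response_effect: "produces" or "removes" (removes also covers prevents)
    """
    quadrants = {
        ("appetitive", "produces"): "positive reinforcement",
        ("aversive", "removes"): "negative reinforcement",
        ("aversive", "produces"): "positive punishment",
        ("appetitive", "removes"): "negative punishment",
    }
    return quadrants[(stimulus, response_effect)]


def expected_change(procedure):
    """Reinforcement increases the response rate; punishment decreases it."""
    return "increase" if procedure.endswith("reinforcement") else "decrease"
```

For example, classify_procedure("aversive", "removes") names negative reinforcement, and expected_change for that procedure is an increase in behavior, which is why it should not be mistaken for punishment.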
Punishment and negative reinforcement are often confused. Punishment always involves a positive contingency between the instrumental response and the aversive stimulus: the response produces the aversive event. In negative reinforcement, by contrast, “there is a negative response-outcome contingency,” meaning the response either terminates or prevents the delivery of the aversive stimulus (Domjan, 2003).
Two types of negative reinforcement procedures
Escape – the aversive stimulus is present but is turned off by the instrumental response, and in turn the response is reinforced by termination of the aversive stimulus (Domjan, 2003). For example, a dog being trained on a choke collar can end the pressure of a tightened collar by returning to the handler’s side so that the lead goes slack.
Avoidance – an aversive stimulus is scheduled to be presented, but it can be avoided if the subject makes the correct instrumental response. For example, in the laboratory, a rat presented with a warning signal (impending shock) may avoid the electrical shock by making the correct instrumental response (Domjan, 2003).
Omission Training: This is another procedure involving a “negative contingency” between the instrumental response and an environmental event. It does not use an aversive stimulus; instead, the target response takes away an opportunity for reinforcement. These procedures are sometimes referred to as DRO (differential reinforcement of other behavior), because reinforcement is available as long as the subject does anything other than the specified target behavior.
Three Fundamental Elements
Instrumental conditioning always includes three fundamental elements: the response, the reinforcer, and the contingency, which is the relation between the response and the reinforcer.
Instrumental conditioning will likely strengthen the reinforced response, but it may also produce “creative or variable responses” depending on the particular requirements set during conditioning (Domjan, 2003).
Constraints on instrumental conditioning
There are limitations to instrumental conditioning, such as the concept of belongingness and what the Brelands termed “instinctive drift,” documented in their landmark 1961 paper The Misbehavior of Organisms. These limitations were later described in terms of behavior systems theory; readers should refer to the work of Timberlake for further review.
It has also been determined that the effectiveness of a reinforcer depends not only on its quantity and quality but also on the subject’s past experience with other reinforcers. The concepts of “positive contrast” and “negative contrast” came from studies comparing the rewards offered across trials. Positive contrast is an “elevated responding for a favorable reward resulting from prior experience with a less attractive outcome.” In comparison, negative contrast is a lower response rate for an “unfavorable reward because of prior experience with a better outcome” (Domjan, 2003).
The reasons are not entirely clear, but results showed that negative contrast was more easily produced. It is most likely a result of “aversive or frustrative effects,” which create a “series of cognitive and behavioral changes” over time and result in “emotional disappointment” when subjects must continually contend with small rewards (Domjan, 2003).
Connecting the response to the consequence
For instrumental conditioning to be effective, “you have to know when you have to do something to obtain a reinforcer and when the reinforcer is likely to be delivered independent of your actions,” which requires a “sensitivity to the response-reinforcer relation” (Domjan, 2003).
In addition, “two types of relationships exist between a response and a reinforcer.” The first is the “temporal relation,” the time between the response and the reinforcer; a special case is “temporal contiguity,” in which the reinforcer is delivered immediately after the response. The second is the “causal relation or response-reinforcer contingency,” the extent to which the instrumental response is necessary and sufficient for the reinforcer to occur (Domjan, 2003).
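One way to put a number on the response-reinforcer contingency is to compare how often the reinforcer arrives when the subject responds with how often it arrives when the subject does not. This difference in probabilities is an assumption brought in for illustration, not a formula quoted from Domjan, and the function name and the sample observations below are hypothetical.

```python
def response_reinforcer_contingency(observations):
    """Return P(reinforcer | response) minus P(reinforcer | no response).

    observations: list of (responded, reinforced) pairs of booleans,
    one pair per observation period.
    """
    with_response = [reinforced for responded, reinforced in observations
                     if responded]
    without_response = [reinforced for responded, reinforced in observations
                        if not responded]
    if not with_response or not without_response:
        raise ValueError("need periods both with and without a response")
    p_given_response = sum(with_response) / len(with_response)
    p_given_none = sum(without_response) / len(without_response)
    return p_given_response - p_given_none


# Hypothetical data: food mostly follows lever presses.
lever_press = ([(True, True)] * 3 + [(True, False)]
               + [(False, True)] + [(False, False)] * 3)

# Hypothetical data: food arrives just as often with or without a response.
free_food = [(True, True), (True, False), (False, True), (False, False)]
```

With these made-up observations the lever-press data give a contingency of 0.5, while the free-food data give 0, meaning the response makes no difference to whether the reinforcer is delivered.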
Delayed reinforcement has been found to be less effective than immediate delivery of reinforcers; learning may be “disrupted” by “delaying…delivery of the reinforcer after the occurrence of the instrumental response” (Domjan, 2003).
What makes delayed delivery of a reinforcer so difficult is that the subject may miss the association between the desired response and the reinforcer, a problem compounded by other responses that may occur during the delay. One technique designed to overcome this deficiency uses a “secondary or conditioned reinforcer,” which stands in for the primary reinforcer at the moment of the response (Domjan, 2003).
These conditioned reinforcers “bridge” the delay between the instrumental response and delivery of the primary reinforcement (Domjan, 2003). An additional procedure called “marking” was also found to be effective in studies of delayed reinforcement with rats, producing 90% correct choices during trials conducted in special maze experiments.
What is superstitious behavior?
There was much debate about the importance of delayed reinforcement and its effect on learning. The initial thought was that immediate reinforcement of the response was the most crucial factor for instrumental learning to take place, but further research established that timing and contingency are both important.
Skinner’s landmark experiment in 1948 temporarily put this debate to rest, supporting the view that “temporal contiguity” was the critical factor in learning. The experiment used pigeons placed in separate experimental chambers, with food delivered every 15 seconds regardless of what the subjects did. Skinner found that each individual pigeon performed a completely different response in relation to food delivery. He termed this “superstitious behavior,” saying, “whatever response a subject happened to make just before it got free food became strengthened and subsequently increased in frequency” due to “adventitious reinforcement,” which means the “accidental pairing of a response with delivery of the reinforcer” (Domjan, 2003).
In 1971, a landmark study by Staddon and Simmelhag found flaws in Skinner’s claim that response-reinforcer contiguity, rather than contingency, was more important during instrumental conditioning. They determined that toward the end of the interval, just before food delivery, subjects tended toward certain types of responses, which they termed “terminal responses”; responses occurring earlier in the interval between food deliveries were termed “interim responses.” They observed that the terminal and interim responses did not vary from subject to subject as Skinner’s research had indicated. Rather, “responses did not always increase in frequency merely because they occurred coincidentally with food delivery,” and “food delivery appeared to influence only the strength of terminal responses” during initial training phases (Domjan, 2003).
Staddon and Simmelhag (1971) proposed an explanation for the apparent consistency in both the terminal and interim responses: terminal responses are species-typical responses reflecting anticipation of the food delivery, whereas interim responses are due to other “sources of motivation” (Domjan, 2003).
An alternative account, the “best developed of these alternative formulations,” was proposed by behavior systems theory, based on the feeding habits of animals. It proposes that interim responses are species-typical foraging behavior, called “general search responses,” and that the terminal responses seen just before food are focal search responses directed at the food site, with “post-food focal search responses” occurring just after food is delivered (Domjan, 2003).
To sum up instrumental conditioning we might say it is defined by the relationship established between a subject’s response and subsequent consequence and that both reinforcement and timing are essential elements.
References
Burch, Mary R., & Bailey, Jon S. (1999). How Dogs Learn. New York: Howell.
Domjan, Michael. (2003). The Principles of Learning and Behavior (5th ed.). (M. Taflinger & J. Wilkinson, Eds.).
Reid, Pamela J. (1996). Excel-erated Learning. CA: James & Kenneth.
Richmond, Raymond L. A guide to psychology and its practice. Retrieved from http://www.guidetopsychology.com/intro.htm
Copyright Joyce Kesling 2012-2017