A Training Program for Filter-Search Mine Detection Dogs

REST (Remote Explosives Scent Tracing) is an odor detection concept in which air from suspect locations is vacuumed through a polyvinyl chloride (PVC) netting filter. The filters are transferred to dogs trained to signal specific target odors such as drugs or explosives. Application for the detection of landmines involves vacuuming an area of land suspected to contain mines, and requires that the dogs achieve detection skills for very low concentrations of the target odor. Here, we describe the principles behind and results of a training program designed to produce REST dogs with detection skills similar to those obtained on filters from minefields. The entire training process for four dogs took under 6 months, and involved two trainers working for 1-5 days/week for about 5 h/day (dogs were trained most days, but both trainers were not always present). Principles underlying the training program were: (1) minimise dependency on handler, (2) encourage independent search, (3) build extended search by progressively reducing the frequency of occurrence of positive filters (using variable interval reinforcement), and (4) progressively generalise discrimination to lower concentrations of TNT. Reward-based “clicker” training was used exclusively. After 15 weeks, the dogs achieved 95% detection reliability. Properly documented training programs such as this one are essential if the industry is to further develop the REST concept for operational implementation on a broad scale.

Dogs have been used for millennia as search tools during hunting. However, only in recent decades have there been systematic attempts to train dogs in a variety of search roles (Fogle, 2000). Their use has been so successful that images of dogs searching for buried people at the site of a disaster are now routine, and most airline travellers are accustomed to the sight of search dogs at international entry points. The use of dogs as search tools for humanitarian clearance of landmines began late in the 20th century, although their use for mine detection purposes by the military began in World War II (Lemish, 1999). Mine detection dogs (MDDs) are now widely used, with over 700 dogs working in about 23 countries (see http://www.gichd.ch/pdf/Where_MDD_are_used.pdf), and their use is increasing.
*Melissa Burns Cusato was the action editor for this paper.
We thank M.B. Iden and T.H. Eiken for logistic and administrative support, and C. Aakerblom, H. Bach, P. Blagden, A. Göth, B. Lewis, P.J. Matre and K. Schultz for discussion and comments. The filters and vacuum sampling equipment were supplied by Mechem Consultants (South Africa). This research was undertaken as part of a broader program on the use of mine detection dogs, and has benefited from discussion with a wide range of partners, especially Norwegian Peoples Aid and the MDD Advisory Group on Standards. Sponsorship of the dog program was provided by the governments of Germany, Japan (through the United Nations Trust Fund), Norway, Sweden, the United Kingdom, and the USA. Address correspondence concerning this article to Ian G. McLean, Geneva International Centre for Humanitarian Demining (GICHD), Ave De La Paix 7bis, CH-1211 Geneva, Switzerland (i.mclean@gichd.ch).
The two largest programs, each with up to 150 operational dogs, are in Afghanistan (nationally run since 1993) and northern Iraq (UN run). In most countries afflicted by landmines, local people do not generally have the educational and management skills needed to operate a program of such a size and complexity. A frequent pattern is for an international agency (e.g., the UN, a nongovernment organisation, or a commercial organisation) to initiate and build such a program. Local nationals are trained in all of the essential roles, and eventually the ongoing operation of the program is handed over to them. The Afghanistan MDD program was the first example of such a model. Central to the success of such a handover process is a carefully designed and fully documented training program for the dogs.
Most MDDs are used in a field search role in minefields, where they search for mines directly. An extensive bibliography (with summaries) on issues related to the use of dogs for demining purposes is available as an interactive CD from the Geneva International Centre for Humanitarian Demining (see http://www.gichd.ch/publications/index.htm). Filter-search, odor detection, or REST (Remote Explosive Scent Tracing), is an entirely different mine detection concept that can be thought of as bringing the minefield to the dog. Originally developed in South Africa for drugs and explosives detection, the concept was adapted for mine detection along roads (Joynt, 2003). Unfortunately, the company that first applied the principle (Mechem, currently a subsidiary of the South African CSIR) did not publish any of their development or validation results. The principle involves vacuuming air from the minefield through a filter. The filter retains molecules of mine odors (explosives and their breakdown products, and possibly casings), potentially for several years if stored correctly (V. Joynt, personal communication, February 9, 2003). The filters are easily transported and can be delivered to the REST MDDs at a central location for analysis. The dog alerts to a positive filter (i.e., containing molecules of target substances) by sitting or lying down. If just one dog indicates a positive result, the section of road or land from which the filter was taken is treated as requiring clearance. If no dog indicates a positive, the land is declared clear. REST is by far the most cost-effective area clearance tool available to demining agencies. It is not yet widely used for a variety of reasons (Bach & McLean, 2003), but one is that no properly documented training program for REST dogs is available. We make a first attempt to describe such a program here.
Successful detection depends on an effective sampling system, careful handling procedures to avoid contamination of filters, and the accuracy of the dogs. Obviously, the dogs must be capable of detecting all positive filters. However, false alerts (the dog indicates a filter as positive when it is not) represent a significant cost to the demining agency because of the need for follow-up clearance, which is costly and slow. Thus the training program for the dogs must address both detection threshold (the dogs must be able to detect TNT or other explosives at the concentrations available on the filters) and reliability of alerting. Training for the former involves progressively reducing the odor concentrations available to the dog for detection. Training for the latter involves carefully designed training programs that maximise extended search effort and minimise occurrence of rewards.

Subjects
The dogs used were all hunting English springer spaniels or springer x Labradors. The training program began with six dogs, of which two were rejected: one was unable to develop an extended search pattern, and the other was too dependent on the handlers. Of the four that completed the training program, three (Bandura, male, crossbred, 4 years; Hindi, female springer, 3 years; Skinner, male springer, 3 years) were bred at the Fjellanger dog training school. One (Sam, male, springer, 2 years) was imported from England. Two trainers worked with all dogs in order to minimise the possibility of trainer-dog dependencies developing.

Apparatus
The training apparatus was a circular stand with 12 stainless steel arms, effectively a multiple-choice apparatus (Figure 1). Fittings for boxes or filter cartridges were placed on the end of each arm. The stand could be revolved after search by a dog, allowing rapid adjustment of filter position. Filters could be quickly exchanged, or transferred between arms. The stand was regularly cleaned with alcohol, and all handling of equipment and filters was done using disposable plastic gloves. It was placed in a small room (4 x 4 m) with two entrance doors, a blind for trainers to hide behind, and a one-way window. Dogs were trained to enter at one of the doors, make one circuit of the stand sniffing at each box or filter, and exit at the same door. For any one search, zero, one or two trainers could be in the room, either hidden behind the blind or in the open (we note that as a result of experiences in the program described here, our current procedure is to remove or hide all personnel for all searches). For the dog, the only constants on any search event were the presence of the circular stand and the room itself. Dogs searched the stand with no support or assistance from the trainers, except for the whistle reward when correct positive alerts were given (see below). If the dog correctly determined that there were no positives on the current trial (i.e., it gave no alert), it was rewarded once it was outside the door.
The filters were made of coiled polyvinyl chloride (PVC) netting. The procedure for making test filters was as follows. Odor strips made in a standard way were supplied by Mechem, who made the strips by vacuuming over TNT for a standard amount of time. These strips were saturated with TNT, ensuring a high intensity target for the dogs. In the early training series for the dogs, up to 8 strips were used as a target, and the dogs were trained to associate those targets with rewards and to search for the targets using the selection apparatus.
Once the dogs were successfully detecting the high intensity target, two processes were initiated. First, the number of arms on the machine was increased progressively to 12 (making the single target more difficult to find). Second, the intensity of the odor source was progressively reduced. During the intensity reduction process (equivalent to generalising response to lower concentrations of the target substance), the number of Mechem-supplied strips in the box was progressively reduced to one. Then the Mechem-supplied strips were used to make new target filters. Initially, 8 strips were placed in a large cardboard box (0.92 m³) and left for 2 h. Then a vacuum was inserted into the box and a new filter was made using the principle of vacuuming through a filter for a standard amount of time. Successive reduction of TNT intensity on the filters was achieved by reducing the number of Mechem-supplied strips in the box (8 to 1), by reducing the time they were left in the box (2 h to 1 h) and by reducing the vacuum time (60 s to 45 s). Unfortunately, no quantitative assessment of the odor concentrations available from the strips and filters was possible. However, dogs trained using this process are able to detect positive filters made by vacuuming over or near real mines laid in the ground, indicating that odor concentrations are similar to or less than those obtained in real minefields.

Procedure
The training program was designed to implement the following training principles: (1) minimise dependency on handler, (2) encourage independent search, (3) build extended search by progressively reducing the frequency of occurrence of positive filters (using variable interval reinforcement), and (4) progressively generalise discrimination to lower concentrations of TNT.
All training involved positive, reward-based "clicker" training (Fjellanger, 2003; Pryor, 1999). In this program, a whistle replaced the clicker. The whistle must first be established as a conditioned reinforcer, which takes 5-10 min with most dogs by linking the sound to an unconditioned reinforcer (usually food). Once established, it allows for very precise timing of reward presentation, allowing reinforcement of behaviour with minimal involvement of the trainer. The dog can therefore be precisely rewarded for approximations of desired behavior (e.g., approaching an odor source) without any need to lead or assist the dog in any way. The trainer effectively waits for the dog to offer successively more refined versions of the behavior, and rewards appropriately during the shaping process. An expert trainer uses a variety of simple procedures to encourage the dog to exhibit the desired behavior (e.g., walking around or standing near to the training device so that the following dog approaches the target). The clicker or whistle is normally used on a continuous rather than intermittent basis. Once the desired behaviour is established and refined, the sound may be removed completely, may become just one of a variety of rewards presented intermittently, or may still be the main reward used. The sound is maintained as a conditioned reinforcer by intermittent presentations of various primary reinforcers (food, ball, praise, etc).
In this training program, the whistle was used throughout as the main reinforcer. Shaping continued throughout the program, so that the training ended at the stage described above where maintenance of an established behaviour was becoming the primary training objective. The whistle allowed extended noise production compared to a clicker, and gave the trainer a greater ability to lead the dog early in the training process (in effect, an extended whistle combined with directed movements of the trainer can be used like a pointer). As training progressed, the whistle was used only to provide an instant of sound, as with a clicker (application of principles 1 and 2 above). The whistle was used continuously. Maintenance of the whistle as a secondary reinforcer was achieved by intermittently linking it to primary reinforcers on about 2 of 3 presentations on a random schedule.
Preliminary training for extended search, odor discrimination skills, and working with the multiple choice apparatus began in early January 2001, and was completed by early-mid February (5 weeks, 1-2 trainers, 4-5 days/week, 5 h/day).
In early February, training to develop skills specific to the requirements of an operational REST dog began. At this stage, the dog was familiar with the training apparatus, had basic discrimination skills for TNT, and was focussed on search and discrimination. A reduced detection threshold and reliable alerting were not yet established.
From early February to mid May, 1-2 trainers worked with the dogs on most working days (i.e., 5 days/week, for about 5 h each day), but the training activity on each day varied. Within any one week, testing of the dogs was conducted on 1-4 days, but most dogs received training on some aspect every day. On non-testing days, the trainers worked with each dog to improve general skills, and also to fix problems identified in the training results (see example below). A daily diary was maintained, but the skills of the dogs were recorded quantitatively only on test days. Test days were not specifically programmed into the training schedule, and in practical terms occurred on those days on which both trainers were available (two were required to ensure objectivity of the data). The research protocol required that personnel be available to conduct tests on a minimum of one day/week. This objective was achieved in all but weeks 10, 12 and 13 of the 15-week program. The before/after research design involved tracking the training success of each dog through time.
Dogs that performed poorly at any stage received extra attention in order to keep all dogs to a similar standard, so in some weeks, some dogs received more attention than others. For example, on 7 March, two dogs (Sam and Hindi) gave a high proportion of false alarms (Sam: 10 mistakes of 15 attempts; Hindi: 19 of 26). Therefore on 8-14 March, these dogs were given training designed specifically to improve the false alarm problem. They were tested again on 15 and 16 March, when they gave the following results (15 March, Sam: 5 mistakes of 16 attempts; Hindi: 11 of 27; 16 March, Sam: 3 of 21; Hindi: 3 of 22). The results for these days can be seen in Figure 2 on week 5, and the effect of the training response can be seen in the improvement in the immediately following weeks.
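As a worked illustration of the example above, the following minimal Python sketch converts the reported counts into mistake rates per test day. The counts are taken from the text; the data layout and the function name `mistake_rate` are assumptions introduced here for illustration only.

```python
# Mistake counts reported in the text: (mistakes, attempts) per test day.
results = {
    "Sam": {"7 March": (10, 15), "15 March": (5, 16), "16 March": (3, 21)},
    "Hindi": {"7 March": (19, 26), "15 March": (11, 27), "16 March": (3, 22)},
}


def mistake_rate(mistakes, attempts):
    """Return mistakes as a fraction of attempts (hypothetical helper)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return mistakes / attempts


# Fraction of attempts that were mistakes, per dog and per test day.
rates = {
    dog: {day: mistake_rate(*counts) for day, counts in days.items()}
    for dog, days in results.items()
}

for dog, days in rates.items():
    for day, rate in days.items():
        print(f"{dog}, {day}: {rate:.0%} of attempts were mistakes")
```

For both dogs the rate falls on each successive test day, which is the improvement attributed in the text to the intervention of 8-14 March.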
The dogs were challenged to discriminate one target (or positive) filter from a total of 12. A trial is one event in which a dog searches all filters on the test apparatus on a test day, and was completed in about 1 min or less. The average number of trials per dog per day was (mean ± s.e.) Bandura: 16.9 ± 0.5, N = 27 days, range = 6-35; Skinner: 17.1 ± 0.5, N = 29, range = 5-36; Sam: 18.4 ± 0.5, N = 28, range = 5-38; Hindi: 17.8 ± 0.5, N = 29, range = 4-35. For some trials, no target was provided, and for some trials one target was provided. Thus the dog returned any of the following possible responses: (1) alerted to the target = correct response = Cr, (2) gave no alert when there was no target = correct rejection = Cnt, (3) alerted to a non-target = false alarm = Fa, (4) did not alert to a target = miss = Fm. Mistakes are Fa+Fm. It was possible for a dog to make more than one mistake on a trial, for example because it gave several separate false alarms during that trial. However, it was only possible to return one correct result for a trial.
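The response categories above can be sketched as a small scoring function. This is an illustrative Python sketch, not the procedure used to score the original data: the function name `score_trial`, the use of arm indices, and the rule that a trial may yield both a correct response and separate false alarms are assumptions made here.

```python
def score_trial(target_position, alerted_positions):
    """Tally Cr, Cnt, Fa and Fm for a single trial (hypothetical helper).

    target_position: arm index holding the positive filter, or None when
        no target was provided on the trial.
    alerted_positions: arm indices at which the dog alerted.
    """
    tally = {"Cr": 0, "Cnt": 0, "Fa": 0, "Fm": 0}
    alerts = list(alerted_positions)

    # Every alert at an arm other than the target counts as a false alarm,
    # so a single trial can contribute several mistakes.
    tally["Fa"] = sum(1 for position in alerts if position != target_position)

    if target_position is None:
        # Correct rejection only if the dog gave no alert at all.
        if not alerts:
            tally["Cnt"] = 1
    elif target_position in alerts:
        # At most one correct response is possible per trial.
        tally["Cr"] = 1
    else:
        tally["Fm"] = 1
    return tally


print(score_trial(3, [3]))      # target present and alerted
print(score_trial(None, []))    # no target, no alert
print(score_trial(5, [2, 9]))   # target missed, two false alarms
```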
The results are presented in Figure 2 as the proportion of the total number of possible responses that were correct, on each test day: (Cr+Cnt)/(Cr+Cnt+Fa+Fm). The number of test days within a week varied, so all available daily proportions were combined to give a weekly average. In Figure 2, a value of 1 indicates perfect correct responses and a value of 0 indicates that all trials involved a mistake. The dogs are expected to get at least some trials correct by chance (e.g., on a "no target" trial), so the chance of a 0 being returned was very small.
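The calculation behind Figure 2 can be sketched as follows. This is a minimal Python sketch: the formula is the one given above, but the daily counts are invented for illustration, and the weekly value is assumed here to be the unweighted mean of the daily proportions, since the text does not state how days were combined.

```python
def proportion_correct(cr, cnt, fa, fm):
    """(Cr + Cnt) / (Cr + Cnt + Fa + Fm) for one test day."""
    total = cr + cnt + fa + fm
    if total == 0:
        raise ValueError("no responses recorded for this day")
    return (cr + cnt) / total


def weekly_average(daily_counts):
    """Unweighted mean of daily proportions (an assumption, see lead-in)."""
    daily = [proportion_correct(*counts) for counts in daily_counts]
    return sum(daily) / len(daily)


# Hypothetical week with two test days; counts are (Cr, Cnt, Fa, Fm).
week = [(8, 8, 2, 2), (9, 9, 1, 1)]
print(f"weekly proportion correct: {weekly_average(week):.2f}")
```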
The two types of mistake required different training corrections. Fa is primarily a training problem because it suggests that the dog has learned that alerts can bring rewards in the absence of the contingency (a target odor). Fm is primarily a discrimination problem and suggests that the dog needs more work on the sensitivity of detection and possibly its concentration on the task. We therefore explored the relative occurrence of each mistake through time using the formula Fa/(Fa+Fm). The results of this analysis were extremely variable, in part because of decreasing numbers of errors towards the end of the time series, so the curves were smoothed using a running mean calculated from the ratios for three consecutive weeks: (week − 1), (week 0), and (week + 1).
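The ratio and its smoothing can be sketched in Python as below. The treatment of weeks with no mistakes or no test data (returned as `None` and skipped within the three-week window) and the shortened window at each end of the series are assumptions made here; the text does not say how these cases were handled.

```python
def false_alarm_ratio(fa, fm):
    """Fa / (Fa + Fm), or None when no mistakes were recorded."""
    mistakes = fa + fm
    if mistakes == 0:
        return None
    return fa / mistakes


def running_mean(values):
    """Three-point running mean over (week - 1), (week 0), (week + 1).

    Missing values (None) are skipped, and the window is shortened at
    both ends of the series; both choices are assumptions.
    """
    smoothed = []
    for i in range(len(values)):
        window = [v for v in values[max(0, i - 1): i + 2] if v is not None]
        smoothed.append(sum(window) / len(window) if window else None)
    return smoothed


# Hypothetical weekly (Fa, Fm) counts; the third week has no mistakes.
weekly_counts = [(4, 0), (2, 2), (0, 0), (0, 3)]
ratios = [false_alarm_ratio(fa, fm) for fa, fm in weekly_counts]
print(ratios)
print(running_mean(ratios))
```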
The operational requirement to have dogs alert reliably without being rewarded was introduced near the end of this training program, but the conditions required for its introduction (an established behaviour) were only just being achieved at that point, and it was not a central theme. A more detailed account of the training process summarised above is available as a PDF document from Fjellanger at rf@noksh.com.

Results
Throughout the training program, the dogs worked on the search problem with great enthusiasm. They were always excited to enter the room, worked their way quickly around the 12 filters, and as enthusiastically left the room (they were sometimes rewarded outside the door after a search). There was never any suggestion in their behavior that they found the repetitive training scenario dull or uninteresting, and the presence or absence of trainers had no influence on their search behavior.
The pattern for 3 dogs was for the proportion correct to decrease over the first 5 weeks of training, after which the proportion correct increased rapidly, then more slowly but reliably towards the desired objective of no mistakes (Figure 2). The fourth dog (Skinner) showed a somewhat different pattern by maintaining the proportion correct reasonably consistently around 80% until week 7, when a sudden increase in errors occurred. This change was quickly corrected before he moved reliably towards the training objective of no mistakes. The dogs (in order Bandura, Skinner, Sam, Hindi) attained final proportion correct ratings of 96%, 93%, 95%, and 95%. For trials in the final testing week where a target was provided (i.e., using values for Cr only), the proportion correct alerts were respectively 95% of 22 trials, 88% of 23, 86% of 15, and 95% of 20. At 5 weeks, specific training intervention was given to address the increasing proportion of mistakes being given by two dogs (Sam and Hindi). The effect of that intervention can be seen in Figure 2. At this time the other two dogs (Bandura and Skinner) were showing an increasing proportion of missed targets (Fm, Figure 3), even though Skinner was not exhibiting the general pattern shown by the other three dogs of an increasing proportion of mistakes overall. It appears that 5-7 weeks was a critical period in the training of all four dogs, although the problems requiring addressing and the amount of intervention were not the same for each dog. It was after 7 weeks that the data from all dogs began to move consistently towards the training objective.
As the dogs progressed through the training program, the proportion of trials in which a target was presented decreased from 1 in 2 to 1 in 4, due to the background strategy of variable interval reinforcement for targets in order to build motivation to search [Method, principle (3)]. The opportunity for doing Fa was always high throughout the training (there were 11 or 12 non-target filters available on every trial). However, the opportunity for doing Fm declined and was quite low by the end of the training program. The few mistakes made at the end of training were about equally distributed between Fa and Fm (Figure 3), indicating that the occurrence of false alerts (a training problem) was negligible. The background strategy of progressively lowering sensory discrimination for TNT [Method, principle (4)] was the cause of the increasing tendency to make Fm mistakes throughout the training program by all dogs (most obvious in the Figure 3 results for Sam and Hindi). Thus the very low proportion of mistakes at the end of the training period was established despite a much improved discrimination skill for detection of TNT relative to the beginning of training.
Figure 3. False alarms (a nontarget is alerted as positive, Fa) as a proportion of the total number of mistakes (Fa + missed targets, Fm) made by four dogs during training over 15 weeks for development as operational REST dogs. 1 = all mistakes were Fa; 0 = all mistakes were Fm. No test data were available for weeks 10, 12 and 13.

Discussion
After 4.5 months of training, the four dogs had all achieved an ability to detect TNT reliably at odor concentrations similar to those obtained in an operational situation. They were making a very low rate of mistakes, worked independently on the training apparatus, were enthusiastic, had no dependence on the trainers, and searched for a large number of trials without difficulty. At the time the training program was completed, these dogs were ready for accreditation, and judged by these data should have passed accreditation without difficulty. Thus they were ready to become operational. It seems likely that reliability could improve further with operational experience, as found by Hayter (2003) for dogs trained to detect tripwires. The problem of the task being extremely repetitive and dull is a significant issue in the training of MDDs, whether for REST or field search (GICHD, 2001). The enthusiasm for the task exhibited by the dogs throughout this study is an important benefit of a reward-based training program. With clicker training, the dog quickly learns to link contingency with effect (reward; Pryor, 1999). As long as those rewards are motivating for the dog, it will tend to offer the desired behaviors with increasing frequency and reliability. With careful use of first continuous and then intermittent reinforcement, the clicker rapidly becomes rewarding for the dog, even in the absence of the primary conditioner. The clicker is therefore a powerful tool that encourages the dog to behave with great dedication, even when the task is dull.
The twin training objectives of improving reliability of alerting, and response generalisation to lower concentrations of target odors, are standard objectives in any MDD training program. In essence, these are both practice effects, with the dogs being trained to improve their skills through practice for an extended time period. As any child sitting at a piano knows, practice is repetitive and unrewarding, and the parent of such a child will routinely report that any procedure designed to push or force the child to practice is likely to be counterproductive. In this program, variable interval reinforcement schedules were used to extend willingness to search (the proportion of trials containing a target progressively decreased, thus the average time between them increased). These procedures allow the trainer to progressively make the task more difficult (lowered concentrations), and improve the reliability of alerting behavior, without any loss in enthusiasm for the task by the dog despite the extensive practice required. Once the desired search behaviour is established, intermittent (variable ratio) presentation of rewards would be introduced to decrease probability of occurrence of reward on any one correct alert (that procedure was being introduced here at the time training ended). This is essential preparation for the operational use of the dog, because when unknown targets are being assessed it must be assumed that all alerts are real, and alerts should not be rewarded because of the possibility of rewarding mistakes and detraining the dog.
A critical aspect of this program was the daily collection and inspection of detailed training records. It was through ongoing analysis of these records that the increasing rate of mistakes was noted for several dogs, resulting in training intervention to address those problems specifically. In the case of Sam and Hindi at 5 weeks, the problem was that reduced odor concentration was resulting in positive targets being missed. The training intervention used was to assist the dog to identify the target in one or both of two ways. First, the dog was rewarded when it sniffed at the target (i.e., no alert was required). This is the same procedure as was used in the early stages of training to teach the dog that reinforcement was contingent on the presence of a target odor, and emphasises the need for focusing on sensory analysis of the targets. Second, the odor concentration was raised slightly to an intermediate level in an attempt to extend previously established stimulus discrimination skills. This is the procedure used by several organisations that produce REST or field dogs. In our experience, the second procedure generalises to lower concentrations at a much slower rate than the first. However, the first method requires the use of a procedure such as clicker training, where the dog is highly motivated to find and offer the behavior that is being rewarded, and very precisely timed rewards can be given.
In this training program, we do not address the question of how to maintain the skills of a dog once it becomes operational. However, those skills would be maintained using the same procedures as were used in the original training. Such maintenance training is conducted during parts of the day when the dog is not being used operationally. It offers the two benefits of providing data to monitor the detection and alerting capability of the dog, and ensuring that acquired skills are not lost due to operational experiences. Maintenance training can also be used to further improve the detection skills and alerting reliability of the dogs. Unfortunately, the odor concentrations involved are too low to obtain independent measures of odor availability using current laboratory technology (Phelan & Barnett, 2002; Phelan & Webb, 2003). We believe that 95% reliability is extremely good (in an operational testing situation the miss rate is compensated for by using several dogs). However, clearly the dogs should be as close to 100% as possible, and the maintenance training program must be designed to keep each dog near this maximum. Mechem claims 98% reliability with its REST dogs (K. Schultz, personal communication, February 9, 2003), although quantitative documentation of that claim has never been published.
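The way several dogs compensate for an individual miss rate can be illustrated with a short Python sketch. It applies the operational rule described earlier (a filter is treated as positive if any one dog alerts) and assumes, purely for illustration, that the dogs miss independently of one another. That assumption is not tested in this study and may be optimistic, for example if a weak filter is difficult for every dog. The function names are hypothetical.

```python
def filter_needs_clearance(alerts):
    """A filter is treated as positive if at least one dog alerts to it."""
    return any(alerts)


def combined_miss_probability(detection_rates):
    """Probability that every dog misses a positive filter.

    ASSUMPTION: misses are statistically independent between dogs.
    """
    probability = 1.0
    for rate in detection_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("detection rates must lie between 0 and 1")
        probability *= 1.0 - rate
    return probability


# Three dogs, each detecting 95% of positive filters (illustrative values).
print(filter_needs_clearance([False, True, False]))
print(combined_miss_probability([0.95, 0.95, 0.95]))
```

Under the independence assumption, three dogs at 95% would jointly miss a positive filter about 1 time in 8000 (0.05 × 0.05 × 0.05 = 0.000125). Correlated misses would make the real figure higher.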
The primary value of REST is for rapid area reduction. Most "demining" is done on land that does not actually contain mines, primarily because the suspect area is routinely much larger than the mined area. If demining agencies can be given a quick procedure that shows land is clear with a high probability (even if that probability is lower than 100%), then procedures can be adjusted to give significant cost-benefit improvements without compromising safety. For example, current demining machines do not give 100% clearance and must be followed by other (and slower) demining techniques (Handicap International, 2000). However, land shown to contain no mines using REST (with a reliability of e.g., 95%), could potentially be followed by a machine (with a clearance reliability of e.g., 80%) to produce land that is safe for many or even all purposes (note that this is not the place to develop the controversial issues underlying this hypothetical example). Operational use of REST by Mechem in the 1990s assumed that 100% reliability was attained, although 10% of the land was checked using other means as quality assurance (Joynt, 2003; personal communication, February 9, 2003). Certainly, 100% reliability is desirable, but it is not yet known if it can be achieved. Either way, the opportunity to rapidly reduce risk on a broad scale is highly likely to be embraced by national agencies in charge of demining priorities. Unfortunately, the time and investment required to establish and validate a REST testing facility (minimum 1 year) is a serious disincentive.
The issues underlying the further development of REST technology were reviewed at a recent workshop (in February 2003, overview in Bach & McLean, 2003). Important points that emerged were: the limited availability of operational REST detectors (currently, about 30 worldwide) was an important restriction on technology improvement initiatives; the equipment and technology used for REST sampling needed further testing and development, as did the sampling procedure itself; the procedures used to produce, handle and analyse REST filters were still causing detection failures due to introduced contamination and other errors; and research on the development of the technology had never been adequately funded, implemented, or reported. This list appears damning, given that the technology has been in operational use for more than 10 years. However, it reflects the nature of humanitarian demining, as versions of this list could equally be applied to most of the operational systems for demining in use today.
Dog trainers often claim that the ability to train dogs is as much an art as a science. Unfortunately, mine detection dogs (whether for REST or field search) are routinely handled and even trained by nationals with little formal education, and who live in countries where dogs are not widely respected or kept as pets. If they are to train or work with dogs, such people must be given a detailed program to follow, and careful instruction in the basic principles of learning psychology. They are unlikely to have the background skills that are intrinsic to "dog training as an art", but are capable of learning to apply relevant principles in an objective way (i.e., "dog training as a science"). REST dogs are currently being trained by nationals in Angola for the humanitarian organisation Norwegian Peoples Aid, and there is a preliminary proposal on paper to introduce and nationalise the technology in Afghanistan, beginning in 2003. It is anticipated that the use of dogs and other animals (such as Cricetomys rats, Verhagen et al., 2003) for REST detection will increase in the future. Detailed descriptions and documentation of training procedures and results, such as those presented here, are essential if these people are to fully understand the objectives and implementation of their training programs.