Spatial navigation model based on chaotic attractor networks

We present a model of spatial navigation based on the non-convergent dynamics of brain activity. The system includes a hippocampal module that processes global spatial information and a cortical module that deals with local sensory information. We test the model using several spatial navigation paradigms: goal finding, shortcutting and detouring. Computer simulations show that the performance of the agent qualitatively matches that of animals and related models. This new approach provides a novel interpretation of how the brain accomplishes spatial navigation.


Introduction
In recent years significant efforts have been focused on building computational models of animal navigation. Goal finding, shortcutting, detouring, exploration and cognitive mapping are just a few of the navigational tasks that are used to test models of navigation. These models take different approaches in describing how animals perform important navigational tasks.
One approach is to build models that do not take into account the brain regions involved in spatial navigation (Kuipers 1978, Cartwright and Collett 1987, Cheng 1989, Levenick 1991, Chown et al. 1995, Reid and Staddon 1998, Voicu and Schmajuk 2001). These models produce real-time behaviour that is compared with animal performance in a given task.
Another, biologically inspired, approach is to create models that take into account the architecture of the brain regions involved in navigation without aiming to reproduce neuronal activity (Schmajuk and Thieme 1992, Bachelder and Waxman 1994, Benhamou et al. 1995, Samsonovich and McNaughton 1997). These models also produce real-time behaviour that is compared with animal performance. Still another approach is to create computational models of the brain regions involved in navigation that reproduce the experimental findings related to neuronal activity in the brain (Wilkie and Palfrey 1987, Burgess et al. 1994, Arleo and Gerstner 2000, Hasselmo et al. 2002). Owing to the high level of detail in reproducing the parts of the brain involved in navigation, the behavioural analysis of these models is limited to simple navigational tasks.
Our approach is motivated by experimental findings related to the activity patterns of populations of neurons, and since it does not aim to reproduce individual neuronal activity it fits in the second category mentioned above. However, within this category there is an important aspect that differentiates our model from the others: unlike them, it is based on experimental findings related to the mesoscopic dynamics of brain activity. Mesoscopic dynamics occur at the level of the brain at which the electroencephalogram (EEG) is measured. At this level, the sampled signals are the result of the positive and negative feedback interactions between populations of neurons. Freeman (1975) proposed a mathematical model that captures the dynamics of these cell assemblies based on brain data measured in the olfactory bulb. The impulse response was used to identify the parameters of a second-order linear differential equation that describes the behaviour of neuronal populations. Based on this simple model, which simulates the dynamics of cell assemblies, Freeman proposed a hierarchy of models (K0, KI, KII and KIII) that have the capacity to show aperiodic behaviour similar to that found in the EEG (Freeman et al. 1997) and can be used to perform robust pattern recognition of noisy and variable data (Kozma and Freeman 2001). This paper extends the KIII-based method to the problem of spatial navigation (Kozma and Freeman 2003).
The model of navigation that we propose in this paper describes the activity of the hippocampus and the cortex by using two KIII systems. As in other approaches, we assume that while the hippocampus processes information related to global landmarks, the cortex deals with information received from the close vicinity of the simulated organism. We mention points of contact between our approach and others as we develop our navigation model which takes advantage of the non-convergent dynamics of the KIII models.
We first describe the model and illustrate how different modules operate during spatial navigation. Then, we go on to show how the model performs in several spatial navigation tasks such as goal finding, multiple T-maze, detour finding and goal finding in a Mars-like environment. We conclude by comparing our model and its performance with other similar approaches.

Components of the K sets
The K sets (K0, KI, KII and KIII) represent a family of models of increasing complexity which were initially developed to model the function of the olfactory bulb and are now used to model various functions of the brain. K0 is the elementary building block that represents a population of neurons. Its behaviour is described by a second-order linear differential equation,

(1/(a·b)) [p_i''(t) + (a + b) p_i'(t) + a·b p_i(t)] = f_i(t),

where a and b represent time constants, p_i represents the internal state of unit i and f_i represents the input signal, which is a weighted sum of the output signals Q_j from other units:

f_i(t) = Σ_j w_ij Q_j(p_j(t)).

The activation function Q_i is an asymmetric sigmoid,

Q_i(p) = Q_m (1 − exp(−(e^p − 1)/Q_m)),

where Q_m represents the slope of the sigmoid. Its shape was determined experimentally (Freeman 1975).

KI contains two K0 units that are linked using either excitatory (i.e. positive) or inhibitory (i.e. negative) connections. As shown in figure 1, KII is a double layer of excitatory and inhibitory KI units. Given a set of carefully chosen parameters and connection weights, this system can maintain oscillatory behaviour by itself or once it is perturbed from a stable state (see figure 2). The parameters of the model used in the simulations are described in the Appendix. Figure 3 shows the diagram of a KIII model, which includes three layers linked by feed-forward and feedback connections. Each layer has multiple KII sets, connected by lateral weights between corresponding nodes. Owing to its KII modules, which oscillate at different frequencies, the KIII model is able to show aperiodic oscillations of the kind found in the EEG signal (see figure 4). Unlike K0, KI and KII, KIII includes a Hebbian learning mechanism that allows it to perform pattern recognition and classification (Freeman et al. 1997). A calculation of the Lyapunov exponent (Wolf et al. 1985) of the time series generated by the KIII model yields a strictly positive exponent, an indication that the KIII system produces chaotic activity.

Figure 5 shows the diagram of the model of spatial navigation. The system comprises a high-level sensory module which gathers information about distances and angles to known landmarks in the environment. A place recognition system uses this information to make associations between places and actions. The model also contains a low-level sensory module which provides information about obstacles that are relatively close to the agent. A pattern recognition system processes this information, identifies shapes of obstacles and sends the appropriate motor commands to avoid collision with the obstacle.
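As a rough illustration of these definitions, the K0 unit and the asymmetric sigmoid can be simulated directly. The following sketch uses explicit Euler integration; the numerical values of a, b and Q_m are illustrative placeholders, not the fitted values from the Appendix.

```python
import math

def Q(p, q_m=5.0):
    """Asymmetric sigmoid Q(p) = Q_m (1 - exp(-(e^p - 1)/Q_m))."""
    return q_m * (1.0 - math.exp(-(math.exp(p) - 1.0) / q_m))

def k0_impulse_response(a=0.22, b=0.72, dt=0.5, steps=400):
    """Integrate (1/(a b)) [p'' + (a + b) p' + a b p] = f for a brief
    unit input, i.e. p'' = a b f - (a + b) p' - a b p.
    The constants a and b here are placeholders, not fitted values."""
    p, v = 0.0, 0.0                      # internal state and its derivative
    trace = []
    for n in range(steps):
        f = 1.0 if n == 0 else 0.0       # impulse-like input at t = 0
        v += dt * (a * b * f - (a + b) * v - a * b * p)
        p += dt * v
        trace.append(p)
    return trace
```

The impulse response rises and then decays back to rest, the second-order behaviour that Freeman fitted to measured olfactory-bulb data.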
Motor commands that come from the pattern recognition system have priority over commands sent by the place recognition module.
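This priority rule can be sketched as a simple arbitration step; the command names and the four-neighbour move table below are our own illustration, not part of the model's specification.

```python
# Hypothetical command vocabulary: the four grid moves.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def arbitrate(place_cmd, obstacle_cmd):
    """Pattern-recognition (obstacle) commands pre-empt place commands;
    None means the pattern recognition system issued no command."""
    return obstacle_cmd if obstacle_cmd is not None else place_cmd

def step(pos, place_cmd, obstacle_cmd=None):
    """Apply the winning command to the agent's grid position."""
    dx, dy = MOVES[arbitrate(place_cmd, obstacle_cmd)]
    return (pos[0] + dx, pos[1] + dy)
```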

Space representation
First we want to distinguish between the representation of the simulated space and the internal representation of space that the agent uses for navigation. The simulated space is a 20 × 20 grid, which allows the agent to perceive the environment from 400 different locations. At any instant, the agent can choose its next move from one of the four direct neighbours of the current grid position. For simplicity, our computer simulations use a grid to model the environment; however, this does not restrict the applicability of the model, and any continuous two-dimensional environment could be used instead. Unlike the simulated surroundings, the internal representation of space that the agent builds and uses for navigation is not discrete, since the sensory acquisition and the internal dynamics are continuous. The term internal spatial representation might not be appropriate in this context, since it usually denotes a representation that does not change dynamically in a static environment.
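Assuming 0-indexed integer coordinates, the four-neighbour move set on the 20 × 20 grid can be sketched as follows; the function name is ours.

```python
def neighbours(pos, size=20):
    """The direct grid neighbours of a cell that remain inside the
    size x size environment (0-indexed coordinates assumed)."""
    x, y = pos
    candidates = [(x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < size and 0 <= cy < size]
```

An interior cell has four legal moves; a corner cell has only two.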

High-level sensory information
We assume that we have a reliable and robust localization system that provides distances and angles from the current position of the agent to three known landmarks in the environment (see figure 6). This can be achieved by a combination of dead-reckoning (Monte Carlo localization (Rofer and Jungel 2003), Kalman filtering (Balakrishnan et al. 1999)) and a global positioning system using vision or radio signals. The information delivered to the place recognition system contains the current sensory image (L_d1, L_a1, L_d2, L_a2, L_d3, L_a3) and the most recent nine images. L_di represents the distance to landmark i and L_ai represents the angle formed by the line through the agent and landmark i and the horizontal line of the grid. The place recognition system therefore receives 60 (= 10 × 6) input values. The size of the memory for the sensory information, in this case nine images, was determined empirically and was chosen so that KIII would provide robust classification.
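A sketch of how one sensory image and the 60-value input vector could be assembled. The function names are ours, and since the paper does not specify the angle convention, measuring the angle with atan2 against the grid's horizontal axis is an assumption.

```python
import math

def landmark_image(pos, landmarks):
    """One sensory image: an (L_d, L_a) pair per landmark, where L_d is
    the distance and L_a the angle of the agent-landmark line measured
    from the horizontal axis of the grid (assumed convention)."""
    x, y = pos
    return [(math.hypot(lx - x, ly - y), math.atan2(ly - y, lx - x))
            for lx, ly in landmarks]

def place_input(history):
    """Flatten the current image plus the nine preceding ones into the
    60-dimensional input vector of the place recognition system."""
    return [v for image in history[-10:] for pair in image for v in pair]
```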

Low-level sensory information
The low-level sensory module provides readings from eight sensors that the agent can use to scan for obstacles at a distance of twice the size of a grid cell (see figure 7). The sensors are oriented in the following directions: N, NE, E, SE, S, SW, W and NW. The information delivered to the pattern recognition system contains the current sensory image (S_1, S_2, S_3, S_4, S_5, S_6, S_7, S_8) and the most recent 14 images, so the number of inputs received by the pattern recognition system is 120 (= 15 × 8). The size of the memory for the sensory information, in this case 14 images, was determined empirically and was chosen so that KIII would provide robust classification.

Figure 6. High-level sensory information. Outline of collecting orientation information with respect to three given landmarks (black circles) located in the environment.
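The eight-direction obstacle scan and the 120-value input vector can be sketched as below; binary readings and a reach of two cells are our reading of "a distance twice the size of the grid cell", and the function names are ours.

```python
# Direction vectors for N, NE, E, SE, S, SW, W, NW (y axis pointing north).
DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def scan(pos, obstacles, reach=2):
    """Binary reading per direction: 1 if an obstacle cell lies within
    'reach' grid cells along that direction, else 0."""
    x, y = pos
    return [1 if any((x + k * dx, y + k * dy) in obstacles
                     for k in range(1, reach + 1)) else 0
            for dx, dy in DIRS]

def pattern_input(history):
    """Current image plus the 14 preceding ones -> 120 inputs."""
    return [s for image in history[-15:] for s in image]
```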

The hippocampal and cortical modules
The operation of the hippocampal and cortical modules can be described as follows. In the absence of stimuli, each module is in a high-dimensional state of spatially coherent basal activity (see figure 4(a)), which is described by an aperiodic (chaotic) global attractor. In response to external stimuli, the module can be kicked off the basal state into a local memory wing (see figure 4(a)). This wing is usually of much smaller dimension than the basal state and shows coherent, spatially patterned amplitude-modulated (AM) fluctuations. The system resides in the localized wing for the duration of the stimulus and then returns to the basal state. This temporal burst process is repeated every few hundred milliseconds, and the system is able to store information in the sequence of AM patterns during a burst. When used to classify linearly non-separable patterns, the system performs as well as multi-layer feed-forward neural network classifiers (Kozma and Freeman 2001), and it compares favourably with these methods regarding the robustness and noise tolerance of the pattern recognition.
In the model, several types of learning rules have been used simultaneously, including habituation, Hebbian learning and global stability control (Kozma and Freeman 2001). All these learning methods exist in a subtle balance and their relative importance changes at various stages of the memory process.
In our model, Hebbian learning is applied to modify the lateral connections between the excitatory nodes of the cortical and hippocampal KII layers. We designed the following learning cycle. A given pattern is shown to the system for 100 ms, which corresponds to the period of the theta cycle in rats, during which sensory inputs are perceived. This is followed by a 100 ms period without an input pattern, corresponding to the resting part of the sensory cycle. After the resting period, a new pattern is shown and the whole cycle is repeated. We use Hebbian learning in combination with reinforcement: learning occurs only if the system receives positive or negative reinforcement. This corresponds to rewarding or penalizing the animal depending on its response to the environmental information.
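The reinforcement-gated Hebbian update on the lateral weights can be sketched as follows. The learning rate and the use of a plain pre × post product are illustrative assumptions; the actual rule also interacts with habituation and stability control, which are omitted here.

```python
def hebbian_update(w, pre, post, reward, eta=0.01):
    """Reinforcement-gated Hebbian rule (sketch): lateral weights change
    only when reinforcement is delivered; its sign rewards or penalises
    the currently co-active pattern. eta is an illustrative rate."""
    n = len(pre)
    if reward == 0:
        return [row[:] for row in w]          # no reinforcement: no learning
    return [[w[i][j] + eta * reward * pre[i] * post[j] for j in range(n)]
            for i in range(n)]
```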

Motor system
We assume that the motor system executes the commands received from the pattern and place recognition systems. No learning mechanism is included.

Computer simulations
We present computer simulations that illustrate how the model performs in five spatial navigation tasks.

Goal finding in an open environment
In this paradigm, animals learn to find a goal position which can be perceived only from close proximity. External landmarks are available and the environment does not contain obstacles.
Experimental data. Many studies report that animals using contextual or local cues are able to find a hidden escape platform in a water pool. This is known as the Morris water maze paradigm (Morris 1981, Morris et al. 1982). One popular version of the paradigm is as follows. At the beginning of each training trial, animals are placed at a random position in the water pool and allowed to explore the maze until they find the hidden platform, where they are left for a few seconds. If they do not find the platform, they are placed on it for a couple of seconds before the trial is completed. As illustrated in figure 8, animals demonstrate the ability to learn the position of the hidden platform.

Simulated results.
In the simulated version of the Morris water maze paradigm, the agent learns to find a goal position while it moves randomly in the environment (see figure 9). At each step the agent receives positive reinforcement if the next position is closer to the goal. This is equivalent to the reinforcement the animal receives once it finds the platform or when it is placed on the platform by the experimenter. Figure 10 shows the trajectories of the robot after the learning trials. Since the agent does not have a mechanism to recognize goals, it keeps moving in the vicinity of the goal.

Shortcut experiment
Using an open field (see figure 11), Chapuis (1988) rewarded dogs after moving them from a start point to a first goal location, back and forth, and then from the same start position to a second goal, again back and forth. He showed that on the test trial, when placed at the start point, the animals would go first to the closest rewarded location and then would shortcut through unexplored territory to the second rewarded location. Similarly, in order to minimize the total distance travelled to retrieve hidden pieces of food, chimpanzees and vervet monkeys use a least-distance strategy, as described, respectively, by Menzel (1973) and Cramer and Gallistel (1997).

We simulated Chapuis's paradigm by assuming two hippocampal modules that were reinforced separately for two different goal locations (see figure 12(a)). First, the simulated animal learns the position of two goal locations while it randomly moves in the environment. While one of the hippocampal modules is reinforced for the first goal location, the other is reinforced for the second goal. On the test trial (figure 12), when placed at the start point, we let the simulated animal navigate by activating the first hippocampal module. Once the simulated animal reaches the first goal location we activate the second hippocampal module, which makes it reach the second goal location. Thus, as in the experimental study, the simulated animal takes a shortcut from the first goal location to the second one.

Detour experiment
In this paradigm, animals show the ability to avoid obstacles placed on a previously experienced path to the goal. The environment contains obstacles and external landmarks or cues.

Figure 11. The experimenter led the dogs on the path CA-AC-CB-BC to show that food was located at place A and place B (left). Then, during the test trial (middle and right), the dogs were left to retrieve the food. Once they reached place A they shortcut directly to place B. Adapted from Chapuis (1988).

Figure 12. Test trial for the shortcut experiment. The agent moves from the start location to the first goal while the first hippocampal module is active and then, once the first goal is reached, it shortcuts to the second goal while the second hippocampal module is active. In this case we assume that the agent can detect the proximity of the goals.

Experimental data.
Many observational and experimental studies (Santschi 1913, Dennis 1929, Tolman 1932, Thorpe 1950, Schmidt et al. 1992, Zeil and Layne 2002) have shown that animals are able to perform detours. For example, Dennis (1929) showed that blind rats are able to manoeuvre around an obstacle placed on a previously visited path to the goal location. Tolman (1932) showed that rats are able to avoid or escape U-shaped barriers placed between the start and goal locations.

Simulated results. Our simulations show that the agent performs detours by learning and recognizing different shapes of obstacles that need to be avoided while navigating towards the goal. The pattern recognition module is reinforced to send the appropriate motor commands that avoid collision. Unlike landmark-based navigation, where the simulated robot explores an open environment, navigation using a limited sensory horizon takes place in a setting that contains obstacles. As shown in figure 13, during random exploration the agent learns to avoid obstacles. During one trial the agent starts in a corner chosen at random and performs 50 steps before starting another trial. In this particular setting only the cortical module is trained, and the environment does not contain a goal location.

Figure 14 shows the trajectory of the agent after it has learned to avoid obstacles. At this stage the agent does not reach the corner of the obstacle. When, in addition, the agent is trained to reach the goal, it does so efficiently by avoiding the obstacles between the start and the goal location (see figure 15). This behaviour is produced by the co-operation of the hippocampal and cortical modules, with the cortical one having priority whenever an obstacle is detected.
We measured the optimality of the trajectories generated during navigation by calculating the ratio of good to bad moves. A good move is defined as a step towards the goal; a bad move is a step away from it. Good and bad moves are measured at two levels, as shown in figure 16. Table 1 presents the results for different sizes of the sensory horizon. As the size of the sensory horizon increases, the ratio of good to bad moves also increases. Since the number of good moves exceeds the number of bad moves, as shown in table 1, we conclude that the simulated animal has learned obstacle avoidance.
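The optimality measure can be computed along a recorded trajectory as follows, again assuming Manhattan distance to the goal as the criterion for "towards" and "away"; the function name is ours.

```python
def good_bad_ratio(path, goal):
    """Ratio of moves that decrease the (assumed Manhattan) distance to
    the goal to moves that increase it, over consecutive path positions."""
    good = bad = 0
    for p, q in zip(path, path[1:]):
        d_p = abs(goal[0] - p[0]) + abs(goal[1] - p[1])
        d_q = abs(goal[0] - q[0]) + abs(goal[1] - q[1])
        if d_q < d_p:
            good += 1
        elif d_q > d_p:
            bad += 1
    return good / bad if bad else float("inf")
```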

Multiple T-maze
Experimental studies show that animals are able to navigate in very complicated mazes. For example, Blodgett (1929) showed that rats are able to solve the multiple T-maze after several days of training. In his study, food-deprived rats received one trial a day. They were placed in the start box (place 1, see figure 17) and retrieved after reaching the goal box (place 19), where food was located. Over several days of training, the number of bad moves (error score) made in reaching the goal decreased significantly (see figure 18).

Simulated results
We simulated the multiple T-maze paradigm by placing the agent in the maze and giving it positive reinforcement whenever it moved towards the goal location. As shown in figure 19, the agent demonstrates a significant improvement after five trials compared with the first training trial. The number of steps to the goal is significantly higher in the first trial than in the fourth and fifth trials (df = 20, t = 8.79, p < 0.001 and df = 20, t = 7.46, p < 0.001, respectively). Figure 20 shows the best generated path between the start location and the goal.

Navigation in a Mars-like environment
A challenging navigation problem is to find the location of a goal in a Mars-like environment (Huntsberger 2001, Tunstel 2001). We tested our navigation model in a simplified Mars-like environment that contains obstacles of different sizes. The size of the grid in this case is 50 × 50. Obstacles are uniformly distributed in the environment and their sizes follow an exponential distribution. We used the same paradigm as for the goal finding task described in section 4.1.

Figure 18. Performance of rats before and after training. The error score represents the number of wrong turns the rats made on their way to the goal location. Adapted from Blodgett (1929).

Figure 19. Performance of the agent in the multiple T-maze.
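One way such an environment could be generated is sketched below. The paper specifies only the 50 × 50 grid, uniform obstacle placement and exponentially distributed sizes; the obstacle count, mean size and square obstacle shape are our illustrative assumptions.

```python
import random

def mars_environment(size=50, n_obstacles=40, mean_size=2.0, seed=0):
    """Obstacle centres drawn uniformly on the grid; obstacle radii drawn
    from an exponential distribution. Returns the set of occupied cells."""
    rng = random.Random(seed)
    obstacles = set()
    for _ in range(n_obstacles):
        cx, cy = rng.randrange(size), rng.randrange(size)
        r = int(rng.expovariate(1.0 / mean_size))   # exponential size
        for x in range(max(0, cx - r), min(size, cx + r + 1)):
            for y in range(max(0, cy - r), min(size, cy + r + 1)):
                obstacles.add((x, y))
    return obstacles
```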

Simulated results
As shown in figure 21, the agent is able to avoid the obstacles and reach the goal location by using a sub-optimal path. The average number of steps to the goal during random exploration is 2034, whereas after learning it decreases to 117.

Figure 21. The size of the grid is 50 × 50. Black circles represent the landmarks used by the localization system.

Discussion
We have presented a neural dynamical model of spatial navigation that describes goal finding, detouring and shortcutting. The model has the following features. First, a localization system uses high-level and low-level sensory information to provide a robust representation of space. Second, the chaotic dynamics of the KIII sets implement the pattern and place recognition systems: the place recognition system guides navigation based on global landmarks, and the pattern recognition system performs obstacle avoidance based on local sensory information. Third, the connections between nodes in the third layer of the KIII sets are updated using a Hebbian learning rule. Finally, the decision of which place to move to next is based on the positive reinforcement the simulated animal receives while exploring the environment.
Although it accomplishes the same navigational tasks as other related models of spatial navigation, this approach is fundamentally different in that it takes into consideration the chaotic dynamics observed at the EEG level. The paradigms presented in this paper have been described by other models closely related to ours (Blum and Abbott 1996, Trullier and Meyer). These models are based on experimental findings related to the properties of place cells (O'Keefe and Dostrovsky 1971, O'Keefe and Recce 1993).
The Blum and Abbott (1996) and Trullier and Meyer models have many points in common with our model but also important differences, which allow them to capture diverse aspects of spatial navigation. Before presenting a comparison between their models and ours, we describe their functioning at the conceptual level.
Blum and Abbott analysed the formation and exploitation of an internal representation of space (also known as the cognitive map) by taking into account the following factors. First, it was assumed that long-term potentiation (LTP) occurs in the hippocampus only if presynaptic activity precedes postsynaptic activity by less than 200 ms; postsynaptic activity before presynaptic activity causes neither LTP nor long-term depression (LTD). Second, CA3 is used as a recurrent network which is able to store connections between place cells or other types of information, such as configuration and time. Third, the model receives the sensory information necessary to activate the current place cell and, at the same time, the heteroassociative network in CA3 and CA1 provides a place code shifted towards places that have been visited in the past. By taking into account the difference between the shifted position and the real location, the hippocampus is able to guide navigation through the environment. During exploration, the model is able to update its internal representation of the environment and to restore it when needed.
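The asymmetric timing condition described above can be stated compactly. The 200 ms window comes from the text; the predicate form and its name are our sketch.

```python
def ltp_eligible(t_pre, t_post, window=200.0):
    """Potentiation is possible only when the presynaptic spike precedes
    the postsynaptic one by less than 'window' ms; the reverse order
    causes neither LTP nor LTD in this account. Times are in ms."""
    dt = t_post - t_pre
    return 0.0 < dt < window
```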
The model proposed by Trullier and Meyer combines the long-term memory provided by the recurrent connections in CA3 with the ability to store sequences of places in a short-term memory (dentate gyrus). As in most spatial models of the hippocampus, the output of the entorhinal cortex is used as a high-level sensory information system. In this model, the dentate gyrus plays the role of a short-term memory for the most recently visited places. This memory is fairly small (up to six or seven most recently visited places) and is built on neurons that are able to fire repeatedly even after the input that caused the firing is lost. The activity of these neurons is modulated by the synaptic input from the entorhinal cortex. Consequently, whenever the local view changes, the short-term memory is erased and a new sequence is initiated. Therefore, the short-term memory holds sequences of places that correspond to straight lines in the environment.
Their model takes advantage of the recurrent connections of CA3. It builds a map of the environment by using CA3 as a heteroassociative network whose synapses are not only updated using Hebbian learning but also gated by the head-direction cells. The model also includes a layer of goal cells that encode the direction from the animal to certain goals in the environment. It is assumed that each cell starts firing at a specific place and continues to fire if the animal moves in a straight line. Once the model has learned the environment, it is able to reach any goal from any starting position.
In terms of spatial representation, all three models use a finite grid as a simulated environment. While our model and Blum and Abbott's model assume no a priori representation of the simulated environment, Trullier and Meyer's model assumes that the internal representation of the simulated environment is formed during exploration. Unlike our model, which does not assume connections between adjacent cells on the grid, the other two models assume connections between cells that form paths in the environment. In all three models, the grid representation is provided by a localization system that produces a unique code for each cell in the grid.
One important difference between our model and the model of Blum and Abbott is in the mechanism that produces action. Whereas our model uses actions that were previously reinforced, Blum and Abbott suggest that directions for future movement are derived from the shifts of the position coded by the hippocampal place cells.
Another important aspect of the models is the learning procedure used for goal finding. Whereas Blum and Abbott's simulated organism needs to be moved away from the goal on successive runs so that it can produce stable behaviour, our model can learn the position of the goal from any starting place.
Still another important difference between our model and the other two is the way in which information is processed. Whereas our model encodes information in the amplitude of the oscillations generated by the KII units and the Hebbian synapses in the third layer of the KIII system, the other two models use firing rate and Hebbian synapses. Despite these differences the performance of our model matches that of the standard approaches.

Conclusion
In this paper a KIV-based navigation model has been introduced. The model is able to show goal finding, detouring, shortcutting, maze learning and goal finding in cluttered environments. These are simple tasks that do not require very sophisticated navigation techniques. More complex spatial behaviour, like reaching a goal without relearning the environment with respect to that goal, requires an internal representation of the environment that does not depend on goals. This is the next step in our attempt to build a reliable and robust navigation system based on chaotic attractors.