Optimization of olfactory model in software to give 1/ f power spectra reveals numerical instabilities in solutions governed by aperiodic (chaotic) attractors

We present a general connectionist model for an olfactory system. The dynamical behavior of each node (neural ensemble) of the model is governed by a second-order ordinary differential equation (ODE) followed by an asymmetric sigmoidal function, relating the aggregate activity of neurons to system parameters and stimuli from an outside environment. A digital implementation of the general connectionist model simulates the characteristics of a mammalian olfactory system having modiﬁable synaptic connections and spatio-temporal interactions among neural ensembles. Each of four distributed delay terms is represented by a second-order ODE. A parameter optimization algorithm is an integral component of the model. The parameter optimization discussed in this paper results in aperiodic oscillations having a near 1/ f -type power spectrum with a peak in the gamma range, simulating the electro-encephalographic (EEG) potentials from the neural olfactory system. Random optimization is used for a rough search in a global parameter domain and the parameter self-adaptation rule serves for ﬁne tuning within a local domain after the global search. By design the model is started from an unstable zero point by an external impulse input. It requires 650–850 ms to pass through an initializing transient before settling into a strange attractor. The attractor persists for at least 1500 ms, which is 7.5–20 times longer than the duration of the maximal stationary states observed in the EEGs and it is stable under perturbation by simulated sensory inputs giving response amplitudes less than 2 times the basal aperiodic activity. However, the optimization to 1/ f -type activity reveals an inherent limit on digital simulation of chaotic states, owing to attractor crowding such that the size of basins decreases with increasing size of the model, until it approaches the size of digitizing step in computation, here a 64-bit word ( , 10 ¹ 16 ). Outputs that are not optimized to approach 1/ f -type power spectra transit earlier to limit cycle activity, usually well before 2000 ms. The duration of stationarity is increased by randomizing the terminal bit of the 64-bit words representing the state variables. The 1/ f -type solutions are also exquisitely sensitive to parameter truncation; parameter values must be saved in their full binary form for re-starting. The implications in terms of numerical instability, chaos, attractor crowding and the shadowing theorem are discussed. q 1998 Elsevier Science Ltd. All rights reserved.


Nomenclature n
number of channels of an olfactory system, i.e. a distributed OB layer with n units x i (t) wave amplitude activity of cell ensemble i ʦ {1, 2,…,N} at time t ʦ R y i (t) pulse density activity of cell ensemble i ʦ {1, 2,…,N} at time t ʦ R r i (t) external stimulus (forcing term) of x i at time t w ij system parameter, which is the connection strength from y j to y i j k i system parameter, which is the gain for r i to y i

Introduction
There are two steps to model a very large-scale neural system.The first is to construct a set of parametric equations, which incorporate the anatomical synaptic connections and physiological interactions among neural 0893-6080/98/$19.00᭧ 1998 Elsevier Science Ltd.All rights reserved.PII: S0 89 3 -6 0 80 ( 97 ) 00 1 16 -0 LR set of cell ensemble nodes in one of the layers of the olfactory model, i.e.LR ʦ {PG, OB, AON, PC} Q(x i , q (LR) ) sigmoidal I/O transformation function for x i with a system parameter, q (LR) , defining its least upper bound DL set of four delay feedback nodes a 0.220/ms, which reflects the slower real rate constant of x i b 0.720/ms, which reflects the faster real rate constant of x i Neural Networks 11 (1998) 449-466 NN 1116 Neural Networks ensembles that are perturbed by external inputs.The second is to derive quantitative rules, which direct self-adjustment of the values of parameters of the model from its searching history.The automation of parameter optimization in largescale biologically plausible cortical networks opens a new approach to investigate sensorimotor operations in brains (Freeman and Shimoide, 1994;Chang and Freeman, 1996).By using biological data as the criteria, we optimize parameters in a neural ensemble model used to simulate output of the olfactory system for specified input and for basal activity with no input.
An early model simulating a biological neural system was a relatively lower dimensional chaotic system (Freeman, 1987) in which slight variation of initial conditions of variables or system parameters led to diverse output patterns.The results from computer simulations are thus highly dependent on the numerical precision and differences in machines, ODE solvers, programming languages, etc.Instead of showing the values of parameters used to simulate experimental data, it makes more sense to report the optimization algorithm, including specification of performance criteria, for researchers to generate the data by using their own software and hardware.Both the parametric mathematical model and its parameter optimization algorithm, based on the attendant activity perturbation dynamics, should be regarded as integral components of a realistic neural model.
The parameter optimization algorithm, which leads the model to operate in a domain of aperiodic solutions simulating the olfactory EEG, includes two techniques: random optimization (Matyas, 1965) and error propagation (Chang and Freeman, 1996).The first serves for a rough searching in a global parameter domain and the second serves for fine tuning within the local domain from the global search.This optimization algorithm, including specification of performance criteria should be regarded as an integral component of a realistic neural model, although it is not unique.The results reported here reveal a significant barrier to implementation of a digital embodiment of the model, owing to the instability of numerical solutions of the equations.

Architecture of a model of an olfactory system
The central olfactory system consists of the periglomerular layer (PG), the bulb (OB), anterior olfactory nucleus (AON) and prepyriform cortex (PC).Each part is composed of cell ensembles, which are populations of excitatory or inhibitory neurons.Fig. 1 shows a schematic diagram of principal types of neurons, pathways and synaptic connections in the olfactory mucosa, bulb and cortex.Each node represents an ensemble in a local neighborhood.The dynamics is represented by a linear second-order equations to model cable delay and passive membrane delay.The linear part is followed by a static sigmoid nonlinear function.
The ensemble rate constants, a and b, of each node were fixed throughout at physiologically determined values (Freeman, 1975).Length constants were obviated by using a globally connected network.Gain constants, q (LR) , for the asymptotic maximum of the sigmoidal I/O relationship (Fig. 2) were fixed at a common value (Eeckman and Freeman, 1991) within the periglomerular (P) array in the PG and at other values in the mitral (M) and granule (G) arrays in the OB, the excitatory (E,A) and inhibitory (I,B) neurons in the AON and PC and the deep pyramidal cells (C) in the PC.The connection strengths between nodes within the PG, OB, AON and PC and the feed-forward and feed-back gains between them, are the system parameters modeling the synaptic interactions of neural ensembles.Other parameters required to model the full set are distributed feedback delays.We have been shown empirically that the solutions of the model suffice to simulate versatile EEG patterns by proper evaluation of all the parameters (Freeman, 1987(Freeman, , 1992;;Yao and Freeman, 1990).
The asymmetric sigmoid function (Fig. 2) has been derived from the Hodgkin-Huxley equations (Freeman, 1979) and evaluated by fitting it to the normalized probability density conditional on EEG amplitude of single and multiple unit firing of neurons in the PG, OB, AON, PC and entorhinal cortex (Freeman, 1975;Eeckman and Freeman, 1991).The asymmetry is necessary for the input-dependent increase in gain that is required to simulate bursts of oscillation on inhalation.The function has only one parameter, q (LR) , to specify slope, asymptotes and zero operating point, whereas alternatives such as the logistics curve, the arctangent and a cubic spline cell require three parameters.

The problem of parameter optimization
Because the synaptic connection weights cannot be directly measured from experiments, we optimize these values by simulating with the model the biologically measured activity (endogenous EEG waves) and average evoked potentials (AEPs impulse response evoked by electrical stimulation).Owing to a high dimensional parameter space and thus high complexity of parameter controls for the entire system, it is unrealistic to optimize all the parameters at the same time.We take a step-by-step approach to overcome this difficulty (Freeman, 1975;Freeman and Shimoide, 1994;Shimoide and Freeman, 1995 (Freeman, 1975).2. Partition the olfactory system (e.g. the PG, OB, AON and PC) and optimize the values of their internal parameters with respect to AEPs in their near-linear ranges at their characteristic frequencies.(Freeman and Shimoide, 1994;Chang and Freeman, 1996).3. Fix the values of parameters which have been determined and optimize the values of synaptic connection parameters between the parts with reference to the EEGs and multiple unit activity of the four parts.
In this paper, we focus on the third step.One of observed characteristics of EEGs is a 1/f-type power spectrum.Averaged auto and cross spectra for the OB and PC at rest and of the entorhinal cortex and dentate gyrus of hippocampus (DG), show that the log power decreased nearly linearly with log frequency.Temporal frequency analysis Fig. 2. Asymmetric sigmoidal I/O transformation function [q (LR) ¼ 5.000].This function was derived from the Hodgkin-Huxley system (Freeman, 1979) and holds for all parts of the central olfactory system (Eeckman and Freeman, 1991).
on pre-and post-stimulus of neocortical EEG segments for prepyriform, visual, somatic and auditory recordings also reveal 1/f-type power spectra (Barrie et al., 1996).Thus, the 1/f Property becomes our criterion for the optimization.Our aim is to derive an analytical rule and determine the values of parameters to match the characteristic from normal neuro-physiological data.
In Section 2, a mathematical model of the olfactory system is presented.The parameter optimization rule is addressed in Section 3. Simulation results shown in Section 4 employ the mathematical analyses in the previous sections.

Dynamics of the olfactory neural ensemble model
A massively parallel distributed architecture with multiple layers can be used to describe the olfactory neural system (Freeman, 1974).We first express a vector-matrix representation of the olfactory neural dynamics and next give an example to show how it is derived.The definitions of variables and parameters are given in nomenclature.

Vector-matrix representation of the olfactory model
The dynamical behavior of each cell ensemble of the olfactory system can be governed by: a general connectionist model (Chang and Freeman, 1996), where F i and W i are derived from experimental rate constants.A single (double) dot over a variable means the first (second) derivative of the variable with respect to time.The variable, x, represents dendritic potential, as it is evaluated by extracellular measurements of EEGs.The variable y represents multineuronal pulse density as it is evaluated by multiunit recording.Both variables are continuous in the cortex.They are discretized by experimental measurement.Conversion from pulse density, y, to wave amplitude, x, is implicit in the synaptic weights.Conversion from wave amplitude, x, to pulse density, y, is implicit in the sigmoidal function, Q(x,q (LR) ) (see Fig. 2 with q (LR) ¼ 5.000).The pulse-wave conversion is kept in its linear range by the sigmoid (Freeman, 1975).The variable r stands for an external stimulus.Physiological experiments confirm that a linear second-order derivative is an appropriate choice.
By the topological diagram of Fig. 1, the n-channel olfactory model is implemented by Eq. (1) in vector-matrix notation with the specifications listed in Appendix A. This notation allows us to concisely present the olfactory model and its corresponding dynamics of parameter optimization (see Section 3).

ODEs for the periglomerular cell ensemble
To show how to evaluate Eq. (1) in vector-matrix representation, we begin with the dynamics of the P cell ensemble in the PG layer: The RHS of Eq. ( 2) represents the summed input to a P neural ensemble from receptor R, from the other P-ensembles in the same layer and from delayed outputs of ensembles in the AON layer.Here, K(t) Ն 0 is used to simulate the distributed delay feedback from cell ensemble E 1 (t).It is a unimodal and normalized function, which means that for a current time t Ն 0 and 0 Յ t Յ t, K(t) has only one maximum with K(0) ¼ K(t) ¼ 0 and t 0 K(t) dt ¼ 1.The LHS of Eq. ( 2) is the inside change of the ensemble due to the net input.
Let t ¹ T (s) 2 and t ¹ T (e) 2 be delay starting time and delay ending time, respectively, and take the kernel function K(t) as: i.e. a difference of two discrete step functions.The delay feedback term in Eq. (2) becomes The olfactory model is thus described by a set of differential-delay equations, which is the dynamics that Yao and Freeman studied in 1990.It is, however, biologically implausible, owing to distributed arrival times in multiaxonal tracts due to distributed conduction velocities and delays and it is also numerically error prone in digital simulations, due to coarse graining of distributed delays by the computational time step.By using a numerical solver for an ODE set, the actual integration time step is internally adjusted according to the slope at that moment.To have an accurate implementation on a numerical integration of Eq. ( 2) with K(t) ¼ K (step) (t), we should offer these data of E 1 (t) at each correct moment, but it is technically impossible.
In this paper, to model temporal distributed axonal delays we express the kernel function K (step) by a difference of two exponential functions: as shown in Fig. 3. Obviously, K (exp) (t) has the unique maximum at t ¼ t ¹ T m with 2 and K (exp) (0) Ϸ 0 (when t is large) and K (exp) (t) ¼ 0. In addition, because of: The second term becomes negligible when t is large.)K (exp) is thus a temporally smooth version of K (step) .Through Laplace transformation, we then see that Eq. ( 2) with Moreover, because the Fourier transformation of D 2 (t) is: with the delayed feedback node acts as a low-pass filter.Using the same procedure to model the other delayed feedback loops, the ''infinite dimensional'' differential-delay equations (Yao and Freeman, 1990) convert into finite dimensional ODEs.The dynamical model expressed in Section 2.1 and Appendix A is thus constructed.We call it the low-pass filter olfactory model.

An optimization technique for 1/f power spectra
We combine global and local search processes for parameter optimization.The global search has long jumps to avoid being trapped in a local minimum.The local search is a gradient descent to a local minimum.

Parameter optimization rule for local searching
By way of spatio-temporal error propagation, we obtain the parameter self-adjusting rule (Chang and Freeman, 1996).The criterion is to match the power spectrum of the model to the desired 1/f-type power spectrum (in the sense of minimizing a pre-defined error function within a frequency domain Q).
We decompose each second-order ODE to two first-order ODEs.Let l ʦ {0, 1} and x (0)  i (t) ¼ K x i (t) and reformulate Eq. (1) as: A time sequence x (l ) i (t) is then generated by Eq. (5) for t from t i Ն 0 to t f Ͼ t i .Its power spectrum is S (l )  i (q) (a Fourier transformation of x (l )  i (t)) for t from a state transition t st Ն t i to the next state transition t nst Յ t f .
To derive parameter optimization rules, in the first step we have to choose an objective function E to be minimized in terms of power spectra of actual outputs of the mathematical model S (l )  i (q)s and its targets T (l ) i (q)s.There are only two requirements for the choices: E Ն 0 and E ¼ 0 if S (l ) i (q) ¼ T (l ) i (q) for all l ʦ {0, 1}, q ʦ Q and i ʦ {1, 2,…, N}.Here, F(S (l )  i (q), T (l ) i (q)) is an error function, which can be from the least-square fit, relative entropy, etc. (Kullback, 1978;Chang and Freeman, 1996).We also introduce a weight function h (l )  i (q) for S (l ) i (q) achieving T (l ) i (q) at frequency q.For instance, we may have: to suppress the output time series of x i (t) from having too strong frequency components around q ¼ q (see also Section 4 for the h (l ) i (q) we used in the simulations).Thus, the objective function 1) with the form: In the second step, we derive dynamics of the network of perturbation variable l (l ) i (t) [Eq.( 8)] from the dynamics of the network of ensemble variable x (l )  i (t) [Eq.( 5)].We need to define and distinguish two kinds of partial differentiations with respect to x (l )  i (t).(1) The derivative which accounts for the corresponding change of E due to an arbitrarily small change of x (l )  i (t) at some particular time t with everything else fixed; that is, the conventional partial derivative: ‫ץ‬F(S (l ) i (q), T (l ) i (q)) ‫ץ‬S (l )  i (q) ϫ Re[X (l ) i (q) exp( ¹ iqt)] dq from Eq. ( 6).
(2) The other derivative which accounts for the corresponding change of E due to an arbitrarily small change of x (l )  i at time t, when this change propagates throughout the trajectories of other ensembles in the entire time interval that is denoted by l ), an ordered partial derivative (Werbos, 1974(Werbos, , 1988)).
The perturbation dynamics are then: with boundary conditions The complementary relationship of network structures between the two dynamics is expressed in Fig. 4.
We can next relate the perturbation of E with respect to parameters w ij and k i to l (l )  i (t) by the following Eq.( 9).Because of symmetric property of the network structure, assume that q i 1 j 1 , q i 2 j 2 , … and w i L (q) j L (q) all have the value of the ıth distinguishable connection strength qı .Similarly, Following this, the derivative which accounts for the corresponding change in E due to an arbitrarily small change of the parameter, p i over the time interval from t st to t nst , is: where After obtaining the gradients, we can apply a proper optimization algorithm to derive the adaptation rules for parameters.Here, a gradient descent approach allows us to adjust the values of weight parameters according to where the constant C governs a learning rate.In summary, for an interconnected neural ensemble model of the olfactory system described by Eq. ( 1), if the system parameters are updated cyclically at t ¼ t nst according to Eq. ( 10) with the gradients given in Eq. ( 9), then the connection Fig. 4. Reformulation of dynamics in Eq. ( 1) for ensemble activity and activity perturbation from second-order to pairs of first-order ODEs.The upper loops resulting from the reformulation do not imply self-excitation, which does not occur in cortical neuropil, the dense layered fabric of axons, dendrites and synapses.
strength parameters can be locally optimized to give the output of the model a 1/f-type power spectrum.
The model in Eq. ( 1) and its parameter optimization rule [Eqs.( 8)-( 10)] are generalizations of the multilayer perceptrons and the back-propagation rule.The detail is shown in Appendix B. This convergence supports the correctness of the derivations and provides alternative approach to derive the back propagation rule (Chang and Freeman, 1996).

Approach to global optimization of parameters
The parameters which have been determined in the early stage are fixed to reduce the complexity/dimension of the parameter searching space.From neurobiological suggestions, we set initial guesses and a global search domain for these parameters to be optimized.In the global searching, we generate a set of random numbers with certain distributions for parameters updating and test on the updating whether the updated parameters are in the search domain (Matyas, 1965;Solis and Wets, 1981).Next, we numerically integrate Eq. ( 5) and calculate the corresponding error by Eq. ( 6).When this error is sufficiently small for transition to local tuning, we use the local optimization rule presented in Section 3.1 for the next parameter adaptation.In the local tuning, we numerically integrate Eq. ( 5).We set the boundary for l (0)  i (t nst ) ¼ l (1) i (t nst ) ¼ 0 and integrate Eq. ( 8) from t nst back to t st .At the same time, we also integrate Eq. ( 9) to get the gradients for parameters and thus adapt these parameters according to Eq. ( 10).When a difference of the successive error or parameter vector is small enough, we revert to the global search.The whole process continues until a stop criterion (defined in Section 4.1) is met.
Unlike a traditional random optimization technique, in which the parameters are always updated randomly, we update the parameters randomly only if a local optimum of an error function is reached.We choose the random distribution as a Gaussian function.Through the experience of several simulation runs, we are able to locate the appropriate mean and standard deviation.However, it is, in general, heuristic to determine the distribution functions which can quickly direct converge to the basin of a global optimum.

Simulations
The technique developed in Section 3 were used to optimize the parameters of the olfactory model with four low-pass filters representing distributed feedback delays.The numerical integration was performed by Livermore solver for ODEs with time step Dt ¼ 1.0 ms in all cases.All the simulations were performed on Macintosh II computers.
We studied impulse responses with:

@
which was always used to initiate non zero output from the low-pass filter model.The initial values of x i (t) and x ˙i(t) were set zero.We used the fast Fourier transform algorithm (Ramirez, 1985) to calculate X (l ) i (q) ¼ F [x (l ) i (t)] for time t from t st to t nst .A least-square fit led us to choose: A graph of log(q) versus log(S (0) (q)) of an experimental EEG power spectrum can be fitted by a ''straight'' line for q( ¼ K f =2p) greater than a minimal value w min to the maximal frequency w max (see Fig. 5), with slope ¹ b i and y-intercept at x ¼ log(q c ) as log(a i ).We had the target pattern as T (0)  i (q) ¼ a i q ¹ b i .h (0) i (q) ¼ q 2b i was set to normalize across high frequency and h (1)  i (q) ¼ 0 because of no feature to optimize on x (1)  i (t).By the form of Eq. ( 6), the objective function for minimizing became: which gave the ordinary partial derivative For an mth-order differentiable variable x(t) with its Fourier 187, and w (G 1 D 1 ) ¼ 0:500, w (PD 2 ) ¼ 4:000, w (I 1 D 3 ) ¼ 0:500, w (G 1 D 4 ) ¼ 4:000, and T (s) 1 ¼ 20:000, T (e)  1 ¼ 10:000, T (s) 2 ¼ 26:000, T (e) 2 ¼ 15:000, T (s) 3 ¼ 25:000, T (e) 3 ¼ 12:000, T (s) 4 ¼ 39:000, T (e) 4 ¼ 24:000; (B) the corresponding log-log plot of power spectra from 867 to 2037 ms.transform existing: holds for any non-negative integer j Յ m.Thus, the above objective function with respect to S (0) i (q) can be transformed to that with respect to S (1)  i (q) and this result was more practical especially when x i (t) was a ''pure'' 1/f (i.e.b i ¼ 2, which was often the cases for our simulations).Therefore, the power spectrum of x (1)  i (t) was supposed to be a constant and: with the ordinary partial derivative as ‫ץ‬E ‫ץ‬x (1) i (t) ϫ exp( ¹ iqt)} dq Thus, the target T (1) i (q) ¼ a i and h (0) i (q) ¼ 0 and h (1)  i (q) ¼ 1.From the viewpoint of the Livermore solver on the olfactory model Eq. ( 5), both x (1)  i (t) and x (0) i (t) were independent variables to be solved.No further preparation was needed if using Eq. ( 12) and the numerical accuracy of x (1)  i (t) was just as good as x (0) i (t).

Numerical results of parameter optimization
We optimized q CB 1 , q B 1 C , q M 1 M 1 L , q G 1 G 1 L and connection strengths q ij between layers PG, OB, AON, PC and delay feedback nodes D 1 , D 2 , D 3 and D 4 .Appendix C gives the fixed parameters, which had been determined.A step-bystep algorithm is listed in Appendix D.
In Fig. 6, the parameters are optimized for a fourchannel low-pass filter model, yielding a 1/f-type power spectrum from the output time series of OB node (the granule cells G 2 of the first channel in Fig. 1).To express scaling invariance of the model, we extended the simulation to a 16-channel case and kept the parameter values unchanged.Fig. 7 shows the output of the first G 2 node having a 1/f-type power spectrum.(Although the initialization period in the 16-channel model simulation ended around t ¼ 650 ms, we still calculated power spectra from t ¼ 867 ms for comparison with the four-channel case.)Both the examples had Eq. ( 5) numerically integrated from 0 to 2037 ms.Because 1/f power spectrum is only a necessary condition to simulate an EEG wave, after finding possible candidates by way of the parameter optimization algorithm, we visually checked if the time series was experimentally realistic.
Next, this four-channel model was given a constant stimulus from 1367 ms to 1537 ms to channel 1 (on node i ¼ 1 and i ¼ n þ 1).Fig. 8 shows that the first 867 ms is a warm-up period and the following 500 ms gives a basal state.The model trajectory then jumps to a near ''periodic'' state (a burst-like state) in the next couple hundred milliseconds due to presenting a 170 ms constant stimulus input at R with intensity 0.68 and goes back to the same basal state, shown by the another 500 ms, after the stimulus is ended.Again, the simulation was extended to the 16channel model (Fig. 9).An important property of the lowpass filter olfactory model is revealed by these examples: the system returns to the previously defined attractor after termination of the stimulus.Moreover, the 1/f state is stable in the sense that if the total perturbation amount (external input) to the model is not more than 0.68 (input intensity) ϫ 170 s (input duration), the 1/f state can persist.Because of its high nonlinearity, the low-pass filter model is destabilized when either the stimulus duration or the stimulus intensity becomes too long or too strong.The input stage of the olfactory system has a neural mechanism for input range compression by a logarithm transformation (Freeman, 1975), which, if over-ridden or by-passed, can lead to onset of seizures.Fig. 10 shows the one-channel model with optimized parameters to simulated the olfactory seizure activity.one-channel model has two connected P nodes (not a single P node) to form the layer of periglomerular cells (Freeman, 1987).When we compare the 1/f state shown in Fig. 11(A) with a non-1/f state shown in Fig. 12(A), these simulation results also suggest that a 1/f state is less influenced by a tiny perturbation (e.g. the numerical truncation/round-off).Thus, we propose that maintenance of the 1/f spectrum is a good criterion to ''stabilize'' a biological neural system and that corresponds to having high correlation of output time series in numerical simulations of the low-pass filter olfactory model [see Fig. 11(A)].Here, ''stabilization'' for the model means that sufficiently small perturbations to parameters, variables, or even the model's structure will still permit the trajectory to return to the same attractor.
We cannot expect to reproduce the identical results by directly using Livermore package with the truncated data given in Figs.6-12.When the low-pass filter model is operated in a domain of aperiodic oscillations, it resembles a chaotic system, in that tiny perturbations of state variables or parameters may lead to an exponentially diverging output.The exact parameter values used by the machines internally were binary representation with double precision (64 bits) and rounded of to give the decimal values shown.However, in any programming language with any proper ODE integrator and on any computer, given the initial guesses of the parameters and the starting condition, we can always optimize the values of parameters in respect to a specified criterion of output.

Shadowing of the chaotic solution
A chaotic system has the property that a relatively small numerical noise tends to grow exponentially fast.To integrate ODEs on a current digital computer, a continuous time function must be converted into a discrete time map.Following an iterated process with a proper time step, we get a numerical solution.We cannot accept this is ''a true solution'' of the original chaotic system.We agree with Dawson et al. (1993) who stated: ''We believe that in systems with high-dimensional chaos, trajectories with intrinsic noise, such as computer-generated pseudotrajectories, can be shadowed only for short times.''Thus, the extended question we examine is the fuzzy word ''short'': Can we control the shadowing time?Because the normalizing-rounding process in the floating point arithmetic is a deterministic process occurring at the least significant bit level, the shadowing lemma to the digital integration for a chaotic solution is not applicable.Fryska and Zohdy, 1992 injected controlled amounts of uniformly distributed random noise to a three-dimensional chaotic system (a piecewise linear system) during digital integration and thereby closely approximated the statistics of the invariant chaotic attractor.Due to simplicity of their system, an exact analytical solution was obtainable, which was used as a reference for comparing with numerical solutions from different precisions.
We have no way to analytically solve the high dimensional neural model.To check the shadowing property and the numerical sensitivity of the model, here, we gave two mathematically equivalent formulas for the low-pass filter model, say Implementation 1 and Implementation 2. Consider the ODE for the M 1 node of OB: 2 ¼ 14:250, T (s) 3 ¼ 24:870, T (e) 3 ¼ 12:330, T (s) 4 ¼ 37:960, T (e) 4 ¼ 24:160.

459
H.-J. Chang et al. / Neural Networks 11 (1998) 449-466 On Implementation 1, the second term of the RHS was coded by: On Implementation 2, it was coded by: All the other parts on coding the ODEs were identical.
Because of truncation/round-off on terminal bits, the two formulas are not identically represented by machine code.
The difference can propagate out spatio-temporally and finally cause two divergent numerical solutions.By injecting controlled amounts of random noise to the model, we used: for each single equation of Eq. ( 5) to randomize terminal bits.The numerical simulations extended the shadowing time.For the four-channel model with the identical parameter values used in Section 4.1, we still tested the G 2 node in the first channel.Fig. 11(A) shows an obvious divergence of both implementations from t ¼ 1527 ms; Fig. 11(B) shows both implementations start to diverge obviously around t ¼ 1978 ms.If we then set q (E 1 M 1 ) ¼ 3:300, q (B 1 C) ¼ 1:700 instead, the resultant trajectory becomes a ''less chaotic'' pattern: an obvious divergence from t ¼ 1320 ms in Fig. 12(A), but from t ¼ 2127 ms in Fig. 12(B).Thus, a random noise injection can ''stabilize'' the model's trajectories (in the sense that the two numerical solutions from mathematically equivalent formulas, but not numerically identical implementations stay close longer).
The key question is how to control the random noise (e.g.how to set the values of mean and standard deviation for the Gaussian random variable) to reduce the divergence rate.It is, of course, parameter dependent because the model trajectory depends on a given set of parameters.When the parameter optimization algorithm is applied the values of parameters are continually updated.To ''entirely stabilize'' the model with parameter adaptation is still an open problem.If the shadowing time period can be reasonably long due to such a ''stabilization'' process, we have a relatively high confidence in the numerical results of parameter optimization from a digital computer.The expected maximal duration of a stable state between transitions is around 200 ms in the biological neural system (Barrie et al., 1996).The desired safety factor for stability is 10 ϫ 200 ms.For competing with experimental EEGs, ϳ2000 ms is a fairly good shadowing length for numerical simulation.
Because of such super sensitivity to numerical truncation/ round-off, before the shadowing problem is ''solved'' we are obliged to regard it as part of the olfactory model.The current technology of numerical methods appears to be the only executable approach, which can give us a closest pseudo-solution to the true solution of such a high dimension, nonlinear ODE set.

Conclusion
In Chang and Freeman (1996), we optimized parameters according to an objective function measured in time domain (i.e. the output pattern is examined in time domain).The main advantage of this optimization technique is that it can give us derivatives of an objective function with respect to system variables under constraints, when we have the objective function formulated in terms of these variables.Following this, we can always derive the corresponding parameter adaptation rules.Here, in a frequency domain, the objective function which relates variables to the feature of a 1/f power spectrum is formulated.Thus, the self-adaptation rule for parameter fine tuning is derived.To avoid being absorbed into a local minimum, we apply the random optimization technique for global parameter searching.How to choose random distribution functions to quickly locate a global minimum is heuristic.By using this parameter optimization algorithm, optimized values of parameters for the low-pass filter olfactory model become self-adaptively achievable to generate 1/f power spectra and match experimental data.
Earlier models of olfactory dynamics (Freeman, 1987;Yao and Freeman, 1990) used a set of differential-delay equations to model axonal propagation delays between layers PG, OB, AON and PC.''Differential'' accounts for a discrete set of distributed delays in an array of stored values at fixed time steps.''Delay'' accounts for the timedelay feedback and the dynamical model made mathematical analysis difficult by introducing unavoidable numerical error (inaccuracy) in using existing ODE solvers as described in Section 2.2.This is a significant problem for a chaotic system because we cannot tell whether the output chaotic patterns generated by computer simulations are from the dynamics or from the numerical error.In this paper, we simulate the time-delay feedback by using another second-order ODE.Biologically, this low-pass filter model is plausible because the onset and offset of delayed feedback by axonal tracts is smoothed by temporal dispersion.Mathematically, the entire model becomes a set of ODEs, which eliminates the numerical inaccuracy of differential-delay equations and simplifies the derivation of parameter optimization rule.However, on embodiment of the model in software, the ODEs are transformed into difference equations, in which the size of the digital time step and the word size discretize the model's trajectories.This makes no difference for point and limit cycle attractors, but it is non-negligible for aperiodic, presumably chaotic, attractor solutions.The duration and stationarity decrease with the size of the model in the OB from four to 16 to 64 oscillators, indicating that the instability is due to attractor crowding (Wiesenfeld, 1989;Tsang and Wiesenfeld, 1990).Without noise, this is unavoidable in a model of realistic size operating with aperiodic attractors.Because a biological system is stable and robust, we would like its model to be robust in qualitative dynamics to perturbations in variables and parameters.Except for the injection of temporal noise as discussed in Section 4.2, the other neurobiologically guided method in the future to stabilize the model may be either to explore the introduction into the model of controlled random variation of gain and delay parameters, which simulates the known variation in the properties of neurons in the olfactory system as multiplicative noise, or to explore the utility of low-level random variation of state variables in the model in the form of additive noise, which simulates the presence of axonal noise in the synaptic inputs to neural ensembles.After adding this spatial noise to the model, we will re-optimize its parameters.with vectors and matrices as The [w ij ] is basically a block diagonal matrix.Each block starts for the layers of the n-channel OB, AON and PC, respectively.The last row, representing the delay feedback nodes, together with other sparse elements, form the connections between layers.When we want to extend the system, all we need to do is to add another block for a new layer and sparse elements for connecting it bidirectionally with the old layers.All results based on Eq. ( 1) are applicable.Each weight represents the strength (gain) of synaptic action of one group of neurons on another.Experimental testing has shown that the gains are kept within a small signal linear range during normal function (Freeman, 1975).They are time-invariant during recording periods of several seconds and can changes to new values abruptly during learning or pharmacological manipulation.The value can be a positive, negative or zero number to represent an excitatory, inhibitory or no connection from ensemble j to i j, respectively.Here, w (P m P u L) denotes the connection strength linking Pm, with P u m and w (P m D 2 ) is the one from D 2 to P m , etc. Physically, ''from D 2 to P m '' means from E 1 (t) to P m with time delay implemented by passing the output of E 1 through D 2 node.Because our approach is to break the olfactory system into small modules, for simplification it is reasonable to allow: and w for all 1 Յ m, u Յ n in the PG and OB layers; and w in the PC layer.
(7) The output is operated on by: y i (t) ¼ Q(x i (t), q (LR) ) ¼ K q (LR) 1 ¹ exp ¹ 1 q (LR) [exp(x i (t)) ¹ 1] & ' , i ʦ LR: x i (t), i ʦ DL: The static nonlinearity of the cell ensemble results from the function Q(x i ,q (LR) ).The q (LR) is a constant for all the x i s belonging to one layer of cell ensembles.For later use in Section 3.1, we also calculate the derivation of the sigmoid to give the nonlinear gain: To gain an insight into this general mathematical model and the corresponding parameter optimization rule, we investigate the relationship to an artificial neural model and the traditional back propagation learning rule.
A frequently used discrete-time version of the standard artificial neural model follows from Eq. (1).
1. Let y i (t) ¼ Q(x i (t)).(Remove delay feedback nodes and assume all the other nodes have identical I/0 transformation.)2. Let Dt be a small time duration and neglect O (Dt 3 ).3. Let a þ b ¼ 2=Dt and ab ¼ 2=Dt 2 Following this, take the Taylor series expansion of x i (t þ Dt) at time t.Eq. (1) reduces to: This is a discrete-time artificial neural network (Rumelhart et al., 1986).Next, we may relate the parameter optimization rule of the model of Eq. (1) to the weight learning rule of the artificial neural network.To have a close comparison, consider a multi-layer feed-forward network with error measure according to: with U (0) i (t) a teaching signal.When the same three conditions listed above are satisfied, we have: from the second half of Eq. ( 8).Using Eq. (B.2), the first half of Eq. ( 8) reduces to: and Eq.(10) (for p ¼ w) combined with Eq. ( 9) becomes Dw ij ¼ ¹ CQ(x j (t))l ( 0) i (t), i.e. the traditional back-propagation learning rule (Rumelhart et al., 1986).The term p (l )  i (t) dt of Eq. ( 8) is from the conventional partial derivative ‫ץ‬E=‫ץ‬x (l )  i (t) with E being an integration, e.g.Eq. ( 6).Since E has been taken as Eq.(B.1), p (0)  i (t) dt and p (1) i (t) dt should be replaced by ‫ץ‬Q(x i (t))=‫ץ‬x i (t)[Q(x i (t)) ¹ U (0) i (t)] and 0, respectively.
7. Assign Y 2 ¼ Y 1 .8. Generate the random numbers g and check if p (c) þ g (c) eD? • Yes → continue; • No → go to (8). 9. Update p (c) ¼ p (c) þ g and increase c ¼ c þ 1. Run an integrator with the given p (c) to solve Eq. ( 1), calculate E according to Eq. ( 11) and assign Y 1 ¼ Y(p (c)  There are mme variations in random parameters updating for steps (8) and ( 9) (Solis and Wets, 1981).We have to assure the convergence to the global minimum of the objective function with probability 1 and then compare the convergence rates from modified methods with each other.

Fig. 1 .
Fig. 1.This topological diagram specifies the connections among neural ensembles of the olfactory system.The olfactory bulb (OB) and prepyriform cortex (PC) are distributed layers of coupled oscillators.The anterior olfactory nucleus (AON) is a control nucleus relaying centrifugal modulatory commands.These three oscillators have incommensurate characteristic frequencies.Only the OB is at present modeled as an array.Each node stands for a second-order ODE for the single ensemble dynamics.An input from receptors (R) goes to periglomerular (P) and mitral cells (M).The mitral cells transmit to granule cells (G) and to AON and PC, from which the final output is sent to the external capsule (EC) from deep pyramidal cells (C), as well as back to the OB and AON.

Fig. 5 .
Fig. 5. (A) An EEG from OB of a non-motivated rabbit; (B) the corresponding log-log plot of spectrum of the OB.

Fig. 7 .
Fig. 7. (A) The example of time series from G 2 of the 16-channel low-pass filter model; (B) The corresponding log-log plot of power spectra from 867 to 2037 ms.

Fig. 8 .
Fig. 8. (A) The example of time series from G 2 of the four-channel low-pass filter model with constant stimulus from 1367 to 1537 ms (simulated burst of inputinduced state transition); (B) the corresponding log-log plot of power spectra from 867 to 1367 ms; (C) the corresponding log-log plot of power spectra from 1367 to 1537 ms; (D) The corresponding log-log plot of power spectra from 1537 to 2037 ms.

Fig. 9 .
Fig. 9. (A) The example of time series from G 2 of the 16-channel low-pass filter model with constant stimulus from 1367 to 1537 ms (simulated burst of inputinduced state transition); (B) the corresponding log-log plot of power spectra from 867 to 1367 ms; (C) the corresponding log-log plot of power spectra from 1367 to 1537 ms; (D) the corresponding log-log plot of power spectra from 1537 to 2037 ms.