1 Scale-Free Cortical Planar Networks

Summary. Modeling brain dynamics requires us to deﬁne the behavioral context in which brains interact with the world; to choose appropriate mathematics, here ordinary differential equations (ODE) and random graph theory (RGT); to choose the levels of description and scales in the hierarchy of neurodynamics; to deﬁne an appropriate module for each level; and to address questions of boundary conditions, linearity, time-variance, autonomy, and criticality. ODE applied to the olfactory system serves to model perception by a phase transition that reorganizes the background activity. Feedback control theory is used to model the dynamics of self-organized criticality and simulate the background activity and its reorganization, by which microscopic input triggers the construction of an order parameter that retrieves a mesoscopic spatiotemporal pattern expressing the class of input. Perception is shown to depend on the coincidence of three conditions: intentional prediction of a sensory input by an attractor landscape; emergence of a null spike in the background activity; and the presence in the sensory input of the expected stimulus.


The context of quantitative brain dynamics from psychology, biology and philosophy
Animals and humans use their finite brains to comprehend and adapt to infinitely complex environments. The dynamics employed by brains conforms to the scientific method of hypothesis formation and experimental testing. As mathematicians we seek to devise formal systems by which to describe the neural processes of construction and evaluation of hypotheses in brain dynamics by reformulating the existing hypotheses that have been developed by psychologists, neurobiologists, and philosophers to describe the dynamics underlying behavior.
Experimental psychologists describe the process in terms of trial-and-error behaviors guided by reinforcement [41] and by temporal difference learning [106]. Trial behaviors constitute hypotheses that, if successful, are rewarded by release of chemicals such as dopamine and CCK into the brain. The power of chemical reinforcement is readily apparent in the worldwide epidemic of cocaine usage, which triggers the reward system. Intermittent reinforcement establishes persistent behaviors that can be extremely resistant to extinction, when the rules change and reward is no longer forthcoming. Scientists and rats that have been conditioned to expect failure may persevere indefinitely in unsuccessful behaviors. This criterion shows the greater dependence of choice of actions on pre-existing brain states of expectancy than on environmental conditions. In this review we use random graph theory (RGT) to describe the formation and exercise of such states of focused attention in brain networks.
Neurobiologists describe the process in terms of the action-perception cycle [48]. Brains construct predictions of future desired goal states. These expectancies emerge in the form of spatiotemporal patterns of brain activity having two aspects. One aspect is motor: the formation of strategic plans of action in the motor systems [58], [63], in which are embedded a variety of directions for tactical moves that may be imposed by environmental and bodily constraints and contingencies. The other aspect is preafference [70]: the evocation in each of the sensory cortices of attractor landscapes, which embody the predictions of the several forms that sensory input is expected to take as the result of successful action. For example, an animal searching for food can expect to find one of two or more possible types of food, or none, or a novel odor that must be explored, or an odor of predator, each with its attractor in the landscape, only one of which will be selected by the stimulus. Brains then adapt to the results of the action-based test by modification of their synaptic networks through learning by association and habituation. RGT is exceptionally well adapted to modeling these changes in synaptic connectivity.
Philosophers describe this process as the exercise of intentionality [81]. Goal-directed action is preceded by intent to act that thrusts the body forth into the world. The senses report the consequences to the brain, which, from the distortion of the self resulting from action, reorganizes and reshapes itself and the body so as to accommodate to the world, thereby coming to know the world by assimilation. This view is predicated on two forms of unity. The local unity is the integrity of the individual, literally undividable, forming a closed system with respect to meaning but open to flows of energy and information. The universal unity is the essential embedding of every individual in the material, social and spiritual worlds by the process of embodied cognition through brain dynamics. A valid mathematical description of the dynamics must conform to the outlines of the cycle of prediction, action, perception and assimilation.
In the past half century science has opened several new windows for direct experimental observation of normal brain function after 3000 years of logic and speculation. Our preferred window of observation is on the electric fields that are generated by brains, because our techniques enable us to record and measure neural activity patterns from brains of animals and humans that are actively engaged in intentional behaviors. The behaviors and the electric activity patterns range in size from single neurons to an entire cerebral hemisphere, and in duration from fractions of a millisecond to minutes, days and even years. Indirect measures of neural activity by measuring blood flow in order to estimate metabolic energy dissipation provide supportive data. Brains are profligate in energy dissipation as heat at rates an order of magnitude greater than other organs of comparable mass. Additional insights come from measurements of magnetic and chemical energy gradients. The space-time patterns of these observable signs of neural activity are well suited to precise measurement. Until now our preferred tool for modeling the dynamics underlying the measured activity has been ordinary differential equations (ODE) in space-time, because the neurodynamics of large-scale populations, as manifested by the electric currents of axonal action potentials and dendritic synaptic potentials, can be modeled in a mesoscopic continuum. Now we propose to adapt another tool, neuropercolation, which has been opened to brain science by recent developments in RGT.

The mathematical foundations shared by ODE and RGT
Brain function is nonlinear, nonstationary, non-Gaussian and drenched in noise. It is therefore counterintuitive to find that the dynamics of most areas of cortex maintain themselves in smallsignal domains that are linear, stationary and Gaussian, as shown by testing for superposition with impulse inputs. These frames last on the order of 0.1 s and occupy domains at the scale of cm2. There are temporal nonlinear discontinuities between these frames, which require very few ms for spatial dissemination, so the relatively long time spans (∼ 1s) that are required for application of the standard tools of linear analysis (Fourier transform, Principal Component Analysis, Independent Component Analysis, regression models, wavelets, etc.) provide space-time averages over the data that yield reproducible decompositions, such as the clinical frequency bands (theta, alpha, beta, gamma, etc.) and that relegate the brief nonlinearities to the residuals. Neural activity can readily be elicited in response to perturbation by sensory or direct magnetic and electrical stimulation of the brain. If that evoked activity remains well within the amplitude and frequency ranges of the background brain activity, it conforms to superposition and can be measured by curve-fitting with sums of linear basis functions that are the solutions to linear ODE. The preponderance of clinical and experimental descriptions of large-scale neural activity are expressed in terms of frequencies on the assumption of linearity and stationarity, and the fact that these methods work as well as they have for over half a century is testimony to the power of Fourier and Laplacian methods, as well as to the basic simplicity of cortical dynamics, if it is seen from an optimal point of view.
The coefficients of the fitted basis functions evaluate the poles and zeroes (eigenvalues) of the optimized ODE. The ODE are structured to represent the topology of the neural networks, in which the time constants are fixed, and the synaptic interaction strengths are represented by feedback gains. Sets of solutions give root loci in the complex plane along the phase curves at either 0 rad for positive feedback loops or π rad for negative feedback loops. The feedback gains are given by the intersections of the amplitude contours with the root loci. Solution sets and root loci have been derived for changes in cortical function with variations in stimulus intensity, location, and timing [44], from which the changes in interaction strengths are inferred. Changes in functional connectivity during learning have been modeled by using the driving impulse as a conditioned stimulus. Likewise the root locus method has been used to model the neural connectivity changes that underlie changes in the levels of arousal and motivation; type and level of anesthesia, and the pharmacological effects of common neurotransmitters, neuromodulators, and their agonists and antagonists. The ODE that have been evaluated by fitting sums of linear basis functions to cortical impulse responses have enabled the construction of a hierarchical set of dynamical networks entitled Katchalsky sets (K-sets) [51]. These models provide the formal structure, from which we undertake representation of large-scale neurodynamics, first in the ancient, primitive, six-layered allocortex common to all vertebrates ('reptilian') and then in the six-layered neocortex found only in mammals. In this report we summarize the dynamics as seen by use of ODE, so that RGT can replicate the dynamics as currently known and then proceed into the unknown.

The applicability of ODE and RGT depends on the following properties of cortex
• Hierarchy: Brain activity exists simultaneously at all levels of a hierarchy from atomic and molecular to psychosocial, whereas observation and measurement is restricted by the experimental tools to bounded ranges in space and time. In conformance to our main sources of data we restrict our concerns to three levels imposed by the tools of observation: microscopic measurements in microns and ms of synaptic potentials and action potentials from microelectrodes; mesoscopic measurements in mm and tens of ms from electrode arrays of electrocorticograms (ECoG); and macroscopic measurements in cm and s from scalp electroencephalograms (EEG), magnetoencephalograms (MEG), and functional magnetic resonance imaging (fMRI) reflecting levels of cerebral metabolism. • Scaling: Most descriptions of cortical connectivity emphasize microscopic, local nets.
Recent studies on neocortical connectivity have emphasized conformance of large-scale connectivity to power-law distributions [115], [74], [28], [57], which opens the likelihood of self similarity of connectivity across scales, and which imposes the necessity of choosing a scale of description that matches the scale of experimental observation. In spatial scale we focus on the mesoscopic population of neurons, because our reliance on electric fields enables us to interrelate our observations of ECoG downwardly to the microscopic activities of neurons that form the population, and upwardly to the macroscopic activity patterns that are formed by the mesoscopic ensembles. The most interesting and important brain dynamics concerns the relations across levels in the hierarchy: millions of microscopic neurons (action potentials) create mesoscopic order parameters (ECoG) that regulate the neurons. Likewise in temporal scale we focus on mesoscopic states having durations on the order of 0.1-1.0 s. Microscopic events such as action potentials lasting 0.5-2.0 ms are described in statistical averages such as probability density distributions.
The changes in brain dynamics that result from learning occur on still slower time scales, which we treat as state changes that we describe as a temporal process over hours and days. The energy used by brains for electrodynamic events is drawn from an immense store of electrochemical energy in the ionic gradients across membranes. Energy deficits incurred during events of activity are small fractions of the total available; restoration is by slow metabolic processes that are measured in s, which is the scale of fMRI and related hemodynamic techniques. This separation of time scales permits use of ODE as Hamiltonians applied to a conservative system, and separately to deal with the dissipation of free energy in brain dynamics [111], [53]. • Dimensionality: The large-scale dynamics of cortex is optimally described in planar networks of ODE and random graphs, owing largely to its laminar organization. • Modularity: Description of brain dynamics at every level is based on choosing a module to represent a node and then constructing a network of modules. In microscopic neural networks the module is a model neuron having a summing junction and a nonlinear threshold device. In mesoscopic neural networks the module is the KO set, which represents an ensemble of microscopic neurons. Its properties are averages of well-known time and length constants that parameterize the processes of spatiotemporal integration by dendrites and the transmission of output by action potentials of dendrites. Nonlinear interactions among the neurons in a KO ensemble support the emergence of mesoscopic order parameters that are observable in sums of axonal and dendritic fields of potentials, which collectively are called brain waves: axonal multiple unit activity (MUA) and the electrocorticogram (ECoG) and electroencephalogram (EEG) of dendrites. The optimal module for RGT is the KO set, because our focus is on large-scale neural activity. • Boundary conditions: Brains are anatomically bounded structures, but efforts to identify absorptive or reflective boundaries at the edges of anatomically defined collections of neurons [89] have not succeeded. Use of partial differential equations has been limited to describing the dynamics of core conductors in axons and dendrites. ODEs are extended into the mesoscopic spatial domain by use of integro-difference equations on digital computers to model distributed arrays described by K-sets with periodic boundary conditions. This is the main point of entry for RGT. • Linearity and time-invariance: An effective tool for testing superposition in mesoscopic populations is single-shock and paired-shock electrical stimulation of axonal afferent pathways, which yields cortical impulse responses. The technique of paired shock testing for superposition reveals a small-signal, near-linear range of dynamics for input domains that give impulse responses not exceeding the amplitude of the on-going, self-stabilized background activity. Both additivity and proportionality hold for time ensemble averages of sums of superimposed evoked potentials, within limits that are imposed by the refractory periods of the stimulated axons. However, linearity is temporally bounded into brief epochs of brain activity that are interrupted by the discontinuities recurring several times each second. Repeated samples of impulse responses reveal time-variance in the wave forms, showing that state changes occur at the discontinuities. Linearity is progressively violated as the intensity of input pulses is increased and the ensembles are driven outside the small signal range. Time variance and amplitude-dependent nonlinearity are made compatible with description with linear ODE by the introduction of state-and amplitudedependent gain coefficients in the ODE and by display of the changes in the characteristic frequencies with root locus techniques [44]. • Stationarity: The ongoing background spontaneous ECoG and EEG from populations reveal endogenous discontinuities, because the synchronized oscillatory signals from multiple electrodes in arrays abruptly disorganize and then resynchronize quickly at a different frequency that varies little until the next discontinuity. By this criterion the populations can be treated on average as stationary over extended time periods and as having a single mean frequency with a decay envelope that represents a distribution of frequencies.
Though in fact the populations undergo repeated state transitions and exhibit stationarity only briefly, the ODE and RGT suffice to represent the dynamics averaged over the duration and spatial domain of averaging required for observation and measurement. • Autonomy: The brain is deeply engaged with body and the environment at all times, not as a passive recipient of energy and information but as a proactive, predictive agent. All normal brain areas exhibit background spontaneous activity, which is the source of exploratory drive. It can be completely suppressed under deep anesthesia, but it returns robustly with recovery of normal brain function. This autonomous background activity is notable for its repetitive discontinuities in multiple frequency ranges, which recur independently of the engagement of the subject by intentional behavior. The temporal discontinuities appear as essential steps in the action-perception cycle by which cognition proceeds, including the emergence of a goal state, the successive construction and execution of plans for actions of observation, the perception of stimuli upon generalization and abstraction, and assimilation by learning. We propose that the spontaneous discontinuities provide the opportunity for phase transitions, by which cognition proceeds frame by frame. The ODE suffice to represent the compartmentalized dynamics in functional domains preceding and following a phase transition; we now seek to describe with RGT the transition that links the preceding and following frames. • Criticality: Piece-wise linear analysis using root locus techniques with ODE have brought to light the existence of a nonzero point attractor in cortical dynamics, which is manifested as a zero eigenvalue (a pole at the origin of the complex plane), and at least one limit cycle attractor (a pair of complex conjugate poles on the imaginary axis of the complex plane). The point attractor is absolutely stable and provides the background excitatory bias that all brains require for maintenance of their intentionality. The limit cycle attractor is conditionally stable. Modeling with K-sets and ODE indicates that during a discontinuity in temporal dynamics the neural population may closely approach the complex pole pair [ [49], [53], at which the cortical dynamical system becomes undefined and goes critical. At or near that point all frequencies and wave lengths coexist. This is the gateway to phase transitions by which order parameters impose synchronized oscillations [95] at high frequencies on very large neural populations, in humans covering large fractions and at times the whole of each hemisphere [47], [13], [100], [107]. The critical state is self-stabilized and therefore self-organized. As the main route to the adaptiveness and conditional stability of cognitive functions of brains, criticality is the main target for description using RGT.

Allocortex: The simplest model for the phase transition in criticality 1.2.1 Microscopic sensation and mesoscopic perception
Olfaction is the least complex of the sensory systems, phylogenetically the oldest, the algorithmic predecessor of all others, and for most nonhuman animals by far the most important cognitive organ. In the simplest description the olfactory system consists of the receptor layer in the nose treated as a KO set, the olfactory bulb, and the olfactory cortex, both having interactive excitatory (KIe set) and inhibitory (KIi set) populations in Fig.1.1A that in negative feedback comprise KII sets. These structures along with the hippocampal formation are examples of three-layered cortex, referring to the outer layer of input axons and dendrites, the middle layer of cell bodies, and the inner layer of output axons and interneurons. The receptor cells numbering 10 8 send topographically ordered axons to the olfactory bulb with a convergence ratio of roughly 2000:1. The ∼ 10 3 types of receptor indicate 10 5 of each type, with convergence to 5 × 10 5 excitatory bulbar neurons that are coupled in negative feedback with a roughly equal number of clusters of inhibitory interneurons (granule cells grouped by gap junctions). The mitral cells send axons to the cortex not by topographic mapping but by a divergent-convergent pathway. Each bulbar transmitting neuron excites a broad distribution of cortical neurons, and each cortical neuron receives from many bulbar mitral cells. This transmission pathway performs a spatial integral transformation of bulbar output that selectively amplifies that component of mesoscopic activity that has everywhere at each instant a common frequency of oscillation. That condition is met only by the endogenously generated activity of the bulb and not by the activity that is driven by the receptor input. The benefit from the large number of receptors is the capture of faint odorants at exceedingly low concentrations, such that only a very small fraction, perhaps 10 2 among the 10 5 available receptor cells is activated on any one sniff. Owing to turbulence in the nose, the selection and number differs on every sniff. The problem of generalization from the sample on each sniff to the class of the odorant is solved in the bulb by Hebbian learning (Fig.1.1B). A reward is given on each trial in which the expected odorant is present, but not on the trials without the odorant. Chemical neuromodulators such as dopamine and chole-cystokinin (CCK) increase the strength of connection between just those pairs of neurons that are coexcited on reinforced trials. On unreinforced trials the connection strengths among other pairs of coexcited neurons are automatically reduced by habituation. Over multiple sniffs a Hebbian nerve cell assembly forms by strengthening of the excitatory synapses between pairs of excitatory neurons, represented in the ODE by a gain coefficient kee, and weakening of other excitatory connections represented by kei. The output connection strengths of the inhibitory neuron populations, represented by kie and kii, are unchanged. The effect of the increase in kee is modelled in Fig.1.2B. An increase in kee of 20% increases the output 50-fold; a decrease in kee of -25% decreases output 1000-fold. The entire assembly is vigorously excited into oscillation no matter which neurons in the Hebbian assembly are excited by sensitive receptors, or how many are excited; this demonstrates generalization to the relevant class of stimulus.
However, the activation of a Hebbian nerve cell assembly constitutes ∼ 0.1% of the bulbar output neurons. The further step that is required is a phase transition by which the entire olfactory bulb enters into a spatial pattern of oscillation that is governed by an attractor. Figure  1.2A shows examples of oscillatory wave forms in the gamma range that appear on every channel but are spatially modulated in amplitude, called AM patterns. These AM patterns manifest an attractor that corresponds to the class to which the conditioned stimulus (CS) belongs. In each olfactory system a landscape of attractors embodies the repertoire of odorants that the subject is capable of classifying at each stage of its lifespan. Each attractor is based in the synaptic matrix of the bulb in conjunction with other parts of the brain, which constitutes the memory of the subject for that odorant and its significance for intentional behavior. The lack of invariance of the AM patterns for control and odor stimulation shows that they are not representations of CS formed by filtering; they are creations through nonlinear dynamics.

The origin of the background activity in mutual excitation: positive feedback
Three types of KO ensemble are active in brains: excitatory, inhibitory, and modulatory. The dynamics of each module is described by a transfer function that consists of a component for linear pulse to wave density conversion with gain k a , a distance-dependent part, H(x, y), and a time-invariant, amplitude-dependent nonlinearity, G(v). The necessary and sufficient minimal representation of k a is by measuring the impulse response of a KO population in which interactions have been blocked by deep anesthesia. The response to an excitatory impulse is the average synaptic potential with a rapid depolarization and exponential return to the baseline ( Fig.1.3). Paired shock testing reveals proportionality and additivity. The minimal fitted curve is the sum of two exponential terms, for which the equation is a linear 2nd order ODE.
A 1st order ODE does not suffice, because the single exponential term only captures the passive membrane capacitive delay and fails to incorporate the synaptic and dendritic cable delays, which are crucial in the population dynamics. Here a and b are biologically determined time constants. p i (t) denotes the pulse density at the i-th node as a function of time; i ∈ N, where N is the index set of all nodes. F i (t) includes the effects of the neighboring populations influencing node i.
H(x, y) is represented by linear matrix algebra. Most calculations to date have been with a fully connected squared array of nodes with periodic boundary conditions. The employed models incorporate conduction velocities for distance-dependent delays. The source term in Eq.(1.1) can be expressed as , where the summation runs over the index set N of all populations.

Fig. 1.3.
The open loop impulse response (averaged evoked potential under deep anesthesia) of the olfactory system is fitted with the sum of four exponential terms, two approximating the synaptic and dendritic cable delays, the passive membrane decay, and the slow rate (∼ 1/s) of the metabolic recovery process.

G(v)
is determined by calculating the action potential probability conditional on ECoG amplitude. The fitted curve, which was derived for the population by a generalization from the Hodgkin-Huxley equations for single neurons [45], is a sigmoidal function that expresses the axonal pulse density output of an ensemble as a function of dendritic wave density: In Eq. 1.2, q is the parameter specifying the slope and maximal asymptote of the curve. This sigmoid function is modeled from experiment on biological neural activation [44].
The nonlinear gain can be replaced by a gain constant equal to the tangent to the sigmoidal curve in piece-wise linear approximation of the dynamics in the vicinity of a steady-state operating point. The linking of KO modules into distributed networks of K-sets yields descriptions of brain dynamics at the macroscopic level, which describe the neural mechanisms of intentional behavior, including assimilation by learning [51], [66].

Source of the repeated discontinuities facilitating phase transitions: Rayleigh noise
A neural mechanism for the variability of the decay rate of the impulse response and the aperiodic approach to criticality has been revealed by detailed analysis of the background ECoG [11]. The characteristic form of the power spectral density (PSD) of subjects at rest is power-law in log-log coordinates with a slope close to -2 (brown noise) and decreasing in sleep near -3 (black noise) [101]. In animals intentionally engaged in perceptual tasks the PSD deviate from 1/f with peaks reflecting excess power in the gamma and theta ranges [33], [35], [39]. These peaks result from inhibition by negative feedback that acts as a band pass filter.
The application to the raw ECoG of a band-pass filter in the gamma range followed by the Hilbert transform yields the analytic signal, which is decomposed into the analytic power, and the analytic phase. Calculation of successive differences, divided by the digitizing step, yields the analytic frequency. The spatial mean, and spatial standard deviation , SDX(t), co-vary with time, revealing nearly constant values of analytic frequency and low values of SDX(t) for ECoG segments lasting 50-120 ms, but bracketed by brief epochs of high SDX(t) and either low or high values of frequency. These are the times of temporal discontinuity, across which the values for the frequency change on average by ∼ 10Hz (∼ 16% of mean frequency) together with sudden large changes in spatial patterns of amplitude and phase of the ECoG.
The discontinuities coincide with the decrease in mean analytic power, A 2 (t), to values as low as 10 −4 below the modal values. The high spatial variance in phase revealed by the maxima of SDX(t) at null spikes reflects the indeterminacy of phase at the minima. This pattern of repeated near-zero values of analytic power accompanied by maxima in analytic phase variance in very brief time periods was simulated by band pass filtering simulated brown noise, applying the Hilbert transform, calculating the analytic power and phase difference at each point in time and space, and plotting mean power, A 2 (t), and phase SDX(t). We infer that these coordinated analytic phase differences are a type of Rayleigh noise, in which the mesoscopic power nearly vanishes, owing to summation and cancellation of excitatory and inhibitory currents, but without decrease in the microscopic power. The observable mesoscopic potential differences vanish, but the microscopic ionic currents persist undiminished. At this minimum the order parameter vanishes, and the complete disorganization of the microscopic activity enables the cortex to transit to a new phase. Being mesoscopic, the preceding and succeeding patterns and the transient abeyance in the null spike are completely invisible in microelectrode recordings at the microscopic level. What is seen is the sustained firing of neurons in Hebbian nerve cell assemblies, which is identified as that of feature detector neurons [99], also pejoratively known as grandmother cells [71].
We postulate that the source of the variation in wave form with fixed input is the variation in the mesoscopic amplitude imposed by the null spikes. If so, the impulse driving of the KIIei population reveals by the decreased decay rate the amplification of the response to the evoking stimulus, whenever the mesoscopic background activity diminishes. We propose that the necessary condition for the transposition from the microscopic pattern given by a Hebbian nerve cell assembly to the mesoscopic order parameter, which is realized in the AM spatial pattern governed by an attractor, is the occurrence of a null spike during the sustained discharge of an activated Hebbian assembly. Then the initiation of a phase transition that leads to perception requires the conjunction of three recurring events of independent origin. One is the macroscopic initiation by the limbic system of a behavioral act of observation, such as a sniff, saccade or whisk, that brings a sensory volley to the cortex primed by preafference. The second is the onset of a mesoscopic null spike, which opens the cortical dynamics to the singularity by which a state of criticality is closely approached. The third is the presence in the sensory volley of microscopic action potentials from receptors that belong to the expected class of input and active a Hebbian assembly. Therefore the formation of a percept is contingent on the presence in the environment of that which is sought, on the intentional observation of the environment, and the local opening of a gate in sensory cortex, ready to reorganize its background activity.
Empirically the recurrence rates of null spikes are proportional to the value of the center frequency and to the width of the pass band. The mathematics remains to be worked out. Null spikes occur independently in multiple frequency bands spanning the beta and gamma ranges. The appearance of neocortical ECoG in array windows fixed on the brain surfaces of active animals resembles that of a pan of boiling water, in which the bubbles resemble the epochs of transient stationarity. Most null spikes do not lead to phase transitions; most incipient phase transitions are quenched, either because there is no concomitantly active Hebbian assembly, or because the reduction in background power is insufficient.

Application of RGT to neocortex 1.3.1 Topology of large-scale neocortical connectivity: Generality versus specificity
Allocortex is relatively simple; 2 quick steps takes it from a selection of receptor cells to a Hebbian assembly and then to a chaotic attractor [46], [109], [65]. Neocortex is more complex. The greatest challenge in modeling the dynamics of neocortex is posed by the requirement to meet two seemingly irreconcilable requirements. One is to model the specificity of neural action even to the level that a single neuron can be shown to have the possibility of capturing brain output. The other is to model the generality by which neural activity is synchronized and coordinated throughout the brain during intentional behavior. The topology of allocortex summarized in Fig.1.1 displays these two essential features of cortex: specificity in the topographic mapping of receptors into the bulb, and generality in the divergent-convergent integration of bulb to cortex. Neocortex provides these two features as well but on a far greater range of spatial scale. The temporal dynamics of visual, auditory and somatic cortices is very similar to that of olfaction, including the provision of background spontaneous activity, power-law distributions of connection lengths and spatial and temporal power spectral densities, repeated formation of AM spatial patterns with carrier frequencies in the beta and gamma ranges, and frame recurrence rates in the theta range. ODE serve to describe the mesoscopic dynamics of neocortical populations, but falter in attempts to model the entire range including the transitions between levels, which appear to take place very near to criticality. RGT offers a new approach, by simulating the results from mesoscopic recording and ODE modeling.

Neocortical area and neuronal density
Anatomists [104], [14], [102] agree on the relative invariance across areas of neocortex and species of mammal, the number of neurons/mm 2 being close to 10 5 within a factor of 2, compared with wide variations in number of neurons/mm 3 . The logarithm of the area of both allocortex and neocortex varies across species in accordance with the logarithm of body size, so inversely does the thickness of cortex and therefore the volume density, respectively in human neocortex 2.6 mm average thickness and 4 × 10 4 /mm 3 . Apart from the scaling factor of body size that holds for allocortex, the way by which neocortex increases in size with evolution is by increase in surface area, adding neurons at nearly fixed density per unit surface area. By the extra degree of freedom neocortical area increases more than volume relating to body mass, leading to gyrification by wrinkling of the surface. The area of allocortex varies over 3 orders of magnitude in proportion to body size. Neocortical area ranges an additional 2 orders of magnitude. The largest brain in the sperm whale is 10 5 greater in area than the smallest in the tree shrew, yet the temporal dynamics revealed by the ECoG is similar from mouse [43], to whale [77] giving dramatic evidence for power-law distributions of connectivity and self-similarity in scale-free dynamics.
The challenge posed by neurobiologists to physicists and mathematicians is to explain the phase transitions that appear in scalp EEG and MEG recordings from normal humans [10]. The fine structure of neocortex reveals each hemisphere to be a unified dynamical organ with no interruptions in connectivity over its entire surface. Given that fact alone it is not surprising to find synchronized oscillations over distances that often approach the length and breadth of the hemisphere. The similarity in patterns of phase synchrony extended across a difference in scale of 34:1 between human and rabbit. What is surprising is that the beta and gamma oscillations at each spike jump to a new value of frequency that differs from the previous frequency on average by 10 Hz over the range in animals of 20-80 Hz. The repetitive resynchronization over relatively immense correlation distances is surprising, because the coordination of the activity is based in the main in communication by action potentials with conduction velocities for the most part well under 10 m/s. [52] have reviewed the variety of proposed alternatives in field theory to explain the rapid resynchronization and concluded that it is an emergent, macroscopic property. [13] came to the same conclusion in their study of MEG modelled using small-world theory.

The large-scale embryological growth and development of neocortex
Anatomists have for over a century recognized the coexistence of relatively numerous local connections within putative modules and relatively sparse long connections among networks of modules, with the likelihood of dynamic modification in formation of Pavlov's temporary connections by learning. Recently several research groups have attempted to model cortical structure and function with small world networks [114] by constructing a cellular neural network and replacing a small proportion of local connections with long connections. The network is grown by adding new cells one at a time and forming new connections for each node as it is added [e. g., [8]).
This conception lacks important features of how neocortex forms [1], [88], [62], [80]. At the microscopic level the neurons initially replicate as small spheres in large numbers, migrate toward the surface of the brain, and then sprout axons that reach out and form synaptic connections on dendrites. The dendrites also extend branches that provide the surface area needed to accommodate synapses, each with a current path to the axonal trigger zone near the cell body. Neurons continue to branch, extend, and form new connections, long after replication ceases and excess neurons are pruned by apoptosis (programmed cell death). Lifelong growth continues in some areas by cell division and maturation of new neurons, but predominantly the existing neurons continue to reach out and form new connections that provide the matrix of connectivity that is shaped by learning. Lengthening and branching of axons and dendrites continues well into the sixth decade of human life and perhaps beyond.
Forms of the distributions of connection lengths used in random graphs are schematized in Fig. 1.4A. In the initial formulation by Erdős and Rényi [38] the probability of connections is uniform with distance (lower line). In cellular neural networks the connections are restricted to nearest or next nearest neighbor nodes, e.g., [30]. In small-world graphs/networks a low percentage of the local connections is replaced with uniformly distributed long connections, step function [114], which dramatically reduces the depth by bridging across many nodes. Anatomical studies of cortical connectivity report exponential distributions of the lengths of axon collaterals and projections (dashed curve) [92], [24], which display predominantly local connections and increasingly sparse connections with increasing distance from the cell bodies.
Revising the data display from semi-log plots to log-log plots gives power-law distributions of distances in the middle range with deficits at both ends. The deficits can be explained by the limitations in the experimental methods. For short distances the observations using light microscopy omit gap junctions and most unmyelinated axons, which in electron micrographs substantially outnumber the myelinated axons. For long distances the observations of  [24]. The data were re-plotted in log-log coordinates.
axon lengths in Golgi preparations are made in tissue sections of limited thickness (e.g., 300 microns, Fig.1.4B) in small mammals (here mouse). Continuity is very difficult to follow in serial sections, leading to a deficit in the numbers of long connections. The self-similarity of many axonal and dendrite trees is well-documented [110], [60], and the distributions of cortical structural connectivity are power-law [74], [59], [57], [83]. The most appropriate model for generating an embedding generalist random graph is a large fixed number of nodes having power-law distributions of connection distances at fixed slope and with steadily increasing asymptotes as new stochastically generated links are added. Specializations within this matrix need then to be undertaken to model the effects of genetic factors on selective targeting biases and apoptosis, and the effects of learning from experience.
Special significance is attached to the graph with power-law distribution of connection distances, owing to the introduction of self-similarity at different scales of observation and measurement. The significance lies in the dynamical properties endowed by scale-free connectivity [25], [29], [5], particularly the capacity for repetitive phase transitions to occur almost simultaneously over very large areas of neocortex, owing to the approach to criticality giving the long correlation lengths.

Evolution of heterogeneous cortical structures
Spatially heterogeneous networks are important in various systems, including the cortex. To model the development of such inhomogeneous brain structures, we describe the evolution of large-scale graphs starting from well-defined initial configurations. To model brain development from the inception of the individual to the formation of detailed brain structures, we evoke the concepts of pioneer neurons [34], [85] and the transitory subplate [103], [6]. We start from an initial set of 10 3 pioneer neurons, |V 0 | = 10 3 . Pioneer neurons are modeled as small balls densely packed in a small area of characteristic size of 10µ. During the developmental stage, this tiny cortical tissue multiplies about 10 8 -fold, to have the fully developed brain with 10 11 neurons. This means a 10 4 -times increase in linear size, by considering the essentially 2-dimensional structure of the cortex. For illustration, the stages of the development of the brain are given in Table1.1. Cortical columns 10 mm 10 9 Cortices 100 mm 10 11 Full brain The evolution of the brain is described as a inflation of the initial graph G 0 formed by the interconnected pioneer neurons. Assume an initial population of about 10 3 pioneer neurons packed on a small lattice; packing distance is ∼ 0.33µ. As the cortex develops, the space spanned by the neurons expands so that the distance between the original neurons increases. The space created by this expansion is filled in with newly created neurons. These new neurons grow connection to the existing ones. A schematic view of the initial stage of development is shown in Fig.1.5.

Directedness in cortical networks
An important constraint in RGT is directedness [8]. Most neural connections by synapses (unlike gap junctions) are unidirectional; with the exception of dendrodendritic synapses reciprocal interconnections are by different axons. Topologically the result is parcellation of a large network into a giant component, one or more input components, one or more output components, and miscellaneous islands having insufficient densities of links to participate in the other three groups. The utility of these distinctions in applications to cortex is obvious. The concept of the giant component provides an interesting candidate for a neural correlate of consciousness by virtue of its immense size, unity, and capacity for rapid state transitions. The islands, evolving in relative isolation by continuing growth of connections, might be fruitful for using scalp EEG to study the process of insight, the 'aha' phenomenon described in [56] and as the collision of matrices of thought by Arthur Koestler [64], which has been documented in the auditory cortical EEG of gerbils engaged in category learning [90].

Sparseness in cortical networks
In graphs the number of synapses between neurons (mono-, di-, poly-synaptic) determine connection distances. In networks the radial lengths of axons/dendrites from cell bodies determine distances. Most dendritic trees are small enough (< 1mm radius) to sum their currents by passive cable spread; most axons are long enough (∼ 1mm) to require communication by pulses that use local energy so as to transmit with delay but without attenuation. Every synapse requires a dendritic loop current path to the axon like a leaf on a tree, hence the large surface area of the dendritic tree and the high packing density in humans of neurons (4 × 10 4 /mm 3 ) and G 0 G 1 G 2 Fig. 1.5. Schematic illustration of the sequential evolution of the cortical graph G; black circles -pioneer neurons G 0 , grey and white circles -consecutive generations of neurons. The first two steps of the evolution are shown. The illustrated regular structure is used only for simplicity, while the actual brain tissue evolves random topological and connectivity structures. synapses (4 × 10 8 /mm 3 ) to accommodate the 10 4 synapses on each neuron sent by 10 4 other neurons [104], [14], [24], [108]. So great is the density of neurons and their connecting fibers that on average a neuron demarcated by its cell body and nucleus connects sparsely with < 1% of the neurons within its dendritic arbor, and the likelihood of reciprocal connections between pairs is < 10 −6 . Given that each neuron transmits to ∼ 10 4 others, the number of its targets in 3 steps would be 10 12 , which approaches the 13.7±1.6 10 9 neurons in each cerebral hemisphere [23]. This leads to an estimated depth of 3 for a cortical network. As most of the connections are local, real world networks have depths much larger than this value; see also Appendix. Here we propose a 3-level hierarchy of cortical organization: microscopic networks of neurons within cortical columns; mesoscopic populations in dedicated areas of sensory, motor and associational cortex, and the entire cerebral hemisphere serving as a macroscopic system that organizes the intentional activity of the brain and body. A node at each level consists of a network of nodes at the next level below. To some as yet unknown extent this hierarchical organization is made possible by self-similarity of connectivity and dynamics across levels.

Structural connectivity: Emergence of neocortex from allocortex
The laminar neuropil in the phylogenetically older parts of the cerebrum is called allocortex [86]. Examples include the prepyriform cortex (paleocortex), the hippocampus (archicor-tex) and parts of the perirhinal and entorhinal cortex (mesocortex) [78]; the olfactory bulb is here included as allocortex owing to its similarity to the others in topology and phylogenetic derivation, though not all anatomists accept this taxonomy [24]. Generically allocortex has three layers with differing subdivisions specific to each area; see Fig. 1.6. Layer 1 is also called marginal layer, which lies under the bounding pial membrane and has input axons and the dendritic trees on which they synapse. Layer 2 has the cell bodies, often with triangular shapes giving the name pyramidal cells (mitral cells in the bulb). Layer 3 has output axons with recurrent side branches called collateral branches that synapse on other pyramidal cells and on local interneurons called stellate cells (internal granule cells in the bulb). In contrast to allocortex with three layers, neocortex (also called isocortex) is described as having six layers. The layers are defined on the basis of the spatial patterns of the cell bodies revealed by selective staining of the DNA in the cell nuclei [72], not on connectivity [26], [104]. In allocortex (also called reptilian) the layers form by cells migrating from insideout and spreading under those already arrived. Comparative neuroanatomists have shown how neocortex (found only in mammals) has evolved from reptilian allocortex by intrusion into allocortical (Layer II in Fig.1.6) of neurons migrating from the basolateral forebrain [62], [1]. The first neocortical cells called pioneer cells [34], [85] are guided by a transitory subplate [103], [6] that enables subsequent cells to leapfrog over pioneer cells by inside-out progression and form the fine-grain radial bundles of parallel cells that are oriented perpendicular to the pia in neocortical columns and hypercolumns. The cytoarchitectures and connectivity patterns in Layers II to IV vary greatly between local areas. These palisades of axons preferentially following the pioneer cells provide topographic mapping of sensory input and motor output to and from neocortex. The local areas by Brodmann [26]. These specializations support the specialized functions that are performed by local networks in the many sensory, associational, and motor areas of neocortex. The individuation is enhanced by close dependence on topographically organized specific thalamic inputs by axons forming synapses in Layer IV. The neocortical marginal Layer I continues to receive input axons from nonspecific thalamic nuclei and from allocortex. In neocortex the input to Layer IV percolates outwardly to smaller, local neurons; in allocortex the input to Layer 1 percolates inwardly to larger, long-range projection neurons in Layer 2.
Braitenberg and Schuz wrote [24]: "A recent hypothesis by Miller [82], based on differences in both connectivity patterns and spontaneous activity between upper and lower layers, assigns to the upper layers the role of a neuronal library, storing most of the information encoded by assemblies, while the lower layers are assumed to catalyze the process of assembly formation" (p. 150). Kaas wrote [61]: "Generalizing from cats and monkeys it appears that the evolutionary advance in brain organization is marked by increases in the numbers of unimodal sensory fields, not by increases in multimodal association cortex as traditionally thought" (p. 147). Therefore Layers II-IV can be conceived to provide highly textured mesoscopic modules that are most active in learning and adaptation, while the embedding neuropil in Layers I, V and VI provides the connectivity needed for rapid integration of modular activity into macroscopic ensembles. We conclude that the local networks within Layers II to IV are embedded in the continuous sheet of neuropil in each cerebral hemisphere, which is unified by the global network of Layers I, V and VI. The deep pyramidal cells in Layers V and VI have long basal dendrites that receive connections from neurons broadly situated in Layers II to IV, and their apical dendrites extend and branch into Layer I. The deep pyramidal cells send the longest output axons to other areas of cortex, to the basal ganglia and brainstem, and by widely radiating collaterals into the overlying layers. Long cortico-cortical connections are not fully random; they are often distributed in patches with high local connection density, with intervening regions having few connections [79]. Most analyses of anatomical data have emphasized point-to-point networks, for example, the meta-study by Fellemin and Van Essen [40] of connectional architecture in visual cortex, and the cortico-subcortical modules of Houk [58]. These linkages are essential for dissemination of activity among the nodes in the networks. However, they do not address the connectivity required for the long-range coherence and rapid phase transitions revealed by carrier waves. Therefore the best target for modeling neocortical activity by random graphs appears to be the embedding layers.

Considerations of differences between allocortical and neocortical activity
Three major differences allocortical and neocortical activity were revealed between by the functional properties of the beta and gamma patterns. First, there was only one bulbar phase cone at a time. In neocortex there were multiple overlapping phase cones at all times. Second, each bulbar oscillatory pattern occupied the entire bulb with fixed diameter. The neocortical patterns of shared frequency varied in diameter (correlation distance), averaging 2-3 times larger than those in the bulb and often appearing large enough in cats and rabbits to occupy the entire hemisphere. Third, the variation in bulbar phase cone duration was limited to half to two thirds of the interval between frames recurring at the rate of breathing or sniffing. The distributions of durations from frames with AM patterns that were classified with respect to CS also conformed to the same power law. Hence within a range spanning nearly three orders of magnitude the durations of phase cones showed self-similarity and lack of characteristic mean and standard deviation, characteristic of scale-free distributions as seen also for the PSD of the EEG. Each of the myriad small phase cones without classifiable AM patterns was interpreted as a local state transition that appeared to abort before accessing a learned basin of attraction. The superimposed local events resembled the subthreshold fluctuations described by Prigogine [96] in a system held very close to criticality and a transition at the boundary of a basin of attraction.

Preferentiality in the assignment of probability of connections
A feature of random graphs is preferentiality [8], [4], in which the probability of a node receiving new connections is proportional to the number of connections it already has. Constraining a scale-free network with preferentiality in the assignment of the probabilities of input and output connections may lead to the emergence of a hub, which is a node of exceptionally high density of connections, an outlier in a power-law distribution that includes long-distance links. Widely studied networks with hubs include the route maps of major airlines and the patterns of connectivity on the Internet. Such networks are relatively unaffected by random loss of nodes but fail with loss of hubs [112], [113]. Likewise the performance of cerebral cortex is relatively resistant to degradation by local lesions in areas not affecting input and output pathways. The medical literature is rich with descriptions of transient loss of circumscribed cognitive functions that accompany focal damage to limited areas of neocortex [32] and are repaired by learning, but cortex can fail catastrophically with certain critical lesions, such as damage to the midbrain reticular formation leading to coma, the substantia nigra leading to Parkinson's disease, and the perforant path in Alzheimer's disease. Functional evidence for hubs might be seen in the patches of cortical activity seen in fMRI that are associated with a broad range of cognitive functions [27].
Significant preferentiality is seen in the mechanism by which pioneer neurons provide the initial guidance that is required for the construction of topographic maps, which are displayed in textbooks as motor and sensory homunculi. This mechanism addresses the local organization that is required for sensorimotor integration by neurons in Layers II to IV. The question is whether another form of preferentiality might be modelled in Layers I, V and VI, which might help to explain hubs, if indeed they exist. The connectivity of single neurons does not obey power-law, but have carrying numbers of input and output synapses numbering in the thousands. The requirements for macroscopic connectivity supporting hubs in neocortex are unknown. A mathematical model for large-scale preferentiality is needed to guide anatomists and physiologists in the search for the requisite structural and functional properties of cortex.
For example, scale-free connectivity at the population level may give a fresh start to understanding the enigmas of speech localization, and the paradox that the higher is the verbal IQ, the smaller is the speech area [91]. Broca's area is in close conjunction with the motor areas for the lips and tongue, yet linked by the uncinate fasciculus to the temporal lobe for audition and by U-fibers to the parietal lobe for proprioception. Wernicke's area is at the conjunction of somatic, visual and auditory sensory cortices. In both instances the tendencies for preferential connectivity may support the growth of hubs respectively for motor and perceptual speech functions. Focal electrical or magnetic pulse stimulation by neurosurgeons temporarily inactivates speech, but that does not imply that speech is localized to the site of stimulation. Instead the elucidation of speech mechanisms requires investigation of how neurons in a hub participate in macroscopic coordination of oscillatory neural activity. Definitive exploration of hubs may be optimized by recording EEG/MEG from a dense array of channels overlying a large percentage of the surface each hemisphere, and with sampling at 1 cm intervals corresponding to the width of gyri so as to capture adequately the texture of AM patterns of spatially coherent activity [47]. A foreseeable device with 512 channels for EEG would cover an area of scalp 23 × 23 cm, comfortably fitting over the calvarium.

Structure and Dynamics in Cortex Models
The cortical tissue has a complex anatomical structure of interconnected neural population producing well-studied behaviors manifested through spatio-temporal oscillations. These oscillations have been studied intensively using various brain monitoring tools. The major challenge is to determine the relationship between structure and function. The mathematical theory of random graphs and percolations [15], [22], [21] have been very useful for th characterization of structural properties of large-scale networks. There are some promising approaches to establish the structure-function relationship in complex networks, including brains [8], [105], [67], [7]. However, a systematic mathematical analysis of this field is still missing. Here two major aspects of the structure-function relationship will be emphasized: • Assuming a given structure, in the absence of significant growth and learning effects, one is interested in studying the dynamic behavior of the networks. How a desired oscillatory behavior can be achieved by changing the structural parameters of the network? • At the next level, structural changes are introduced, which are related to learning and adaptation. One may assume certain biologically motivated evolution rules acting through the addition/removal of nodes and connections. The aim is to study how the behavior of the network varies under such dynamically changing conditions?
Oscillations have been studied in 2-dimensional lattice using probabilistic cellular automata (PCA) [7]. The state of a node at a given time is defined by the sum of the active states of its neighbors in the previous steps, using a probabilistic threshold function. The PCA has been originally defined with a completely regular and homogeneous neighborhood structure with a binary function defined over the nodes of the graph. In addition, mean-field models have been studied as well over random graphs. The existence of critical behavior and phase transitions have been rigorously shown in the case of very low level of noise and mean field model [7]. Simulations indicate that mixed models with local and long-range connections exhibit critical behavior of weak Ising type [67]; such models are also called neuropercolation. An extended neuropercolation model with excitatory and inhibitory nodes can exhibit periodic oscillations, where the frequency of the oscillations is controlled by the coupling coefficients between the nodes [98]. These are narrow-band oscillations with well-defined frequencies in the gamma band.
We described critical oscillations over various PCA. It is of special interest to determine statistical properties of average activation (magnetization) as an order parameter. Mean field probabilistic cellular automata over finite tori can exhibit bi-stable and multi-stable oscillations with exponentially long waiting time between transitions, while each transition takes polynomial time [7]. Simulations show that the power spectra density (PSD) of magnetization oscillations exhibit 1/ f a behavior near the critical state, where f is the temporal frequency. The following major scenarios are distinguished: • Far from criticality, the dynamics stays for very long time at a resting state, i.e., oscillating around a given magnetization level, when clusters grow and dissolve randomly, without the emergence of a giant component representing large-scale ordering. • Near critical state, the waiting period is reduced and large clusters are frequently formed.
Simulations show that such large clusters eventually cover most of the network, and ultimately the system flips to another basic state. • Cluster formation and growth, and statistical distribution of the size of the clusters is a problem of great importance, and should be attempted using tools of rigorous mathematical analysis. Functional brain networks have been studied by EEG, MEG and fMRI tools. In the experimental studies it is typically postulated that there is a functional link between nodes if their activations show correlation above a threshold level. Such studies indicate the dominance of relatively simple connectivity motifs with 3 and 4 nodes [105].
Preferential attachment is a popular model of building networks. Due to the inherent shortcomings of certain versions of preferential attachment models of growth, more rigorous approaches are sought [21]. Cortical structures evolve along the path of pioneer neurons, which can be considered a form of preferentiality. It is highly desirable to introduce a mathematically tractable growth model to describe the observed neural processes. Such models are outlined in the next section.

Requirements the mathematical models
Several simplifications will be introduced when formulating a mathematically tractable theory of the structure-dynamics relationship in the cortex. The cortex is approximated as a 2 dimensional sheet of interconnected neurons and modelled as a planar graph G(V (x), E) over the vertex set V and directed edges E, where x ∈ S ⊂ R 2 . G(V (x), E) is a directed graph defined by the connected neurons, where axons and dendrites signifying outgoing and incoming connections, respectively. The following are some key parameters of the model: • ω x (r): Distribution function of the lengths of the connections originating from a node at location x. Biological arguments point to a 1/r k distribution, where r is the connection length, k is the exponent of the scale-free length distribution, k ≈ 2. • n x : Spatial distribution (density function) of vertices V in 2D. It can be uniform or may evolve following a some patterns specified by a given model V 0 ,V 1 ,V 2 , . . . ,V t , . . .. Often we consider a sequence of evolving subsets of vertices in Degree distribution of the outgoing/incoming edges of the nodes. Neurobiology indicates that the average in-degree of a node is 10 4 in the cortex. It is not clear if scale-free degree distribution is valid in the cortical tissue. The apparent presence of some distributed hub structure may point towards scale-free degree distribution.
The parameters d + x , ω x (r), n x describe certain important features of the planar graph network structure. Depending on the specifics of the model, various relationships can be derived between these quantities. For a variety of suitable models, see the Appendix.

Conclusions
The challenge of describing and modeling the dynamics of immense numbers of neurons and synapses in cortex has been met by invoking a continuum to represent the mesoscopic pulse and wave densities with state variables of neuronal populations in networks of ODE [44], [12]. This strategy works well in stationary, small-signal, near-linear ranges and can be extended into time-varying, amplitude-dependent nonlinear dynamics with piece-wise linearization giving analytic solutions to the equations for simple approximations. Large-scale modeling has provided a platform in K-sets. The KO set models the mesoscopic node in a space-time continuum. The KIe set models the synaptic interaction supporting the non-zero point attractor that sustains cortical background activity. The KIi set models mutual inhibition and spatial contrast enhancement that is required for spatial AM pattern formation. The KIIei set predicts one or more limit cycle attractors that support sustained oscillations in the gamma and beta ranges. The KIIei set suffices to model the essential dynamics of allocortex.
ODE in K-sets have been extended in several directions. The KIII set consisting of a KIe set and three KIIei sets chaos with multiple long feedback connections model the landscapes of chaotic attractors that form by learning and provide the memory system for conditioned stimuli [48], [46], [65], [73]. The KIV set consists of three KIII sets devised to model the limbic system in goal formation, learning, and navigation [66]. KIV has been successfully demonstrated intentional behavior in embedded robotics [68], [69]. The KV set is proposed to model neocortex. Advances using ODE are in progress in three directions: nonequilibrium thermodynamics [49], [50]; dissipative quantum field theory [53]; and renormalization group theory.
However, further use of ODE is severely limited. Networks of populations with multiple types of nonlinear feedback cannot be solved analytically and must be solved by numerical integration with digital computers, in which case the differential equations are discretized in time and space into coupled integrodifference equations. Owing to attractor crowding and numerical instabilities from the extremely large number of attractors in high-dimensional networks, the digital models are unstable and must be tempered with additive noise, which requires stochastic difference equations. The digital computations for solving the ODE are prohibitively slow. The alternative route of analog computation using VLSI to solve the ODE has met with early success [97] but has encountered problems with bias control. The equations suffice empirically to replicate the pulse and wave densities and their spatial patterns governed by nonconvergent chaotic attractors [65], but none of the equations can be used to model phase transitions, only the states before and after the transitions.
Random graph theory offers a fresh beginning, in which the discreteness of neurons and synapses can be approximated with numerical representations in percolation theory. It is readily adapted to describing neural nets at the microscopic level, the nodes at mesoscopic and macroscopic levels, and the relations among spatial and temporal variables between levels in phase transitions. Modeling of structural and functional connectivity in allocortex by neuropercolation theory is already well advanced, particularly in modeling the interplay of long connections, inhibitory feedback, and additive noise in the genesis of self-regulated spontaneous activity of large nets of nodes at the mesoscopic level. Results to date suffice to simulate white noise with a flat PSD (white noise) [101] at low levels of interaction strength, brown noise with 1/ f 2 PSD when close to a phase boundary, and narrow band gamma oscillation just beyond a phase transition. The extension to neocortex requires specification of the input and output connection densities of nodes in a form that is compatible with the known distributions of neuron sizes, connections, and correlation lengths.

Appendix: Mathematical models
In this Appendix we shall first describe some existing models of random graphs with properties that may make them useful in the analysis of the neocortex, and then go on to describe new such models.
We should emphasize that, as always when modelling complex systems, we have to try to reconcile two contradictory aims: a model should produce random graphs whose structure resembles that of the neocortex as much as possible, but it should not be so complicated that it is not susceptible to mathematical analysis. The hope is that eventually the mathematical theory of random graphs will be advanced enough that both requirements can be satisfied: a sophisticated model can be constructed that incorporates most of the requirements, and this very complicated model can be analyzed precisely, meaning that one can obtain mathematical descriptions both of its structure, and of the associated dynamics. At the moment we are very far from this Holy Grail.
In this appendix we shall approach the task just described from the mathematical side: we shall examine models of random graphs that can be analyzed, and that can be viewed as primitive models of the neocortex.
Let us start with the most studied random graph models, namely the 'classical' models G(n, M), introduced by Erdős and Rényi [37], and G(n, p), introduced by Gilbert [54]. In  G(n, M), the numbers n of vertices and M of edges are given, and the random graph is chosen uniformly from among all graphs with vertex set {1, 2, . . . , n} and M edges. In G(n, p), there are again n vertices, and each possible edge is present with probability p, independently of the other edges. In many contexts, such as the present one, the two models (with appropriately related parameters) are essentially equivalent, and both are now known as Erdős-Rényi random graphs, since it was they who pioneered the use of probabilistic methods to study random graphs.
The models G(n, M) and G(n, p) are homogeneous, in the sense that the vertex set has no structure; in particular, all vertices are equivalent. Most networks in the real world are inhomogeneous, either in the sense that different vertices have very different properties (for example, in scale-free networks, some have much higher degree than others), or in the sense that, while the roles of individual vertices are similar, the vertex set has (usually geometric) structure which influences the network structure: nearby vertices are more likely to be connected. In the last few years, many new inhomogeneous random graph models have been introduced; we shall briefly describe a few of these in the next subsection.

A general inhomogeneous model
Bollobás, Janson and Riordan [18] introduced a very general model of sparse inhomogeneous random graphs, designed for representing graphs with geometric or similar structure. Here 'sparse' means that the number of edges grows linearly with the number n of vertices, so the average degreed is constant, but the definitions adapt immediately to other density ranges, for example graphs in which the average degree is n 2/5 . In the fully dense range, with order n 2 edges, a closely related but much simpler model was introduced by Lovász and Szegedy [76]. Note that the mathematical definitions of 'sparse' and 'dense' involve limits as n → ∞; the distinction may not be so clear in practice when modelling real-world graphs.
The full definition of the BJR model is rather complicated, so let us present a simplified version, adapted to the directed graph setting. Given a probability space (S , µ), we shall call a measurable function κ from S × S to the non-negative reals a kernel on S . (In [18], where undirected graphs are considered, the kernel is assumed to be symmetric.) Let p = p(n) be a normalizing function, for example, p = n −3/5 , corresponding, somewhat crudely, to the probability p = p(n) in G(n, p). The random graph G p (n, κ) with vertex set {1, 2, . . . , n} is defined as follows: first choose x 1 , . . . , x n independently from S according to the distribution µ. Then, given these 'vertex types', add each possible directed edge i j with probability min{pκ(x i , x j ), 1}, independently of the other possible edges. Here the minimum has no significance: it is needed only to make sure that we work with probabilities that are at most 1. Much of the time pκ is less than 1, so the minimum is simply pκ.
One of the simplest instances of this model is defined in a geometric setting. Let R be a region in the plane, and let µ be Lebesgue measure, scaled so that µ(R) = 1, say. Then the 'types' x i are simply points in the plane, chosen independently and uniformly from R. The kernel κ is a function of two points x, y ∈ R, for example where d(x, y) denotes the Euclidean distance between x and y, the constant 0 is a characteristic length scale, α > 0 is the distance exponent, and c is a normalizing constant. In this case G p (n, κ) is a random graph on points distributed randomly in R, with nearby points more likely to be joined than distant ones, but edges of all lengths possible. Note that if κ is chosen as above, then it is explicitly built into the model that the probability that two vertices are joined is a function of their distance, and that this function follows a power law.
In general, κ can be any function of two points in R; it need not depend only on their distance, but might also vary in other ways, representing possible variations in the average number of connections in different parts of the cortex. Also, the number n of vertices need not be exactly fixed. It is often convenient to take n to have a Poisson distribution: the vertex types (in the geometric setting, simply their locations in R) are then distributed according to a Poisson process. This is the most natural mathematical model for objects distributed with roughly uniform density but not arranged in an ordered lattice.
It is very natural to consider graphs whose vertices are points in the plane, or in a higher (or sometimes lower!) dimensional space, in which the probability that two vertices are joined is a function of their distance. The very first such model, dating back almost to the beginning of the related subjects of random graph theory and percolation theory, is the disc percolation model of Gilbert [55]. Here the vertex set is a Poisson process in the plane, and two vertices are joined if they are within a certain distance 0 . A more recent example is the spread-out percolation model of Penrose [93], which is essentially the same as the special case of the (later) BJR model considered above: simplifying his model somewhat, the connection probability for points at distance d scales as r −2 ϕ(d/r), where ϕ is some function chosen in advance, and r is a parameter of the model. In spread-out percolation one studies the limiting behaviour as r → ∞ with ϕ fixed. In general, random graphs with geometric structure are known as random geometric graphs; see the book by Penrose [94].
An advantage of the more general formulation of the model G p (n, κ) is that one can incorporate additional features within the same mathematical framework. For example, it may make sense to model the neocortex by a random graph in the plane, reflecting the predominantly 2-dimensional structure, while representing neurons in different layers of the neocortex by vertices of different classes. To reflect the different patterns of connections within and between the different layers, the edge probabilities can be taken to depend on the vertex classes. For example, given constants p 1 , p 2 , . . . , p k with ∑ p i = 1 representing the relative numbers of neurons in different layers, one can take (roughly) p i n vertices of class i distributed uniformly at random in R, and then join a vertex of class i at point x with one of class j at point y with a probability p i j (x, y) depending on i, j, x and y. Although this appears to be a generalization of the BJR model, it is in fact included in the original definitions, by taking the type space S to consist of k copies of R, one corresponding to each class of vertex.
Furthermore, one can easily include spatial variations in the density of neurons, as well as in the connection probabilities, by taking the measure µ to be non-uniform. When there are several vertex classes, their densities need not be the same.
There is always a danger when introducing a random graph model, or any mathematical model, that the model becomes too general; while flexibility is desirable, it is important that the model remains simple and specific enough for mathematical analysis. Although the BJR model is very general, as shown in [18], it is still simple enough for analysis. There the main focus is on the percolation phase transition, but other properties are also studied. For example, it is shown that the degree distribution is mixed Poisson: roughly speaking, the degree of a vertex of type x is Poisson with mean λ (x) = κ(x, y) dµ(y).
Translating this to the present setting, suppose we have k vertex classes (for example, perhaps we should take k = 6, for the six layers of the neocortex). Suppose that the distribution of vertices of class i is given by a density function f i (x), x ∈ R, and that there are n i vertices of class i. Suppose also that the probability that a vertex of class i at point x is joined to one of type j at point y is given by p i j (x, y) as above. Then the expected out-degree of a vertex of type i at a point x is and the expected in-degree by As in [18], the degrees are concentrated around their means, so the in-and out-degree distributions of vertices of class i are given by sampling the functions above with respect to the distribution f i (x). In the simple special case where the connection probabilities depend only on which classes the vertices are in and the distance between them, then, except near the boundary, the degrees do not depend on x, and we find that while vertices of different classes may have very different degrees, and within a class out-and in-degrees may be different, all vertices in a class have roughly the same out-degree as each other, and roughly the same in-degree.

The exponentially expanding graph model
In the study of graphs with power-law degree distributions, there are two very different basic approaches. One is to build the power law into the model; this approach was taken by Aiello, Chung and Lu [2,3], for example. The other is to seek to explain the power law by incorporating into the model some simple features also present in the real-world networks that one hopes lead to power laws. For degree distributions, this was the approach of Barabási and Albert [9], who showed that growth with 'preferential attachment' leads to scale-free degree distributions. Their mathematically imprecise model was made precise in [20], and its degree distribution first analyzed rigorously by Bollobás, Riordan, Spencer and Tusnády in [22]. Many other models soon followed, for example the much more general (but harder to analyze) model of Cooper and Frieze [31], and the directed scale-free model of Bollobás, Borgs, Chayes and Riordan [16]. For general background on the rapidly expanding field of scale-free networks see, for example, [4,19,36,87].
In the previous subsection we took the analogue of the first approach, building powerlaw distribution of edge-lengths into the model. In this section we propose a new model, the exponentially expanding graph model, which incorporates one of the basic features of cortical development, described in Subsection 1.3.2; as we shall see, this automatically leads to a power-law distribution of edge lengths.
As described in Subsection 1.3.2, as the neocortex develops, new neurons force their way in between existing ones. To a first approximation, existing neurons retain their connections and relative positions, except that the distances between them increase with time, and new neurons connect to nearby neurons; see Fig. 1.5. In our mathematical model, it is convenient to rescale time so that the distances grow exponentially with time: we do not claim that this is the actual growth rate in the brain. At any given time, the density of neurons is roughly constant, but they do not form a regular lattice. The appropriate mathematical model for their locations is thus a Poisson process. Although a realistic model would start with a fairly small number of neurons (around 10 3 ), mathematically it happens to make no difference, and is somewhat cleaner, if we start from nothing.
To define the model formally, let R 0 be a region in the plane, with area A 0 . (One can also take R 0 to be the whole plane, obtaining an infinite random graph.) For all real numbers t, let R t = e t R 0 = {(e t x, e t y) : (x, y) ∈ R 0 } be the region obtained by scaling R 0 by a factor e t in each direction, and let C = {(t, x, y) : (x, y) ∈ R t , −∞ < t < ∞} be the corresponding 'exponential cone. ' Let P be a Poisson process on C with intensity 2. In other words, P is a random set of points of C such that the probability that a small volume V of C contains a point is 2V , and the numbers of points in disjoint regions are independent. We take a point (s, x, y) ∈ P to represent a vertex v that arrives at time s at position (x, y). At a later time t, the vertex is then at position (e t−s x, e t−s y). There are many possibilities for the edge-formation rule. The simplest is to join an arriving vertex v to all older vertices w that are within a fixed distance 0 of v at the time that v arrives. Let G t be the graph formed in this way by time t. Thus the vertex set of G t is the set of all points v = (s, x, y) of P with s ≤ t. There is a directed edge from v = (s, x, y) to w = (s , x , y ) if and only if s < s and the Euclidean distance between (x, y) and (e s −s x , e s −s y ) is at most 0 .
Let C t = {(s, x, y) ∈ C : s ≤ t} be the part of C corresponding to times up to t. Then the volume of C t is vol Let n t denote the number of vertices of G t . Then the expectation En t of n t is exactly 2vol(C t ) = A 0 e 2t , and n t has a Poisson distribution with this mean. By considering the part of C t corresponding to a small region of R t , we see that the positions at time t of the vertices of G t form a Poisson process on R t with intensity 1. This is the reason for considering exponential growth: the Poisson process P corresponds to new vertices arriving at a constant rate (2 per unit time per unit volume); exponential expansion is then exactly what is needed to maintain a constant density of vertices.
Having defined (the simplest form of) the model, let us turn to its basic properties. Although the graph G t grows as t varies, the motivation is to model the adult cortex, so we are primarily interested in the properties of the graph G t with t fixed, where in the application the parameters will be chosen so that G t has around 10 10 vertices. As noted above, the actual number n t of vertices has a Poisson distribution, but this is sharply concentrated around its mean A 0 e 2t .
The first property we consider is the degree distribution. When a vertex v arrives at time s, it sends edges to all vertices in a disk of radius 0 around it. Ignoring boundary effects (which are small once the linear size of R s is much larger than 0 ), i.e., assuming the whole disk lies inside R s , the number of vertices in this disk is Poisson with mean π 2 0 . Hence the limiting out-degree distribution in G t is Poisson with mean π 2 0 . If 0 is fairly large (as in a model of the brain, where the average degree is in the thousands), then the out-degrees are concentrated around their average value of π 2 0 . The average in-degree is of course equal to the average out-degree. Turning to the distribution, let v = (s, x, y) ∈ P be a vertex born at time s, and set Then a vertex w sends an edge to v if and only if w = (s , x , y ) ∈ I v . Thus the expected in-degree of v at time t is µ v = vol(I v ∩ C t ), and the distribution is Poisson with this mean. Ignoring boundary effects, I v ∩ C t is made up of a disk of radius 0 at each time s , s < s < t, so µ v = (t − s)π 2 0 . Once again, to a good approximation, the degrees are close to their expectations, but now there is some variation due to the ages of the vertices: the expected in-degree of a vertex v at time t is proportional to its age a = t − s. Since the number n s of vertices born by time s has mean A 0 e 2s , the fraction of vertices with age at least a > 0 is e −2a , so the in-degree distribution follows (approximately) an exponential distribution: the number of vertices with in-degree at least d is proportional to exp(−2d/(π 2 0 )). Let us turn to the edge-length distribution. Since the length of an edge increases in proportion to e t once it is formed, very roughly speaking, edges of length at least r ≥ 0 arise from vertices with age at least a = log(r/ 0 ), so the fraction f r of edges with length at least r is roughly the fraction of vertices with age at least a, i.e., f r ∼ e −2a = 2 0 /r 2 . A little more precisely, at any time s, for 0 < r < 0 the rate of formation of edges with length (at the time they are born) between r and r + dr is 2A 0 e 2s 2πr dr : the first factor represents the rate of formation of new vertices, and the second factor the area at the right distance from a new vertex, and hence the expected number of old vertices at the right distance.
For r > 0 , an edge at time t with length between r and r + dr must have been born at some time s = t − a with a > log(r/ 0 ), and its length when it was born must have been between re −a and (r + dr)e −a . It follows that the expected number of edges at time t with length between r and r + dr is Recalling that the expected number of edges at time t is π 2 0 En t = π 2 0 A 0 e 2t , the fraction f r of edges with length at least r is A 0 e 2t π 4 0 /(r ) 3 dr = 2 0 /(2r 2 ).
In other words, the edge-length distribution follows a power law; this power law is valid over all scales from r = 0 up to r comparable with the linear size of R t . The exponent 2 here fits well with the experimental exponent 2.2 seen in Fig. 1.4B. The exponent 2 above is equal to the dimension of the space in which the graph lives: this is no coincidence. Regardless of the real rate of growth, we may reparameterize time so that linear distances grow proportional to e t . Assuming the spatial density of vertices remains roughly constant, this means that n t grows with e dt , where d is the dimension. The derivation above is easily seen to lead to a power law with exponent d in this case.
We next turn to the typical graph distance d(v, w) between vertices of G t , often known as the diameter. (Note, however, that in graph theory the diameter of a graph G normally refers to the maximum value of d(v, w), v, w ∈ V (G), or, if the graph is not connected, the maximum over pairs (v, w) with v and w in the same component of G.) For the moment we view G t as an undirected graph; this is not a reasonable simplification in this context, but the argument we give for this case will apply to a related model treated as a directed graph; we return to this later.
In any reasonable model of the cortex, although the number of vertices is very large (around 10 10 ), the typical degrees are also fairly large (around 10 4 ), so it is potentially the case that most pairs of vertices are joined by a path of length 3. On the other hand, the scalefree distribution of axon lengths (see Fig. 1.4B) shows that almost all connections have lengths much shorter than the scale of the whole cortex; if all connections were short, then clearly the diameter would be very large. As is well known in graph theory (see, for example, Bollobás and Chung [17]), a few long-range connections are often enough to bring the diameter down. As we shall see, that is the case for the model G t , even if the average degree is not that large.
The key observation is that among the roughly π 2 0 earlier vertices that a new vertex v sends edges to, there is a distribution of ages. At any given time, the proportion of vertices with age at least a is e −2a , so typically v will send an edge to some vertex w born around ∆ = log(π 2 0 )/2 time units earlier than v. (Edges to vertices this old may or may not appear, but edges to somewhat less old vertices are very likely.) Starting at a vertex v, there will typically be a path v = v 0 v 1 v 2 . . . v k where each v i is older than the previous v i by a time of around ∆ . The sequence stops at around the time t 0 where the graph first starts developing, or, more precisely, when area(R t 0 ) ∼ π 2 0 : at this point, all vertices are joined to each other. Starting at two vertices and following two such paths, we find that typically vertices are at graph distance at most 2(t − t 0 )/∆ . Now 2(t − t 0 ) ∼ log(n t /n t 0 ) = log(n t /(π 2 0 )), so we find that the diameter is approximately diam(G t ) ∼ log(n t /(π 2 0 )) log(π 2 0 )/2 = 2 log n log(π 2 0 ) − 2.
This shows that, in the regime where the average degree π 2 0 is constant and the number n = n t of vertices tends to infinity, the diameter is logarithmic in n, as is to be expected. Note that the diameter is roughly twice the value log n/ logd that one would expect in a homogeneous random graph.
The calculation presented above is a simple heuristic, but it is possible to make it precise, using branching process methods to show that the desired paths really do exist for a large fraction of the vertices. In the other direction, one can estimate the expected numbers of paths of various lengths to show that the diameter is not significantly smaller than this estimate; this should not be taken for granted: in the scale-free LCD graph of [20] (a precise version of the Barabási-Albert model [9]), although the average degree is constant the diameter is (slightly) sublogarithmic in n.
The exponentially expanding graph model G t defined above incorporates one new feature that we believe is essential in any accurate model of the neocortex, namely a development process involving expansion in space that guides the formation of connections. So far, we concentrated only on this one feature, but to obtain a more realistic model, some modifications are required.
Firstly, assuming that a new vertex joins to all older vertices within distance 0 leads to unrealistically high clustering: a vertex born just before time t will be joined to most of the other vertices within distance 0 ; many of these will also be recent, and joined to each other. The simplest modification to correct this is to add an additional parameter p, and consider the model G t ( 0 , p) defined as above, except that a new vertex sends an edge to each older vertex w within distance 0 with probability p, independently of the other edges. The analysis above carries over essentially unchanged to this extension, as long as the average degree π 2 0 p is reasonably large. In this variant a typical (and therefore recent) vertex is joined to a fraction O(p) of the nearby vertices, so values of p around 10 −2 would be realistic.
There is a rather more natural generalization of G t ( 0 , p), where we replace the sharp cutoff at distance 0 by a smooth one. Most generally, we simply take a function ϕ : R + → [0, 1], and when a new vertex v arrives, the probability that it sends an edge to w is simply ϕ(d(v, w)), where d(v, w) is the distance from v to w at the time that v arrives. Note that the role of the function ϕ here is rather different from that of κ(x, y) in (1.3); taking ϕ dropping off sharply beyond some characteristic length, for example ϕ(r) = ϕ 0 e −r 2 / 2 0 , the analysis of G t above carries over, and we still see power-law distribution of the edge lengths in G t . Unlike in the previous subsection, where this was built into the model through the choice of κ, here it comes from the expansion.
The alert reader may have noticed that there is a much more serious problem with G t as a directed graph model of the neocortex: if vertices only send edges to older vertices, then there are no directed cycles, so no feedback is possible. One possible modification is to assume that when a new vertex v arrives, not only does v send out edges to nearby vertices w, but some nearby w send new edges to v.
An alternative and much more satisfactory modification of the model is as follows: suppose that we have a function ψ(t) defined on the positive reals, representing the rate at which a new vertex will send out edges at time t after it is born; the most natural example is perhaps ψ(t) = ae −bt for t > 0, with a and b positive constants. Given a function ϕ(r) as above, we may define a new model G t (ϕ, ψ) as follows: the vertex set evolves exactly as in the original model G t . Given the vertex set, in any time interval [s, s + ds] of infinitesimal length ds, each vertex v has probability ϕ(d(v, w))ψ(a(v)) ds of sending an edge to each vertex w, where d(v, w) is the distance from v to w at time s, and a(v) is the age of v at time s. Of course, more generally one can consider a single 2-variable function ϕ(d(v, w), a(v)), or even a function also depending on the age of w. In all these variants a significant fraction of edges will go from older vertices to (slightly) newer vertices, so there will be many directed cycles in the graph.
Once again, the simple heuristic analysis above carries over in this greater generality, and shows that the edge length distribution remains a power law with exponent 2, and the diameter remains logarithmic in n even if the average degree is constant as n → ∞. Of course, turning this into rigorous mathematics in this generality is likely to require considerable effort.
Of course, the exponential expansion in G t can be combined with features present in other models, such as the original preferential attachment mechanism of Barabási and Albert [9]. One natural formulation is to take the vertex set of G t , but to join a new vertex v to a vertex w with probability proportional to ϕ(d(v, w))ψ(deg(w)) for some functions ϕ and ψ, where d(v, w) is defined as above and deg(w) is the degree (or perhaps the in-degree) of vertex w at the time v is born. This model is much harder to analyze than the LCD model mentioned above, but we believe that if the 'preference function' ψ is linear, then it will also have a power-law degree distribution, with the exponent depending on the exact details of the model. A closely related (but not exponentially expanding) geometric random graph model with preferential attachment has been introduced and analyzed by Flaxman, Frieze and Vera [42]. Of course it is not clear that power-law degree distribution is appropriate in modelling the cortex.
For future work, we believe that there are two important directions. One is the rigorous mathematical analysis of models of the type we have described here, including the particular model G t (ϕ, ψ). Another, even more important, topic is the development of further models that more accurately represent the structure of the brain, but are still simple enough to analyze.