State space discovery in spatial representation circuits with persistent cohomology

Persistent cohomology is a powerful technique for discovering topological structure in data, but strategies for its use in neuroscience are still under development. We explore the application of persistent cohomology to the brain's spatial representation system. We simulate populations of grid cells, head direction cells, and conjunctive cells, each of which spans a low-dimensional topological structure embedded in high-dimensional neural activity space. We evaluate the ability of persistent cohomology to discover these structures and demonstrate its robustness to various forms of noise. We identify regimes under which product topologies formed by mixtures of populations can be detected. Our results suggest guidelines for applying persistent cohomology, as well as persistent homology, to experimental neural recordings.


Introduction
The enormous number of neurons that compose brain circuits must coordinate their firing to operate effectively. This organization often constrains neural activity to low-dimensional manifolds, which are embedded in the high-dimensional phase space of all possible activity patterns [1,2,3,4]. In certain cases, these low-dimensional manifolds exhibit nontrivial topological structure [5]. This structure may be imposed externally by inputs that are periodic in nature, such as the orientation of a visual stimulus or the direction of an animal's head. It may also be generated internally by the network itself; for example, the grid cell network constructs periodic representations of physical space which outperform non-periodic representations in several ways [6,7,8,9,10,11,12]. In either case, detecting and interpreting topological structure in neural data would provide insight into how the brain encodes information and performs computations.
One promising method for discovering topological features in data is persistent cohomology [13,14,15].
By tracking how the shape of the data changes as we examine it across different scales-thickening data points by growing balls around them-persistent cohomology detects prominent topological features in the data, such as loops and voids. This knowledge helps to identify the low-dimensional manifolds sampled by the data.

Figure 1: Pipeline for simulations and data analysis. (A) We generate activities for multiple neural populations along an experimentally recorded rat trajectory. For each population, we plot activity maps as a function of position (left) and direction (right) for one example neuron. (B) Then we choose neurons for topological analysis and form a high-dimensional vector of their firing rates at each timepoint along the trajectory. For computational tractability, we eliminate the most redundant points using a geometric subsampling algorithm. (C) We compute persistent cohomology on these subsampled timepoints to identify low-dimensional topological structure.
We characterize how persistent cohomology can discover topological structure in neural data through simulations of the brain's spatial representation system. This system contains several neural populations whose activity exhibits nontrivial topology, which we term periodic neural populations (Fig. 1A). Grid cells fire when an animal reaches certain locations in its environment that form a triangular lattice in space [16].
In each animal, grid cells are partitioned into 4-10 modules [17]. Within each module, grid cells share the same scale and orientation but their lattices have different spatial offsets. Modules appear to increase in scale by a constant ratio and exhibit small differences in orientation [17,18]. Head direction cells fire when an animal's head is oriented in a certain direction relative to its environment [19]. They respond independently of the animal's position. Finally, conjunctive grid × head direction cells respond when an animal is located at the vertices of a triangular lattice and is oriented in a certain direction [20]. Like grid cells, conjunctive cells are also believed to be partitioned into modules.
We also consider neural populations whose activity exhibits trivial topology, which we will term nonperiodic neural populations (Fig. 1A). Place and non-grid spatial cells are part of the spatial representation system, and they fire in one or multiple regions of the environment [21,22,23]. These two populations are found in different brain regions, and the former tend to have sharper spatial selectivity compared to the latter. Finally, we simulate neurons with irregular activity that exhibits no spatial tuning. We imagine these random cells may be responding to non-spatial stimuli or representing internal brain states.
Our use of simulated data enables us to comprehensively explore the capabilities of our methodology.
With complete control over the data, we can identify features that improve topological discovery and features that disrupt it. We can also freely generate datasets with varied quantities and proportions of different neural populations. A greater number of neurons embeds underlying activity manifolds in higher dimensions, which strengthens the signal. However, experimental limitations impose bounds on this number. Our simulations allow us to evaluate persistent cohomology in regimes currently accessible by experiments, as well as in regimes that may soon become experimentally tractable due to advances in recording technology [24]. Moreover, by combining neurons from different populations, we demonstrate the ability of persistent cohomology to succeed without the pre-processing step of neural classification.

Overview of methods and persistence diagrams
In this work, we simulate neural populations within the spatial representation system, prepare the simulated data for topological analysis, and compute persistent cohomology to discover topological structure within the data (Fig. 1). We will now briefly describe each of these three stages; a complete explanation is provided in the Methods section.
To generate neural recordings, we define tuning curves as a function of position and direction. For each grid module, we first create a triangular lattice in space. Each grid cell has peaks in its positional tuning curves at a randomly chosen offset from each lattice point. Its directional tuning curve is uniform. Head direction cells have peaks in their directional tuning curves at a randomly chosen angle and have uniform positional tuning curves. Conjunctive cells have positional tuning curves like grid cells and directional tuning curves like head direction cells. We describe tuning curves for the non-periodic neural populations in the Methods section.
These tuning curves are applied to an experimentally extracted trajectory of a rat exploring its circular enclosure, producing an activity, or firing rate, for each neuron at 0.2 s intervals. This simulates the simultaneous recording of a large number of neurons from the medial entorhinal cortex and the binning of their spikes into firing rates. The time series span 1000 s, or 5000 data points. Figure 1A shows examples of these time series data mapped back onto spatial coordinates.
Next, we choose a subset of these neurons and pre-process it for topological data analysis (Fig. 1B).
We form a vector of neural activities at each timepoint, which produces a point cloud in high-dimensional phase space. We wish to subsample it for computational tractability while preserving as much of the topological structure embedded within it as possible. To do so, we use a geometric subsampling algorithm that roughly eliminates the most redundant points (see the Methods section for a complete description), reducing the 5000 timepoints down to 1000.
Finally, we apply persistent cohomology to this subsampled point cloud (Fig. 1C). We describe this technique colloquially here, in terms of its dual, persistent homology. Both produce the same persistence diagrams, but we use cohomology throughout the paper both because it is faster to compute and because it allows us to parametrize the data, as described in the next subsection. See the Methods section for a precise description.
From the point cloud, we form a Vietoris-Rips filtration, which is a nested sequence of simplicial complexes. Each complex consists of all cliques in the near-neighbor graph, which contains an edge between every pair of points at distance at most r apart. As the threshold r increases, more edges enter the graph, and more cliques enter the Vietoris-Rips complex. Throughout this process, cycles (e.g., 1-dimensional loops) appear and get filled in the complex (Fig. 2A). There is a unique way to pair the distance thresholds at which cycles are born and die.
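The clique construction described above can be sketched in a few lines of Python. This is a naive illustration of the definition (the function name and the example point set are ours), not the optimized routines used for real computations:

```python
from itertools import combinations

def vietoris_rips(points, r, max_dim=2):
    """Build the Vietoris-Rips complex VR(points, r) up to max_dim.

    A k-simplex is any (k+1)-subset of points whose pairwise
    distances are all at most r, i.e., a clique in the r-neighbor graph.
    """
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    n = len(points)
    # r-neighbor graph: edges between points at distance <= r
    close = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= r}
    complex_ = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        complex_[k] = [s for s in combinations(range(n), k + 1)
                       if all(p in close for p in combinations(s, 2))]
    return complex_

# Four corners of a unit square at r = 1: the four sides enter as edges,
# but the diagonals (length ~1.41) do not, leaving an unfilled 1-cycle.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
vr = vietoris_rips(square, r=1.0)
```

As r grows past the diagonal length, the triangles appear and the loop is filled in, which is exactly the death event recorded in the persistence diagram.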
All such birth and death distances are collected into a persistence diagram (Fig. 2B). The points farthest from the diagonal correspond to the most persistent cycles that appear for the longest range of distance thresholds. They recover topological structure in the space sampled by the point cloud, which corresponds to the processes underlying the data-in our case, the spatial representation networks and external inputs.
Persistent (co)homology is stable: the persistent points will remain in the diagram if we make small changes to the data, such as selecting slightly different timepoints or perturbing their values by a small amount of noise. The points closest to the diagonal would appear even if the processes underlying the data lack topological structure, and they are usually interpreted as noise.
The process we described above keeps track of cycles of different dimensions (Fig. 2C). Besides the loops (1-dimensional cycles), it tracks connected components (0-dimensional cycles), voids (2-dimensional cycles), and higher-dimensional topological features, which lack a colloquial name. The number of independent k-cycles is called the k-th Betti number and is a topological invariant of a space. We can infer the topology of a dataset by comparing the number of persistent k-cycles to the k-th Betti numbers of conjectured ideal spaces, such as a circle or a torus. Note that for every dataset, the 0-(co)cycle corresponding to the entire point cloud never dies, so we consider its death distance to be infinity.
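The comparison against conjectured ideal spaces amounts to a lookup of known Betti numbers. A minimal sketch (the candidate list and function name are ours, for illustration):

```python
# Betti numbers (beta_0, beta_1, beta_2, beta_3) of candidate spaces.
CANDIDATES = {
    "circle":  (1, 1, 0, 0),
    "2-torus": (1, 2, 1, 0),
    "3-torus": (1, 3, 3, 1),
}

def infer_space(persistent_counts):
    """Return candidate spaces whose Betti numbers match the number of
    persistent k-(co)cycles found in the data, for the k's provided."""
    counts = tuple(persistent_counts)
    return [name for name, betti in CANDIDATES.items()
            if betti[:len(counts)] == counts]

# One persistent 0-cocycle, two 1-cocycles, one 2-cocycle -> a torus.
match = infer_space([1, 2, 1])
```

This matching is only as strong as the candidate list: different spaces can share Betti numbers, a limitation the Discussion returns to.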

Persistent cohomology for periodic neural populations
Each periodic neural population spans a particular topological space. We recover these relationships when we compute persistent cohomology of our simulated data (Fig. 2D-F). Each grid cell is active at one location in a rhombic unit cell that is tiled over 2D space (Fig. 3A). Grid cells within a single module share the same unit cell but differ in their active location, so each grid module spans a torus [5]. Similarly, head direction cells span a circle and each conjunctive cell module spans a 3-torus. The correspondence between our results and predicted topological spaces validates the basic capabilities of our methods.

Figure 2: Persistence diagrams and identified topological spaces (bottom). We compare the number of persistent k-(co)cycles to the k-th Betti numbers β_k of different topological spaces to infer the underlying topological structure of the dataset. (D) Grid cells from module 1 exhibit one persistent 0-cocycle, two persistent 1-cocycles, and one persistent 2-cocycle, which corresponds to a torus. (E) Head direction cells exhibit one persistent 0-cocycle, one persistent 1-cocycle, and no persistent 2-cocycles, which corresponds to a circle. (F) Conjunctive cells exhibit one persistent 0-cocycle, three persistent 1-cocycles, and three persistent 2-cocycles, which corresponds to a 3-torus.
Persistent cohomology can not only discover topological structure in neural data, but it can also decode information embedded within this structure. Using the grid cell population presented in Fig. 2D as an exemplar, we can assign circular coordinates [15,25] from the two persistent 1-cocycles (Fig. 3B). This periodic topological space should represent the rhombic unit cell of the grid module, which tiles physical space. To explore this relationship, we project the entire time series of neural activities onto these coordinates.
For each neuron, we find the data points for which that neuron is the most active within the population.
These points are clustered in topological space (Fig. 3C), which means that persistent cohomology can indeed recover the firing fields of grid cells in topological space. We then analyze the population dynamics. The animal's trajectory in physical space corresponds to a trajectory through the point cloud in activity space.
We find that the projection of this trajectory on the circular coordinates matches well with the real animal trajectory (Fig. 3D). Thus, persistent cohomology can decode the behavior of an animal in a recording of its neural activity.
The ability of persistent cohomology to discover topological structure depends on the number of neurons in the dataset, or equivalently, the dimension of the time series embedding. Again using the grid cell population as an exemplar, we form multiple datasets with randomly selected neurons to measure the success rate of persistent cohomology as a function of neuron count (Fig. 4A). To measure success, we only use the first cohomology group H^1, which contains 1-cocycles. We define successful discovery of the grid cell torus as a persistence diagram with two persistent 1-cocycles, and we make the notion of persistence precise using the largest-gap heuristic. We calculate the lifetime of each cocycle, which is the difference between its death and birth and corresponds to the vertical distance of its point from the diagonal of the persistence diagram (Fig. 4A,B). We find the largest gap in the lifetimes and consider points above this gap to be persistent (Fig. 4B). Figure 4C shows that reliable discovery of the torus using this heuristic can be achieved with ≈20 grid cells.
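The largest-gap heuristic can be written down directly from the rule as stated: sort the lifetimes, find the biggest jump between consecutive values, and keep every point above it. A sketch (function and variable names are ours):

```python
def persistent_by_largest_gap(diagram):
    """Count persistent points in a list of (birth, death) pairs using
    the largest-gap heuristic: sort lifetimes, locate the largest gap
    between consecutive sorted lifetimes, keep points above the gap."""
    lifetimes = sorted(d - b for b, d in diagram)
    if len(lifetimes) < 2:
        return len(lifetimes)
    gaps = [lifetimes[i + 1] - lifetimes[i]
            for i in range(len(lifetimes) - 1)]
    cut = gaps.index(max(gaps))          # index of the largest gap
    threshold = lifetimes[cut + 1]       # smallest "persistent" lifetime
    return sum(1 for b, d in diagram if d - b >= threshold)

# Two long-lived 1-cocycles above a cloud of short-lived noise points,
# as expected for a grid-module torus:
noisy = [(0.1, 0.15), (0.2, 0.22), (0.05, 0.12), (0.1, 1.0), (0.12, 0.95)]
```

Here the sorted lifetimes are (0.02, 0.05, 0.07, 0.83, 0.9), the largest gap separates 0.07 from 0.83, and two points are declared persistent.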
Persistent cohomology can succeed for mixed signals. Separation of raw electrode recordings into single-neuron spike trains may not always be possible or desired. To address this scenario, we form multi-neuron units by linearly combining time series of neural activity across different grid cells. The mixing coefficients are drawn from a uniform random distribution and then normalized. Example activity maps of these multi-neuron units as a function of position are shown in Fig. 4D. The combination of many neurons destroys the classic responses exhibited by individual grid cells. Yet, multi-neuron units retain topological information associated with the grid module that can be recovered by persistent cohomology (Fig. 4E). The success rate for discovering toroidal topology is remarkably independent of the number of grid cells in each unit.
Persistent cohomology can also succeed in the presence of spiking noise. To simulate such noise, we use our generated activity as a raw firing rate that drives a Poisson-like random process (see Methods).
We construct this process to have different Fano factors, which is the variance in the random process for a given firing rate divided by the firing rate. When the Fano factor is 1, the random process is Poisson. Figure 4F shows activity time series for two grid cells that have very similar tuning curves and thus very similar raw firing rates, which can be seen in the noise-free condition (top left). Higher Fano factors lead to more variability both across time for each neuron and across neurons. Persistent cohomology can still recover the toroidal topology of the grid module, though more neurons are required for higher Fano factors (Fig. 4G). In the mammalian cortex, Fano factors lie around ∼1.0-1.5 [26]. Applying this regime to our simulations implies that ≈80 grid cells are required for reliable topological discovery.

Persistent cohomology for mixtures of neural populations
Persistent cohomology can discover topological structure in mixtures of neural populations. When neurons are recorded from a periodic neural population and a non-periodic neural population, the latter adds additional dimensions to the point cloud embedding, but the topological structure contained within the former should persist. We test if persistent cohomology can recover this information in mixed datasets with neurons from both a periodic population (either grid or conjunctive) and a non-periodic population (either non-grid spatial or random). Reliable discovery of the grid module torus is possible when the number of spatial or random cells is less than twice the number of grid cells (Fig. 5A,B). Detection of the 3-torus formed by conjunctive cells requires more neurons, but it can also be reliably achieved in the presence of non-periodic populations (Fig. 5C,D). Thus, persistent cohomology demonstrates robustness to the inclusion of non-periodic populations. The size of the non-periodic population that can be tolerated increases with the size of the periodic population.
When neurons are recorded from multiple periodic neural populations, their structures are preserved along separate subspaces in high-dimensional activity space. We explore this scenario by forming mixed datasets with neurons from two periodic populations. When the two populations respond to unrelated signals-such as grid and head direction cells-the combined topological space should be the Cartesian product of those of the separate populations. Indeed, persistent cohomology can discover the resultant 3-torus at intermediate mixing ratios (Fig. 6A,B). If one population contributes many more neurons-and thus embedding dimensions-than the other, we instead detect the corresponding single-population structure (Fig. 6B).

Figure 5: Persistent cohomology in combinations of periodic and non-periodic neural populations. Success is defined by discovering the number of persistent 1-cocycles expected from the periodic population, which is two for grid cells and three for conjunctive cells. (A) Grid cells from module 1 and non-grid spatial cells. (B) Grid cells from module 1 and random cells. (C) Conjunctive cells from module 1 and non-grid spatial cells. (D) Conjunctive cells from module 1 and random cells.
When the two populations respond to related signals-such as grid and conjunctive cells-the activity space of one is contained in the activity space of the other. Grid cells and conjunctive cells from the same module encode position with the same toroidal structure; they both tile space with the same rhombic unit cell of neural activity. In addition, the conjunctive population encodes direction with a circular topology.
Thus, the mixed dataset should span a 3-torus, which can be detected by persistent cohomology (Fig. 6C).
For reliable discovery, at least ≈120 conjunctive cells and at least ≈240 total neurons are required. However, discovery of the product topology is disrupted if the number of grid cells exceeds the number of conjunctive cells by more than a factor of ≈1.5. Thus, persistent cohomology can best detect product topologies when the mixed dataset is not dominated by one population.
Finally, we consider the case of mixing grid cells from multiple modules. Grid modules have different rhombic unit cells with different scales and orientations, so they map the same physical space onto different topological coordinates. Thus, a mixed dataset from two different modules should exhibit the product topology of two 2-tori, which is the 4-torus. However, we are unable to reliably discover this structure using the grid modules illustrated in Fig. 1A; they are too sparse. To produce a point cloud that embeds the toroidal structure for one grid module, the animal trajectory should densely sample its rhombic unit cell. This is achieved since the enclosure contains many unit cells. However, to produce a point cloud that embeds the 4-torus formed by two grid modules, the animal trajectory should densely sample all combinations of unit cells. This is not achieved by the grid modules illustrated in Fig. 1A because the enclosure contains too few rhombic unit cells for them to overlap in many different configurations.
Thus, we generate two grid modules separated by the same scale ratio as in Fig. 1A, but at one-fourth the scale (Fig. 6D). In addition, we explore different relative orientations between the modules by generating different orientations for module 2. Notably, these scale ratios and orientation differences are chosen so that the two rhombic unit cells do not share a simple geometric relationship with each other [27], which would limit their possible overlap configurations. As we include more neurons from both modules in our dataset, we see that four persistent 1-cocycles eventually emerge from the points close to the diagonal that represent sampling noise (Fig. 6E). The success of persistent cohomology is independent of the orientation difference between the two modules (Fig. 6F).
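The expected cocycle counts for these product spaces follow from the Künneth formula: with field coefficients, the Betti numbers of a Cartesian product are the convolution of the factors' Betti numbers. A short worked check against the counts reported above:

```latex
\beta_k(X \times Y) = \sum_{i+j=k} \beta_i(X)\,\beta_j(Y)
% Grid module and head direction cells: T^2 \times S^1 = T^3
\beta_1(T^3) = \beta_0(T^2)\beta_1(S^1) + \beta_1(T^2)\beta_0(S^1)
             = 1 \cdot 1 + 2 \cdot 1 = 3
% Two grid modules: T^2 \times T^2 = T^4
\beta_1(T^4) = \beta_0(T^2)\beta_1(T^2) + \beta_1(T^2)\beta_0(T^2)
             = 1 \cdot 2 + 2 \cdot 1 = 4
```

These match the three persistent 1-cocycles found for grid plus head direction mixtures and the four found for two grid modules.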

Discussion
We demonstrate that persistent cohomology can discover topological structure in neural recordings with as few as tens of neurons from a periodic neural population (Fig. 4). From this structure, it can decode the trajectory of the animal using only the time series of neural activities (Fig. 3). It is robust to noise within these recordings and to the inclusion of neurons from non-periodic populations (Figs. 4 and 5).
Furthermore, it can also discover more complex topological structures formed by combinations of periodic neural populations if each population is well-represented within the dataset (Fig. 6). These conclusions have been obtained through the analysis of spatial representation circuits, but we hope that they may guide the use of persistent (co)homology in other neural systems as well.
We have characterized the capabilities of persistent cohomology using simple simulated data, but we expect our results to generalize to real neural data. The inputs to our analysis pipeline are firing rates over 0.2 s time bins, which averages over many neurophysiological processes, including major neural oscillations found in the hippocampal region [28]. Moreover, we have found that persistent cohomology is robust to variations in firing rate that may arise from spiking or other sources of biological noise. A key requirement for generalization is a separation of two timescales. The macroscopic timescale at which topological structures are explored-here, the time required to traverse a rhombic unit cell of a grid module or 360° of head direction-must be much longer than the microscopic timescale at which neuronal activity is generated.
This enables us to coarse-grain over the activity and describe it by a firing rate.
The application of persistent (co)homology to neuroscience data is still in its developing stages. Notable lines of work include: dimensionality reduction for manifold decoding in head direction cells [29,30]; simulations of hippocampal place cells in spatial environments with nontrivial topology [31,32,33,34,35]; analysis of EEG signals, for classification and detection of epileptic seizures [36,37] and for construction of functional networks in a mouse model of depression [38]; inferring intrinsic geometric structure in neural activity [39]; and detection of coordinated behavior between human agents [40]. There is potential for persistent (co)homology to provide insight to a wide range of neural systems. Topological structures generally can be found wherever periodicities exist. These periodicities can take many forms, such as the spatial periodicities in our work, temporal regularities in neural oscillations, motor patterns, and neural responses to periodic stimuli.
The toolbox of topological data analysis offers more methods beneficial to the analysis of neural data. The methods described in this paper, including geometric subsampling, are sensitive to outliers. This problem can be addressed within the same framework of persistent cohomology by using the distance-to-measure function [41].
In practice, this would translate into a slightly more elaborate construction [42] of the Vietoris-Rips complex.
Furthermore, our analysis pipeline benefits from having neural activity embedded in a high-dimensional space, i.e., from having many more neurons than the intrinsic dimension of the recovered tori. It is possible to adapt this technique to the regime of limited neural recordings (even to a single neuron) by using time-delay embeddings [43]. However, for spatial populations, such a technique would require control over the smoothness of the animal's trajectory, which may not be feasible in practice.
Our results also suggest research directions in topological data analysis. Throughout the paper we relied on 1-dimensional persistent cohomology to infer whether we recovered a particular torus. But that is a relatively weak method: many topological spaces have cohomology groups of the same dimension. Although the trajectories that we recover via circular coordinates serve as convincing evidence that we are indeed recovering the tori, it is possible to confirm this further by exploiting the cup product structure in cohomology, a topological operation that turns cohomology into a ring. Computing a "persistent cup product" would provide additional evidence about the structure of the recovered spaces.

Animal trajectory
We simulate the simultaneous recording of neurons from a rat as it explores a circular enclosure of diameter 1.8 m. We use 1000 s from a trajectory extracted from an experimental animal [16,44]. This trajectory is provided as velocities sampled at 0.5 ms intervals along with the initial position. We perform subsampling to produce a position and direction at 0.2 s intervals as follows. The position of the animal is simply the average position within each 0.2 s time bin. The direction of the animal is the circular mean of the velocity vector angle within each 0.2 s time bin; we ignore the distinction between body direction and head direction.
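The circular mean used for the direction bins cannot be an ordinary arithmetic average, because angles wrap around. The standard construction averages unit vectors and takes the angle of the result; a minimal sketch (function name is ours):

```python
import math

def circular_mean(angles):
    """Circular mean of angles in radians: average the corresponding
    unit vectors, then take the angle of the mean vector."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

# Angles straddling the 0 / 2*pi wrap-around average to ~0 (facing
# "east"), whereas a naive arithmetic mean would give ~2.1 rad.
headings = [0.1, -0.1, 2 * math.pi - 0.05]
mean_heading = circular_mean(headings)
```

This is why velocity-vector angles within each 0.2 s bin must be combined circularly rather than averaged componentwise.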

Periodic neural populations
We generate tuning curves as a function of position and/or direction for each neuron. These localized tuning curves are based on a shifted and truncated cosine function. For each grid module, we set a scale l and an orientation φ. This defines a transformation matrix A from the space of phases [−1/2, 1/2) × [−1/2, 1/2) to the rhombic unit cell of the grid module in physical space. The inverse of this matrix, A⁻¹, maps the rhombic unit cell onto the space of phases. We also define ‖·‖ as the vector norm and [a]_m ≡ ((a + m) mod 2m) − m as the shifted modulo function. The tuning curve of a grid cell is then a function of position x. Each grid cell is shifted by a uniformly random phase offset b. The full width at half maximum of each grid field is 0.45l.
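The display equations for these tuning curves did not survive extraction, so the following sketch is a hypothetical reconstruction consistent with the surrounding prose: positions are mapped through A⁻¹ to phase space, shifted by the cell's offset, wrapped with the shifted modulo into [−1/2, 1/2)², mapped back through A, and fed to a truncated cosine bump with FWHM 0.45l. The rhombic lattice basis (two vectors 60° apart) and the exact bump form are our assumptions, not the paper's equations:

```python
import math

def grid_tuning(x, y, scale, orient, offset=(0.0, 0.0)):
    """Hypothetical grid-cell tuning curve; see caveats above."""
    # Assumed rhombic lattice basis: two vectors 60 degrees apart,
    # scaled by `scale` and rotated by `orient`. These are the columns
    # of the transformation matrix A.
    a11, a21 = scale * math.cos(orient), scale * math.sin(orient)
    a12, a22 = (scale * math.cos(orient + math.pi / 3),
                scale * math.sin(orient + math.pi / 3))
    det = a11 * a22 - a12 * a21
    # Phases of position (x, y) via A^-1, minus the cell's offset b.
    px = (a22 * x - a12 * y) / det - offset[0]
    py = (-a21 * x + a11 * y) / det - offset[1]
    wrap = lambda a: (a + 0.5) % 1.0 - 0.5    # shifted modulo, m = 1/2
    px, py = wrap(px), wrap(py)
    # Physical distance to the nearest firing-field peak, via A.
    d = math.hypot(a11 * px + a12 * py, a21 * px + a22 * py)
    width = 0.45 * scale                      # FWHM of each grid field
    return 0.5 * (1 + math.cos(math.pi * d / width)) if d < width else 0.0
```

By construction the response is 1 at every lattice point, is lattice-periodic, and vanishes between fields, matching the qualitative description in the text.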
The tuning curve of a head direction cell is a function of direction θ. Each head direction cell is shifted by a uniformly random direction offset c. The full width at half maximum of the head direction field is π/2.
The tuning curve of a conjunctive cell is simply the product of the grid and head direction tuning curves. Each conjunctive cell has uniformly random offsets b and c.

Non-periodic neural populations
For non-grid spatial cells, we generate tuning curves as a function of position for each neuron, with field width σ = 40 cm and field centers d_i each chosen uniformly at random between (0 cm, 0 cm) and (180 cm, 180 cm).
For random neurons, we obtain activity time series by sampling from a distribution every 2 s, or 10 timepoints, and interpolating between samples using cubic polynomials. The distribution is Gaussian with mean 0 and width 0.5, truncated between 0 and 1.

From tuning curves to time series
To obtain activity time series for all populations except for random neurons, we apply the tuning curves to the subsampled trajectory. Whenever the velocity decreases below 5 cm/s, we set the activity to be 0. This threshold simulates the behavior of neurons in the hippocampal region that exhibit high activity during locomotion and low activity during idle periods [20,45,46].

Spiking noise for grid cells
The activities described above are dimensionless, and we typically do not need to assign a scale because we divide each time series by its mean before applying persistent cohomology. To create spiking noise, however, we must set the firing rate. We linearly rescale the rate given by Eq. 3, setting the maximum firing rate to 8 and the baseline rate to 0.4; with 0.2 s time bins, these values correspond to 40 Hz and 2 Hz, respectively. However, we still set the firing rate to 0 Hz when the animal's velocity decreases below 5 cm/s.
Using λ, we generate Poisson-like spiking noise with different levels of variability (Fig. 4). At each timestep, the noisy activity is drawn from a random process parametrized by F, the Fano factor, which is its variance divided by its mean (for any given λ).
The F = 1 case corresponds to a Poisson random process; F < 1 implies sub-Poissonian noise and F > 1 implies super-Poissonian noise.
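One simple way to realize a prescribed Fano factor, which we use here purely for illustration since the paper's exact construction is given by its own (elided) equation, is to scale a Poisson draw: X = F · Poisson(λ/F) has mean λ and variance Fλ, hence Fano factor F. One caveat of this particular construction is that X takes only multiples of F.

```python
import numpy as np

def noisy_activity(lam, fano, rng=None):
    """Poisson-like noise with Fano factor `fano` (an assumed
    construction, not the paper's): scale a Poisson draw so that
    mean = lam and variance = fano * lam. fano = 1 is pure Poisson."""
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = np.asarray(lam, dtype=float)
    return fano * rng.poisson(lam / fano)

# Super-Poissonian noise (F = 1.5) around a constant rate of 8 per bin:
rng = np.random.default_rng(1)
samples = noisy_activity(np.full(200_000, 8.0), fano=1.5, rng=rng)
```

Empirically, the sample mean stays near 8 while the variance-to-mean ratio sits near the requested 1.5.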

Multi-neuron units for grid cells
We generate multi-neuron units (Fig. 4) by linearly combining activity time series from multiple grid cells.
Each mixing coefficient is chosen from a uniform random distribution between 0 and 1. The activity is then normalized by the sum of squares of the mixing coefficients.
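This mixing step can be sketched directly from the description above (array shapes and names are ours):

```python
import numpy as np

def mix_units(activity, n_units, rng=None):
    """Form multi-neuron units by linearly combining single-neuron time
    series (rows of `activity`): uniform [0, 1) mixing coefficients,
    each unit normalized by the sum of squares of its coefficients."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n_neurons, _ = activity.shape
    coeffs = rng.random((n_units, n_neurons))           # uniform mixing
    norms = (coeffs ** 2).sum(axis=1, keepdims=True)    # sum of squares
    return (coeffs / norms) @ activity                  # units x time

# 4 multi-neuron units mixed from 10 simulated neurons, 50 timepoints:
rng = np.random.default_rng(2)
acts = rng.random((10, 50))
units = mix_units(acts, n_units=4, rng=rng)
```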

Neural activity maps
We construct activity maps for each neuron as a function of position or direction. To do so, we simply tally the total amount of activity in each positional or directional bin. Note that these maps do not depict firing rate because we do not divide by the occupancy of each bin; we decided against this in order to show the activity experienced through the animal trajectory.

Processing neural recordings
For each neuron, we first divide its activity at every timepoint by its mean activity.
To improve computational efficiency, we reduce the number of input points while preserving their topological structure by applying a geometric subsampling strategy. We pick the first point at random, and then iteratively add the point that is farthest from the points already chosen.
Specifically, if P is the input point set and Q_i is the subsample after i iterations, we form Q_{i+1} by adding the point q_{i+1} chosen as arg max_{p ∈ P} min_{q ∈ Q_i} ‖p − q‖.
Fig. 1B illustrates a result of this strategy. By construction, the subsample Q_i forms an ε_{i+1}-net of the input point set, which means the largest distance from any input point to the nearest point of the subsample does not exceed ε_{i+1} = min_{q ∈ Q_i} ‖q_{i+1} − q‖. Because persistent cohomology is stable, this guarantees that the persistence diagram we compute for the subsample Q_i is at most ε_{i+1} away, in bottleneck distance [47], from the persistence diagram of the full point set P.
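The greedy farthest-point (maxmin) procedure and its covering radii ε_i can be sketched as follows (a naive O(n·k) illustration with a deterministic first point; names are ours):

```python
import math

def maxmin_subsample(points, k):
    """Greedy farthest-point subsampling: start from one point, then
    repeatedly add the point farthest from the current subsample.
    Returns the chosen indices and the covering radii eps_i; after i
    points the subsample is an eps_(i+1)-net of the input."""
    chosen = [0]                      # a random start index also works
    # Minimum distance from each input point to the current subsample.
    d = [math.dist(p, points[0]) for p in points]
    radii = []
    for _ in range(1, k):
        nxt = max(range(len(points)), key=d.__getitem__)
        radii.append(d[nxt])          # eps for the subsample so far
        chosen.append(nxt)
        for i, p in enumerate(points):
            d[i] = min(d[i], math.dist(p, points[nxt]))
    return chosen, radii

# Eleven evenly spaced points on a line: the subsample spreads out,
# picking the endpoints and then the midpoint.
pts = [(float(i),) for i in range(11)]
idx, radii = maxmin_subsample(pts, 3)   # idx == [0, 10, 5]
```

The radii decrease monotonically, which is what makes the bottleneck-distance guarantee above useful: stopping earlier simply gives a coarser net with a known error bound.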

Applying persistent cohomology
We refer the reader to extensive literature on persistent (co)homology [13,14] for the full details, and only sketch a few of the involved constructions.
For technical reasons-both to recover the circular coordinates and for computational speed-we work with persistent cohomology, which is dual to persistent homology, with which a reader may be more familiar.
To recover the topology of the space sampled by a point set P, we construct a Vietoris-Rips simplicial complex. A simplex σ is a subset of the point set P. A simplicial complex is a collection of simplices closed under the subset relation, i.e., if K is a simplicial complex and σ ∈ K, then for every τ ⊂ σ, τ ∈ K. Given a parameter r, the Vietoris-Rips complex consists of all subsets of the point set P in which every pair of points is at most r away from each other: VR(P, r) = {σ ⊆ P | ‖p − q‖ ≤ r for all p, q ∈ σ}.
The dimension of a simplex is one less than its cardinality, dim σ = card σ − 1. 0-simplices are called vertices; 1-simplices, edges; 2-simplices, triangles; etc.

The coboundary of a k-simplex σ, denoted δσ, is the alternating sum of its supersets of dimension k + 1. A k-cochain is a collection of k-dimensional simplices; its coboundary is the sum of the coboundaries of its simplices. A k-cochain is called a k-cocycle if its coboundary is 0. A k-cocycle is a k-coboundary if it is the coboundary of some (k − 1)-cochain. Two k-cocycles are said to be cohomologous if their difference is a k-coboundary. A cohomology equivalence class consists of all cohomologous k-cocycles. The cohomology group H^k(VR(P, r)) consists of all cohomology classes. The rank of this group is called the k-th Betti number; intuitively, it measures the number of "holes" in the space. When we compute the coboundaries with coefficients in a field, the cohomology group is a vector space. (In all our experiments we use the integers mod 3 as the field.)

As we vary the radius r in the definition of the Vietoris-Rips complex, the simplicial complexes nest: VR(P, r_1) ⊆ VR(P, r_2) for r_1 ≤ r_2. The restriction of the larger complex to the smaller induces a linear map on cohomology groups, and all such maps form a sequence: H^k(VR(P, r_1)) ← H^k(VR(P, r_2)) ← H^k(VR(P, r_3)) ← …
It is possible to track when cohomology classes appear and disappear in this sequence. Specifically, it is possible to decompose this sequence into a collection of birth-death pairs (r_i, r_j) that specify that a class born in H^k(VR(P, r_i)) dies in H^k(VR(P, r_j)). This collection of pairs, called a persistence diagram, completely describes the changes in the sequence of cohomology groups.
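For dimension 0, the birth-death decomposition can be computed directly with a Kruskal-style union-find pass over edges sorted by length. The sketch below works on the homology side, whose diagrams agree with cohomology as noted earlier (names are ours):

```python
import math
from itertools import combinations

def h0_diagram(points):
    """Dimension-0 persistence of a Vietoris-Rips filtration: every
    point is born at r = 0; a connected component dies at the length
    of the edge that merges it into another component."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    diagram = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            diagram.append((0.0, r))        # one component dies at r
            parent[ri] = rj
    diagram.append((0.0, math.inf))         # the last component lives on
    return diagram

# Two tight clusters far apart: short-lived classes within clusters,
# one class dying at the inter-cluster distance, one essential class.
pts = [(0.0,), (0.1,), (5.0,), (5.1,)]
diag = h0_diagram(pts)
```

The single long-lived finite class (death ≈ 4.9) is exactly the kind of far-from-diagonal point the persistence diagram highlights; dimensions 1 and above require the full (co)boundary reduction performed by standard software.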