Combinatorial Analysis of Barcodes and Interval Graphs for Applications in Data Science
Skip to main content
eScholarship
Open Access Publications from the University of California

UC Davis

UC Davis Electronic Theses and Dissertations bannerUC Davis

Combinatorial Analysis of Barcodes and Interval Graphs for Applications in Data Science

Abstract

In this thesis we develop combinatorial methods for studying barcodes. A barcode is a collection of closed intervals on the real line. Barcodes arise as key objects in topological data analysis, as summaries of the persistent homology groups of a filtration, and in graph theory as interval graphs.We first introduce a map from the space of barcodes to certain equivalence classes of double occurrence words, i.e., permutations of a multiset in which every element occurs exactly twice. We call the set of all such words the space of combinatorial barcodes. We then define an order relation on this space, based on the weak-Bruhat order, and show that the resulting poset is a graded lattice. We show that this lattice can also be defined using new notions of inversion multisets/ inversion vectors of double occurrence words. We also use these objects to prove several properties of the lattice. For example, we compute its rank generating function and introduce a natural bijections between combinatorial barcodes, trapezoidal words, and Stirling permutations. In addition to being of interest from a combinatorial perspective, we also show that the cover relations in this lattice can be used to determine the set of barcode bases of persistence modules. We then generalize this construction, producing an entire family of multipermutations associ- ated to barcodes. We equip these new multipermutations with a similar order relation and show that the resulting posets also form lattices, which we call the power-k barcode lattices. Unfortu- nately, these lattices do not retain many of the other “nice” combinatorial properties found in our original construction. However, we show that these multipermutations record increasingly detailed information about the arrangement of the bars in a barcode. We prove that for a large class of bar- codes these multipermutations can be used to bound two classic, continuous metrics on barcodes: the Wasserstein and bottleneck distances. We also show that these lattices form the face lattices of certain Bruhat-interval polytopes. Finally, we study an original model for generating random interval graphs (or equivalently random barcodes). This model is motivated by scientific sampling problems, where one receives a sequence of time-stamped observations and wants to make conclusions about the start and end of certain events. Although our general model is difficult to analyze, we prove many results about the expected behaviour of this model in a special case which we call the stationary case. For example, we compute the expected number of edges, maximum degree, and maximum clique size. We also study the limit behavior of this model as the number of samples/ observations goes to infinity. In particular, we prove that the special case of this model converges to a complete graph and compute a lower bound on the expected waiting time for this to occur.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View