Shemetov, Dmitry; Shemetov, Dmitry

Concerning Some Statistical Problems on Graphs

2021

Shemetov, Dmitry
Advisor(s): Sharpnack, James

Abstract

In this dissertation, we consider three statistical problems unified by an underlying graph structure.

The first problem concerns graph scan statistics.Scan statistics can be used for early detection of disease outbreaks by testing large regions for faint but detectable abnormal measurements. Traditional scan statistic implementations, such as the SaTScan software, fuse neighboring measurements to obtain more powerful global null tests. In this project, we provide an algorithm for detecting and localizing elevated levels in graph data. We present methods and software that can utilize both spatial proximity and network information. We apply this methodology to an outbreak of porcine reproductive and respiratory syndrome (PRRS) in pig farms and find hotspots of the disease. Our method can provably control the family-wise error rate and control the false discovery rate while detecting and localizing outbreaks.

The second problem concerns distributed communication with memory constraints.Memory constraints are common in modern statistical systems like those occurring in wireless sensor networks. We consider a model system of sensors communicating in a tree structure under bit-rate constraints to solve a hypothesis testing problem. We obtain bounds on the hypothesis testing error rate for a central class of decision rules as functions of the bitrate, tree width, and depth. These bounds give fundamental tradeoffs on memory and communication in wireless sensor networks.

The final problem concerns random graph model fitting.Exponential random graph models (ERGMs) are a family of random graphs often used in social science due to their elegance of definition and flexibility of expression. Fitting the parameters of such models to data is quite challenging however, due to the complexity of the partition function and the need to count subgraphs. Traditional Markov chain Monte Carlo methods used for fitting suffer from poor initialization errors and the need to produce many graph samples. We study a pseudolikelihood based on the large deviations theory for ERGMs developed by Chatterjee and Diaconis and investigate its performance under noisy subgraph count estimates. The work in this project shows preliminary results that our proposed method may be viable in low edge density graph regimes.

Main Content

For improved accessibility of PDF content, download the file to your device.

UC Davis

Concerning Some Statistical Problems on Graphs