The study of grouped data goes back at least to the end of the nineteenth century. Kulldorff (1961) treats grouping as a special case of a more general procedure called partial grouping. A partially grouped sample arises when the sample space is partitioned into disjoint sets and the available information depends on the set: in some sets only the counts of observations are recorded (grouped data), while in the others the individual values of the observations are recorded (ungrouped data). This thesis focuses on spatially partially grouped data.
This work is motivated by an interest in modeling the locations and times of wildfires that occurred in the continental United States from 1986 to 1996. The data cover fires on federal and non-federal lands. The federal data consisted of each fire's point location (latitude and longitude), while the non-federal fires were aggregated by county.
Wildfire occurrences can be considered as a point process. Brillinger, Preisler and Benoit (2003) approximate a point process by a binary process. We propose integrating the two levels of aggregation, points and counts, by modeling the fires as a binary 0-1 process on space. The sample space is partitioned into small pixels arranged in a regular two-dimensional grid, and each pixel either has a fire or does not. The numbers of fires in the non-overlapping sets are assumed to be independent and to follow a Binomial distribution.
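The binary approximation can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis: the function names, the grid parameters (`x0`, `y0`, `cell`), and the single constant fire probability `p` are assumptions made for illustration only, whereas the thesis lets the rate vary over space. Point-level fires are rasterized to 0-1 pixels, each contributing a Bernoulli term to the log-likelihood, while a grouped set with only a recorded count contributes a Binomial term.

```python
import math

def rasterize_points(points, x0, y0, cell, ncols, nrows):
    """Return a 0-1 grid; a pixel is 1 if at least one fire point falls in it."""
    grid = [[0] * ncols for _ in range(nrows)]
    for x, y in points:
        col = int((x - x0) // cell)
        row = int((y - y0) // cell)
        if 0 <= col < ncols and 0 <= row < nrows:
            grid[row][col] = 1
    return grid

def bernoulli_loglik(grid, p):
    """Log-likelihood of ungrouped pixels as independent Bernoulli(p) trials.

    A constant p is an illustrative assumption.
    """
    ones = sum(sum(row) for row in grid)
    total = sum(len(row) for row in grid)
    return ones * math.log(p) + (total - ones) * math.log(1 - p)

def binomial_loglik(count, n_pixels, p):
    """Log-likelihood of a grouped set where only the fire count is recorded."""
    return (math.log(math.comb(n_pixels, count))
            + count * math.log(p)
            + (n_pixels - count) * math.log(1 - p))

# Hypothetical data: three point fires on a 3 x 2 grid of unit pixels,
# plus one grouped set of 50 pixels with 4 recorded fires.
points = [(0.5, 0.5), (2.2, 1.7), (2.4, 1.9)]
grid = rasterize_points(points, 0.0, 0.0, 1.0, 3, 2)
total_loglik = bernoulli_loglik(grid, 0.1) + binomial_loglik(4, 50, 0.1)
```

Two fires falling in the same pixel collapse to a single 1, which is why the approximation relies on the pixels being small. Because the sets are assumed independent, the grouped and ungrouped contributions simply add on the log scale.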
Under the assumption that the wildfire rate is a smoothly varying function of space, we propose a spatial smoothing method for partially grouped data. This smoother is based on local regression, using the binary process to approximate the partially grouped data.
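As a rough illustration of smoothing a binary pixel process, the following Python sketch computes a kernel-weighted local average, which is local regression of degree zero. It is a simplification and not the smoother developed in the thesis: the tricube kernel, the `bandwidth` parameter, and the function names are assumptions, and the sketch uses only pixel-level 0-1 values, leaving out the treatment of grouped counts.

```python
import math

def tricube(u):
    """Tricube kernel weight, zero outside |u| < 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def local_rate(x, y, pixels, bandwidth):
    """Kernel-weighted average of 0-1 pixel values around (x, y).

    pixels: iterable of (px, py, z) with z in {0, 1}.
    Returns NaN when no pixel lies within the bandwidth.
    """
    num = 0.0
    den = 0.0
    for px, py, z in pixels:
        w = tricube(math.hypot(px - x, py - y) / bandwidth)
        num += w * z
        den += w
    return num / den if den > 0 else float("nan")

# Hypothetical pixels: one with a fire at (0, 0), one without at (1, 0).
pixels = [(0.0, 0.0, 1), (1.0, 0.0, 0)]
rate_midpoint = local_rate(0.5, 0.0, pixels, 2.0)
```

At the midpoint between the two pixels the weights are equal, so the estimated rate is the plain average of the two values. A larger bandwidth gives a smoother but more biased surface.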
Based on the binary-valued approximation, a logit model is fitted with the National Fire Danger Rating System fuel model as an explanatory variable. The estimated probabilities are displayed on a map together with their associated uncertainty levels.