This paper explores the data cleaning challenges that arise in using WiFi
connectivity data to locate users at semantic indoor locations such as
buildings, regions, and rooms. WiFi connectivity data consists of sporadic
connections between devices and nearby WiFi access points (APs), each of which
may cover a relatively large area within a building. Our system, semantic
LOCATion cleanER (LOCATER), formulates semantic localization as a series of
data cleaning tasks. First, it treats the problem of determining the AP to
which a device is connected between any two of its connection events as a
missing value detection and repair problem. It then associates the device with
a semantic subregion (e.g., a conference room within the region) by
formulating this step as a location disambiguation problem. LOCATER uses a bootstrapping
semi-supervised learning method for coarse localization and a probabilistic
method for fine localization. The paper shows that LOCATER achieves high
accuracy at both the coarse and fine levels.
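The two cleaning steps can be illustrated with a small sketch. The abstract does not specify LOCATER's models, so everything here is hypothetical: the names Event, Gap, detect_gaps, repair_gap and fine_localize, the thresholds max_interval and max_stay, and the per-device room "affinity" weights are illustrative assumptions. Gap detection marks any interval between consecutive connection events that is longer than max_interval as a missing AP value. The repair rule is a deliberately naive stand-in, not LOCATER's bootstrapping semi-supervised learner. Fine localization is shown as a simple normalized-weight probability over the rooms an AP covers, again only one possible probabilistic formulation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    t: float  # timestamp in seconds
    ap: str   # access point the device connected to


@dataclass
class Gap:
    start: float
    end: float
    before_ap: str
    after_ap: str


def detect_gaps(events: List[Event], max_interval: float) -> List[Gap]:
    """Missing-value detection: any interval between consecutive events longer
    than max_interval (a hypothetical threshold) is a gap with unknown AP."""
    ordered = sorted(events, key=lambda e: e.t)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.t - prev.t > max_interval:
            gaps.append(Gap(prev.t, nxt.t, prev.ap, nxt.ap))
    return gaps


def repair_gap(gap: Gap, max_stay: float) -> Optional[str]:
    """Naive repair heuristic (NOT LOCATER's learner): if the device is seen at
    the same AP on both sides of a gap no longer than max_stay, assume it never
    left; otherwise leave the value missing (None)."""
    if gap.before_ap == gap.after_ap and gap.end - gap.start <= max_stay:
        return gap.before_ap
    return None


def fine_localize(rooms: List[str], affinity: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Choose a room among those covered by an AP by normalizing hypothetical
    per-device room affinities into a probability distribution."""
    if not rooms:
        raise ValueError("AP must cover at least one room")
    weights = {r: affinity.get(r, 0.0) for r in rooms}
    total = sum(weights.values())
    if total == 0:
        probs = {r: 1.0 / len(rooms) for r in rooms}  # no evidence: uniform
    else:
        probs = {r: w / total for r, w in weights.items()}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    events = [Event(0, "ap1"), Event(60, "ap1"), Event(1000, "ap1"), Event(5000, "ap2")]
    gaps = detect_gaps(events, max_interval=300)
    print([(g.start, g.end) for g in gaps])             # [(60, 1000), (1000, 5000)]
    print([repair_gap(g, max_stay=3600) for g in gaps])  # ['ap1', None]
    room, probs = fine_localize(["room101", "room102"], {"room101": 3.0, "room102": 1.0})
    print(room, probs)  # room101 {'room101': 0.75, 'room102': 0.25}
```

Keeping detection, repair and disambiguation as separate functions mirrors the abstract's framing of localization as a sequence of cleaning tasks: the naive repair_gap or fine_localize could be swapped for a learned model without touching the gap detector.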