Accountable Data Fusion and Privacy Preservation Techniques in Cyber-Physical Systems
With the deployment of large sensor-actuator networks, Cyber-Physical Systems (CPSs) such as smart buildings, smart grids, and transportation systems are producing massive amounts of data, often of varying form and quality. These data are in turn used collectively to inform the decisions of the entities that engage with the CPSs. The impact of these systems on people's lives has led to a strong call for accountability of system decisions that are based on various data sources. The collection, analysis, and dissemination of these data also present a privacy risk that needs to be addressed.
The first part of this dissertation focuses on accountable data fusion. We develop an online prediction framework that integrates dynamic sensor measurements with prior knowledge. The proposed framework facilitates reasoning about prediction confidence, which is crucial to making dependable decisions. We also move beyond predictive modeling to interpretable analytics by evaluating the influence of each data instance on the algorithmic outcome. We formalize the notion of ``data value,'' and provide efficient algorithms to compute it. This value notion not only enables us to better understand black-box predictions through the lens of training data, but also allows for fair allocation of the profit generated from a prediction model that is built with data from cooperative entities. We further use the proposed value notion to develop an effective data sanitization mechanism, which screens out low-quality or even adversarial data instances from the training set.
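As a rough illustration of valuing individual training instances, the sketch below scores each point by how much validation accuracy changes when that point is removed. This is a leave-one-out proxy chosen only for simplicity; it is an assumption and not the value notion formalized in the dissertation, which this abstract does not specify. The 1-nearest-neighbor model, the toy data, and all function names are hypothetical.

```python
# Leave-one-out sketch of per-instance "value" (an illustrative proxy only,
# not the dissertation's definition). Data are (feature, label) pairs with
# one-dimensional features; the model is a 1-nearest-neighbor classifier.

def nearest_label(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, val):
    """Fraction of validation points that 1-NN on `train` labels correctly."""
    if not train:
        return 0.0
    correct = sum(nearest_label(train, x) == y for x, y in val)
    return correct / len(val)

def leave_one_out_values(train, val):
    """Value of point i = accuracy with all points - accuracy without point i."""
    full = accuracy(train, val)
    return [full - accuracy(train[:i] + train[i + 1:], val)
            for i in range(len(train))]

# Hypothetical toy data: class 0 lies near 0-2, class 1 near 4-6.
# The third training point (2.0, 1) is deliberately mislabeled.
train = [(0.0, 0), (1.0, 0), (2.0, 1), (4.0, 1), (5.0, 1)]
val = [(0.5, 0), (1.8, 0), (4.5, 1), (5.5, 1)]

values = leave_one_out_values(train, val)

# Simple sanitization rule (an assumption): drop points whose removal
# improves validation accuracy, i.e. points with negative value.
sanitized = [p for p, v in zip(train, values) if v >= 0]
```

In this toy setting the mislabeled point is the only one with a negative score, so the sanitization rule removes it and keeps the rest. Leave-one-out scores are known to be coarse (here every other point scores zero because a neighbor can stand in for it), which is one reason a more principled value notion is of interest.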
In the second part, we address the problem of incorporating privacy as an active engineering constraint into CPS design and operation. We discuss a privacy metric inspired by information theory and provide algorithms either to optimize the privacy mechanism for a given system or to co-design the privacy mechanism and the system control. To avoid unnecessary privacy-utility tradeoffs, we develop a framework to identify data that are redundant for specific decision-making processes. Furthermore, we present a privacy-preserving data publishing system, which achieves improved data utility by optimizing the privacy mechanism according to how the published data are used. While the algorithms and techniques introduced can be applied to many CPSs, we focus mainly on the implications for smart buildings.