Relationships and Data Sanitization: A Study in Scarlet
Published Web Locationhttps://doi.org/10.1145/1900546.1900567
Research in data sanitization (including anonymization) emphasizes ways to prevent an adversary from desanitizing data. Most work focuses on using mathematical mappings to sanitize data. A few papers examine incorporation of privacy requirements, either in the guise of templates or prioritization. Essentially these approaches reduce the information that can be gleaned from a data set. In contrast, this paper considers both the need to "desanitize" and the need to support privacy. We consider conflicts between privacy requirements and the needs of analysts examining the redacted data. Our goal is to enable an informed decision about the effects of redacting, and failing to redact data. We begin with relationships among the data being examined, including relationships with a known data set and other, additional, external data. By capturing these relationships, desanitization techniques that exploit them can be identified, and the information that must be concealed in order to thwart them can be determined. Knowing that, a realistic assessment of whether the information and relationships are already widely known or available will enable the sanitizers to assess whether irreversible sanitization is possible, and if so, what to conceal to prevent desanitization.