eScholarship
Open Access Publications from the University of California

Relationships in Data Sanitization: A Study in Scarlet

Author(s):
  • Bishop, Matt
  • Cummins, Justin
  • Peisert, Sean
  • Singh, Anhad
  • Bhumiratana, Bhume
  • Agarwal, Deborah
  • Frincke, Deborah A
  • Hogarth, Michael
  • et al.
Abstract

Research in data sanitization (including anonymization) emphasizes ways to prevent an adversary from desanitizing data. Most work focuses on using mathematical mappings to sanitize data. A few papers examine the incorporation of privacy requirements, in the guise of either templates or prioritization. Essentially, these approaches reduce the information that can be gleaned from a data set. In contrast, this paper considers both the need to "desanitize" and the need to support privacy. We consider conflicts between privacy requirements and the needs of analysts examining the redacted data. Our goal is to enable an informed decision about the effects of redacting, and of failing to redact, data. We begin with relationships among the data being examined, including relationships with a known data set and other, additional, external data. Capturing these relationships makes it possible to identify desanitization techniques that exploit them and to determine the information that must be concealed in order to thwart them. With that knowledge, a realistic assessment of whether the information and relationships are already widely known or available will enable the sanitizers to assess whether irreversible sanitization is possible and, if so, what to conceal to prevent desanitization.
