Preserving Privacy in Hierarchical Data Publishing
Many applications today rely on storage and management of semi-structured information. Data stored in such databases often has to be shared with untrusted third parties for research-related or other purposes, which makes privacy a fundamental problem. In this thesis, we study privacy-preserving publishing of hierarchical data, where an individual's information in a database can be collected and represented using a tree structure. We provide practical algorithms and solutions regarding how some of the well-known privacy definitions (e.g., k-anonymity and l-diversity), which were designed for tabular data, can be generalized and applied to hierarchical data. We show that different solutions are preferable depending on the needs and expectations of data recipients, in order to maximize data applicability and utility. Thus, we introduce a range of techniques to anonymize hierarchical data. We experiment on realistic synthetic databases to test our algorithms and validate our claims.