Preserving Privacy in on-Line Analytical Processing Data Cubes

This dissertation presents a series of techniques that can keep secret the sensitive information in data while answering data cube queries about such data. These techniques enable online-analytic processing (OLAP) systems to support useful data analysis without worrying about harmful privacy breaches. This dissertation starts by extending two existing inference control methods in statistical databases to data cubes. It then proposes a novel lattice-based method and presents a query-driven variation of the method.

The first method is based on the intuition that an adversary can sometimes infer sensitive information in data when he/she possesses prior knowledge about such data. The method is cardinality-based and aims to control inferences by looking at the number of previously known values in a data cube. The study first shows that inappropriate inferences of sensitive values are possible even when an adversary is restricted to skeleton data cube queries. The study then presents the first result that is a data cube with no previously known values is always safe from inferences of sensitive values. To relax this rigid result, the second result says that a data cube whose previously known values are less than a given bound can remain safe from inferences; the result also shows that the given bound is tight in the sense that counterexamples can always be found for any tighter bound. These results show that by exploiting the special structure of a data cube, the cardinality-based method can be applied to data cubes. However, such a method has only limited power in controlling inferences because it only applies to data cubes with no or few previously known values. The study also proposes a three-tier model for controlling inferences, which also underlies all the proposed methods in this thesis.

The second method aims to remove the limitations that only skeleton queries can be asked and that inferences can only be determined for data cubes with no or few previously known values. The study first renders inferences significantly more difficult by restricting users to even multi-dimensional range (MDR) queries, that is the MDR queries involving even numbers of values. The study then shows that the collection of such even MDR queries is safe if and only if a special set of sum-two queries (that is, queries involving exactly two values) is safe. This result leads to an efficient algorithm that can decide the safety of even MDR queries. The study also obtains results for MDR queries that involve odd number of values, and for non-MDR queries. The results are integrated into the three-tier model for controlling inferences.

The following two methods remove the limitations that only sum queries can be asked and only exact values of in the core cuboid of data cubes can be regarded as sensitive. The study first devises a flexible framework for specifying authorization objects as any part of a data cube. The study then proposes a lattice-based method for inference control. The method first prevents an adversary from combining queries for inferences, and it then eliminates the remaining inferences. This novel approach makes the method efficient, broadly applicable, and readily implementable. To further improve the availability of queries, this method is modified to a query-driven variation, which reduces unneccessary restrictions on queries with reasonable additional performance overhead.