Numerous organizations distribute non-aggregate microdata for purposes such as demographic and public health research, and it is important that the released data protect the identities of individuals. At the same time, concern for individual privacy must be balanced against the need to maintain the quality and utility of the data.
In this talk, I will describe several new techniques for generalizing data to preserve anonymity. First, I will describe a "multidimensional recoding" paradigm and a greedy partitioning algorithm. An extensive theoretical and empirical evaluation indicates that, in addition to being fast and scalable, this algorithm often produces higher-quality data than even the optimal result under previous recoding paradigms.
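As a rough illustration of the idea, and not the algorithm as presented in the talk, a greedy multidimensional partitioning might be sketched as follows. The sketch makes several assumptions that do not come from the abstract: anonymity is modeled as every released group containing at least k records, all attributes are numeric, and each split is made at the median of the attribute with the widest range. The names greedy_partition and generalize are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[float, ...]


def greedy_partition(records: List[Record], k: int) -> List[List[Record]]:
    """Recursively split records into groups of at least k (assumed criterion)."""
    # A group smaller than 2k cannot be split into two groups of size >= k.
    if len(records) < 2 * k:
        return [records]

    dims = range(len(records[0]))
    # Greedy choice (an assumption): try the widest-range attribute first.
    by_width = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_width:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        # Accept the split only if both sides keep at least k records.
        if len(left) >= k and len(right) >= k:
            return greedy_partition(left, k) + greedy_partition(right, k)

    # No attribute allows a valid split, so this group is released as is.
    return [records]


def generalize(group: List[Record]) -> Tuple[Tuple[float, float], ...]:
    """Replace a group's values with one (min, max) range per attribute."""
    dims = range(len(group[0]))
    return tuple(
        (min(r[d] for r in group), max(r[d] for r in group)) for d in dims
    )


if __name__ == "__main__":
    # Made-up (age, zip code) records, for illustration only.
    sample = [(25, 53711), (27, 53712), (31, 53710),
              (45, 53715), (47, 53703), (52, 53706)]
    for group in greedy_partition(sample, k=2):
        print(generalize(group))
```

Because each group is described by a range on every attribute at once, the regions can adapt to where the data actually lies, which is the intuition behind recoding in several dimensions rather than one attribute at a time.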
In the second part of this talk, I observe that the ultimate indicator of data utility is the task or workload for which the data will be used. Following this intuition, I will describe several extensions to the basic generalization framework that allow us to incorporate a robust set of target workloads, including classification, regression, and selection.
Kristen is a fifth-year Ph.D. student at the University of Wisconsin-Madison, where she is advised by Professors David DeWitt and Raghu Ramakrishnan. She received a Bachelor's degree in Computer Science from Dartmouth College in 2002 and interned at the IBM Almaden Research Center in 2003 and 2005.