Data mining technology identifies patterns and trends from large
collections of published data. Including personal information in
published data is necessary for data mining, but it may violate
the privacy of individuals. It is therefore crucial to study how
to preserve privacy in data mining.
In this talk, we give an overview of recent work on anonymized
data publishing for privacy-preserving data mining. In the
first part, we introduce the k-anonymity problem, together with
its complexity, (approximation) algorithms, and how to apply it
in data mining. Several extensions to k-anonymity are also
discussed. In the second part, we discuss the tradeoff between
the utility of the published data and the loss of privacy.
In the third part, we show that anonymized data publishing is
vulnerable to composition attacks, and present a possible solution.
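As a concrete illustration of the k-anonymity condition mentioned in
the first part, the following minimal Python sketch checks whether a
table is k-anonymous, using the standard definition that every
combination of quasi-identifier values appearing in the table is
shared by at least k records. The function name, the column names,
and the sample records are hypothetical and are not taken from the
talk.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `records` is shared by at least k records."""
    # Count how many records fall into each quasi-identifier group.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return all(size >= k for size in groups.values())


# Hypothetical, already generalized table (assumption for illustration).
table = [
    {"zip": "618**", "age": "20-29", "disease": "flu"},
    {"zip": "618**", "age": "20-29", "disease": "asthma"},
    {"zip": "617**", "age": "30-39", "disease": "flu"},
    {"zip": "617**", "age": "30-39", "disease": "diabetes"},
]

print(is_k_anonymous(table, ["zip", "age"], k=2))
print(is_k_anonymous(table, ["zip", "age"], k=3))
```

In this sample each (zip, age) group contains two records, so the
check succeeds for k = 2 and fails for k = 3. The sketch only verifies
the condition; finding a generalization that satisfies it with little
information loss is the harder problem the talk addresses.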
Bolin Ding is a Ph.D. candidate in the Department of Computer
Science, University of Illinois at Urbana-Champaign. He is
interested in data mining, databases, and algorithm design and
analysis. He received his MPhil degree in System Engineering from
the Chinese University of Hong Kong in 2007, and a Bachelor's
degree in Math and Applied Mathematics from Renmin University of
China in 2005.