M-invariance: Towards Privacy Preserving Re-publication Of Dynamic Datasets
Abstract
Previous work on privacy-preserving data publication has focused on "one-time" releases. Specifically, none of the existing solutions supports <i>re-publication</i> of the microdata after it has been updated with insertions <u>and</u> deletions. This is a serious drawback, because a publisher currently cannot continuously provide researchers with the most recent dataset. This paper remedies the drawback. First, we reveal the characteristics of the re-publication problem that invalidate conventional approaches based on <i>k</i>-anonymity and <i>l</i>-diversity. Based on rigorous theoretical analysis, we develop a new generalization principle, <i>m-invariance</i>, that effectively limits the risk of privacy disclosure in re-publication. We accompany the principle with an algorithm that computes privacy-guarded relations permitting retrieval of accurate aggregate information about the original microdata. Our theoretical results are confirmed by extensive experiments with real data.