Limiting disclosure of sensitive data in sequential releases of databases

Erez Shmueli, Tamir Tassa, Raz Wasserstein, Bracha Shapira, Lior Rokach

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Privacy Preserving Data Publishing (PPDP) is a research field that develops methods for publishing data with minimal distortion, so that the data remains useful while privacy is respected. Sequential release is a data publishing scenario in which multiple releases of the same underlying table are published over a period of time. In this case, a privacy violation may emerge from any single release, or from joining information from different releases. As in Ref. [37], our privacy definitions limit the ability of an adversary who combines information from all releases to link quasi-identifier values to sensitive values. We extend the framework considered in Ref. [37] in three ways: we allow a greater number of releases; we consider the more flexible local recoding model of "cell generalization" (as opposed to the global recoding model of "cut generalization" in Ref. [37]); and we include the case where records may be added to the underlying table from time to time. Extending the framework also requires modifying the manner in which privacy is evaluated. We show that the notion of the Match Join between the releases, on which Ref. [37] based its privacy evaluation, is no longer suitable for the extended framework considered here. We define more restrictive types of join between the published releases (the Full Match Join and the Kernel Match Join) that are more suitable for privacy evaluation in this context. We then present a top-down algorithm for anonymizing sequential releases in the cell generalization model that is based on our modified privacy evaluations. Our theoretical study is followed by experiments that demonstrate a staggering improvement in utility due to the adoption of the cell generalization model, and that illustrate the correction in privacy evaluation obtained by using the Full or Kernel Match Joins instead of the Match Join.
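
The risk of joining information from different releases can be illustrated with a small sketch. It is not the Match Join, Full Match Join, or Kernel Match Join of the paper, whose definitions are not given in this abstract; it is a plain intersection of candidate sets under simplifying assumptions. The attribute names (age, region), the toy records, and the helper functions `candidates` and `combined_candidates` are all hypothetical, and the sketch assumes the target appears in every release with an unchanged sensitive value.

```python
from typing import FrozenSet, List, Set, Tuple

# A generalized record: (age range, set of possible regions, sensitive value).
# This representation is an assumption made for illustration only.
GenRecord = Tuple[Tuple[int, int], FrozenSet[str], str]


def candidates(release: List[GenRecord], age: int, region: str) -> Set[str]:
    """Sensitive values of records whose generalized quasi-identifiers
    are consistent with the target's exact (age, region)."""
    return {
        sensitive
        for (lo, hi), regions, sensitive in release
        if lo <= age <= hi and region in regions
    }


def combined_candidates(
    releases: List[List[GenRecord]], age: int, region: str
) -> Set[str]:
    """Intersect the per-release candidate sets, as an adversary who sees
    all releases could do if the target is present in each of them."""
    result: Set[str] = set()
    for i, release in enumerate(releases):
        current = candidates(release, age, region)
        result = current if i == 0 else result & current
    return result


ALL_REGIONS = frozenset({"north", "south"})
FULL_AGE = (20, 59)

# Release 1 keeps coarse age ranges and fully generalizes the region.
release_1: List[GenRecord] = [
    ((20, 39), ALL_REGIONS, "flu"),
    ((20, 39), ALL_REGIONS, "asthma"),
    ((40, 59), ALL_REGIONS, "diabetes"),
    ((40, 59), ALL_REGIONS, "flu"),
]

# Release 2 keeps the region and fully generalizes the age.
release_2: List[GenRecord] = [
    (FULL_AGE, frozenset({"north"}), "flu"),
    (FULL_AGE, frozenset({"south"}), "asthma"),
    (FULL_AGE, frozenset({"north"}), "diabetes"),
    (FULL_AGE, frozenset({"south"}), "flu"),
]

if __name__ == "__main__":
    # Target known to the adversary: age 25, region "north".
    print(candidates(release_1, 25, "north"))
    print(candidates(release_2, 25, "north"))
    print(combined_candidates([release_1, release_2], 25, "north"))
```

In this toy data each release on its own leaves two candidate sensitive values for the target, while the intersection across both releases leaves one. This is the kind of cross-release linkage that a privacy definition for sequential releases has to bound.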

Original language: English
Pages (from-to): 98-127
Number of pages: 30
Journal: Information Sciences
Volume: 191
Publication status: Published - 15 May 2012
