TY - CONF
T1 - Identity obfuscation in graphs through the information theoretic lens
AU - Bonchi, Francesco
AU - Gionis, Aristides
AU - Tassa, Tamir
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
AB - Analyzing the structure of social networks is of interest in a wide range of disciplines, but such activity is limited by the fact that these data represent sensitive information and cannot be published in their raw form. One of the approaches to sanitize network data is to randomly add or remove edges from the graph. Recent studies have quantified the level of anonymity that is obtained by random perturbation by means of a-posteriori belief probabilities and, by conducting experiments on small datasets, arrived at the conclusion that random perturbation cannot achieve meaningful levels of anonymity without deteriorating the graph features. We offer a new information-theoretic perspective on this issue. We make an essential distinction between image and preimage anonymity and propose a more accurate quantification, based on entropy, of the anonymity level that is provided by the perturbed network. We explain why the entropy-based quantification, which is global, is more adequate than the previously used local quantification based on a-posteriori belief. We also prove that the anonymity level quantified by means of entropy is always greater than or equal to the one based on a-posteriori belief probabilities. In addition, we introduce and explore the method of random sparsification, which randomly removes edges without adding new ones. Extensive experimentation on several very large datasets shows that randomization techniques for identity obfuscation are back in the game, as they may achieve meaningful levels of anonymity while still preserving features of the original graph.
UR - http://www.scopus.com/inward/record.url?scp=79957862437&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2011.5767905
DO - 10.1109/ICDE.2011.5767905
M3 - Conference contribution
AN - SCOPUS:79957862437
SN - 9781424489589
T3 - Proceedings - International Conference on Data Engineering
SP - 924
EP - 935
BT - 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
T2 - 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Y2 - 11 April 2011 through 16 April 2011
ER -