Abstract:
With the growing appetite for publicly available personal data for analytics and decision making, due care must be taken to preserve the privacy of data subjects before any data is disclosed. Although many data anonymization techniques are available, there is no holistic understanding of their re-identification risk or of the conditions under which each can be applied. It is therefore important to study the re-identification risk of anonymization techniques across different types of datasets. In this paper, we assess the re-identification risk of four popular anonymization techniques on four different datasets, using population uniqueness as the measure of re-identification risk. Our analysis shows that k-anonymity yields the lowest re-identification risk for unbiased samples of the population datasets. Our findings also emphasize that the risk assessment methodology should depend on the chosen dataset. Finally, for datasets with higher linkability, the re-identification risk measured by uniqueness is much lower than the actual risk of re-identification.
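To make the uniqueness measure concrete, here is a minimal sketch. It is not the authors' implementation. It assumes records are dictionaries and uses a hypothetical choice of quasi-identifiers (`age`, `zip`, `sex`), so every helper name and data value below is illustrative only. `uniqueness` returns the fraction of records whose quasi-identifier combination occurs exactly once. When the input is the whole population, that fraction is population uniqueness. When the input is only a sample, it is sample uniqueness, which is a different quantity. `k_of` returns the smallest equivalence-class size, which is the largest k for which the table is k-anonymous.

```python
from collections import Counter
from typing import Hashable, Mapping, Sequence, Tuple

Record = Mapping[str, Hashable]


def equivalence_classes(records: Sequence[Record], qis: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in qis) for r in records)


def uniqueness(records: Sequence[Record], qis: Sequence[str]) -> float:
    """Fraction of records that are alone in their equivalence class."""
    if not records:
        return 0.0
    counts = equivalence_classes(records, qis)
    # Each class of size 1 corresponds to exactly one unique record.
    unique = sum(1 for size in counts.values() if size == 1)
    return unique / len(records)


def k_of(records: Sequence[Record], qis: Sequence[str]) -> int:
    """Smallest equivalence-class size (0 for an empty table)."""
    counts = equivalence_classes(records, qis)
    return min(counts.values()) if counts else 0


# Hypothetical quasi-identifiers and toy records, for illustration only.
QI: Tuple[str, ...] = ("age", "zip", "sex")

RAW = [
    {"age": 34, "zip": "10115", "sex": "F"},
    {"age": 35, "zip": "10117", "sex": "F"},
    {"age": 52, "zip": "10119", "sex": "M"},
    {"age": 58, "zip": "10115", "sex": "M"},
]

# The same records after generalizing age into decades and masking ZIP digits.
GENERALIZED = [
    {"age": "30-39", "zip": "101**", "sex": "F"},
    {"age": "30-39", "zip": "101**", "sex": "F"},
    {"age": "50-59", "zip": "101**", "sex": "M"},
    {"age": "50-59", "zip": "101**", "sex": "M"},
]

if __name__ == "__main__":
    print(uniqueness(RAW, QI), k_of(RAW, QI))                  # 1.0 1
    print(uniqueness(GENERALIZED, QI), k_of(GENERALIZED, QI))  # 0.0 2
```

In this toy example, every raw record is unique on the chosen quasi-identifiers. After generalization, each record shares its class with at least one other record, so the table becomes 2-anonymous and its uniqueness drops to zero. As the abstract notes, uniqueness alone can understate the real risk when the released data are highly linkable with other sources.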
Citation:
P. L. M. K. Bandara, H. D. Bandara and S. Fernando, "Evaluation of Re-identification Risks in Data Anonymization Techniques Based on Population Uniqueness," 2020 5th International Conference on Information Technology Research (ICITR), 2020, pp. 1-5, doi: 10.1109/ICITR51448.2020.9310884.