
News

Currently, no news is available.

Data anonymization and how to break it

Anonymization is the main legal paradigm for sharing data while limiting privacy harms. Yet, robust anonymization of individual-level data is very difficult to achieve in practice.

In this seminar, you will first learn what does not count as anonymous data, through (in)famous examples of anonymization failures. Then, we will turn our attention to modern data sharing systems, including query-based systems, synthetic data, and differential privacy. Finally, we will cover automated approaches for auditing the privacy of these systems, an area that raises interesting challenges.

Lecturer: Dr. Ana-Maria Cretu (CISPA)

Time: Every Thursday from 2:15 to 4:00 PM, starting on April 16, 2026.

Location: C0 Room 0.02, in the CISPA building at Stuhlsatzenhaus 5, except on 07.05.2026 and 02.07.2026, when the seminar takes place in UdS room SR 2 in building E 2.5.

Requirements:

Strong interest in data privacy.
Basic knowledge of probability and statistics.
(Optional) Having taken a course on machine learning and/or optimization.

Grading: This seminar has three grading components:

Presentation (50%): You will prepare and deliver a 20-25 minute presentation (followed by 10 minutes of questions and discussion) of the paper assigned to you. You can get feedback on your slides before the presentation.
Participation in discussion (20%): Contributions to the discussion during the seminar meetings, including, but not limited to, 3 questions prepared in advance for one selected paper (see the third component below).
Paper review (30%): You will write a review of a paper other than the one you present. The review may be up to 4 A4 pages long (template TBD), excluding references. The use of LLMs in any capacity (ideation, correction, etc.) is strictly forbidden. The review should address the following questions:

What research problem does the paper study?
Why is this research problem worth studying (societal motivation)?
What has been done before (positioning relative to prior work)?
What is the threat model?
How does the paper address the research problem (methods, datasets)?
What are the strengths of the paper?
What are the limitations of the paper?
What novel research questions does this work inspire?

For the paper you review, you must also prepare 3 questions to ask its presenter.

Topics:

Crossed-out entries are presentations that were skipped or have already taken place.

Week 1 (16.04.2026, C0 Room 0.02): Introduction & k-anonymity

- We will cover the course organization, a brief introduction to anonymization, and an example paper presentation on k-anonymity.

- Paper 1: k-anonymity: a model for protecting privacy. Presenter: Ana-Maria Cretu.

Optional reading:

- A scientific review of anonymization: Anonymization: The imperfect science of using data while preserving privacy

- EU guidelines for anonymization: Article 29 Data Protection Working Party's Opinion 05/2014 on Anonymisation Techniques

- How to k-anonymize data: Incognito: Efficient Full-Domain K-Anonymity

Week 2 (23.04.2026, C0 Room 0.02): K-anonymity revisited

~~Paper 2: ℓ-Diversity: Privacy Beyond k-Anonymity~~ (presentation skipped)

~~Paper 3: t-Closeness: Privacy Beyond k-Anonymity and ℓ-Diversity~~ Presenter: TBD.

Week 3 (30.04.2026, C0 Room 0.02): Re-identification attacks

~~Paper 4: Robust De-anonymization of Large Sparse Datasets ("Netflix paper").~~ Presenter: Eric Ansbach.

~~Paper 5: Unique in the Crowd: The privacy bounds of human mobility.~~ Presenter: Syed Kumail Raza Zaidi.

Optional reading on the de-anonymization of social network data:

- De-anonymizing Social Networks

- Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography

Week 4 (07.05.2026, UdS room SR 2 in E 2.5): Advanced re-identification - Part I

- ~~Paper 8: Interaction data are identifiable even across long periods of time.~~ Presenter: Amelie Klein.

- ~~Paper 9: The risk of re-identification remains high even in country-scale location datasets.~~ Presenter: Luis Henrique Bastos Tamura.

Optional reading:

- In-depth analysis of how re-identification risk scales with population size: A scaling law to model the effectiveness of identification techniques

Week 5 (14.05.2026): No seminar (public holiday)

Week 6 (21.05.2026): No seminar

Week 7 (28.05.2026, C0 Room 0.02): Advanced re-identification - Part II

Paper 6: Estimating the success of re-identifications in incomplete datasets using generative models. Presenter: Kerem Kılıç.

~~Paper 7: Attacks on Deidentification’s Defenses~~ (presentation skipped)

Week 8 (04.06.2026, C0 Room 0.02): No seminar (public holiday)

~~Paper 10: A new statistic and its power to infer membership in a genome-wide association study using genotype frequencies.~~ (presentation skipped)

Week 9 (11.06.2026, C0 Room 0.02): Aggregation & query-based systems

Paper 11: Knock Knock, Who's There? Membership Inference on Aggregate Location Data. Presenter: Shaloni Modi.

Paper 12: When the Signal is in the Noise: Exploiting Diffix's Sticky Noise. Presenter: Assigned.

Paper 13: QuerySnout: Automating the Discovery of Attribute Inference Attacks against Query-Based Systems. Presenter: Ana-Maria Cretu.

Week 10 (18.06.2026, C0 Room 0.02): Differential privacy - part I

Paper 14: Revealing Information while Preserving Privacy. Presenter: Syed Kumail Raza Zaidi.

Paper 15: Calibrating Noise to Sensitivity in Private Data Analysis. Presenter: Kerem Kılıç.

Week 11 (25.06.2026, C0 Room 0.02): Differential privacy - part II

Paper 16: Census TopDown: Differentially Private Data, Incremental Schemas, and Consistency with Public Knowledge. Presenter: Eric Ansbach.

Paper 17: DP-Sniper: Black-Box Discovery of Differential Privacy Violations using Classifiers. Presenter: TBD.

Week 12 (02.07.2026, UdS room SR 2 in E 2.5): Advanced reconstruction & Synthetic data part I

Paper 18: Generate-then-Verify: Reconstructing Data from Limited Published Statistics. Presenter: Louai Alkhatib.

~~Paper 19: Confidence-Ranked Reconstruction of Census Microdata from Published Statistics. Presenter: TBD.~~

Week 13 (09.07.2026, C0 Room 0.02): Synthetic data - part I

Paper 20: Synthetic Data – Anonymisation Groundhog Day. Presenter: Luis Henrique Bastos Tamura.

~~Paper 21: Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing. Presenter: TBD.~~

Week 13 (09.07.2026, C0 Room 0.02): Synthetic data - part II

Paper 22: PrivBayes: Private Data Release via Bayesian Networks. Presenter: Ann Mariyam Manna.

Paper 23: The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against "Truly Anonymous" Synthetic Datasets. Presenter: Shaloni Modi.

Optional reading:

- In-depth analysis of privacy metrics: The DCR Delusion: Measuring the Privacy Risk of Synthetic Data.

Week 14 (16.07.2026, C0 Room 0.02): Q&A on the written review

Week 15 (23.07.2026, C0 Room 0.02): Anonymization and machine learning

~~Paper 24: Algorithms that remember: model inversion attacks and data protection law~~

~~Paper 25: Extracting Training Data from Large Language Models~~