News
Currently, no news is available
Differential Privacy in the Era of Foundation Models
Abstract:
In recent years, foundation models, such as GPT, LLaMA, DALL-E, or Stable Diffusion, have transformed the field of machine learning, particularly in large-scale tasks like natural language processing and computer vision. Trained on vast datasets, these models can transfer their learned knowledge to a wide range of applications, making them remarkably powerful and versatile. However, this reliance on large-scale data also raises significant privacy concerns when sensitive data is involved.
This seminar will explore how differential privacy (DP), the leading standard for privacy protection, can be applied to foundation models to mitigate these risks. DP ensures that changes in individual data points in a model’s training data minimally affect the overall model predictions, providing a safeguard for privacy even in the most data-intensive models. We will dive into the fundamentals of both DP and foundation models, study how they intersect, and explore strategies for integrating privacy guarantees into these cutting-edge systems. Key topics will include the theory behind DP, practical privacy-preserving mechanisms, and case studies of DP implementation in advanced foundation models.
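For reference, the guarantee sketched above is usually formalized as (ε, δ)-differential privacy, introduced in the Dwork paper covered under Topic 2. A randomized mechanism M is (ε, δ)-differentially private if, for all datasets D and D' differing in a single record, and for every set of outcomes S,

\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta,

where ε bounds the worst-case privacy loss and δ is a small probability with which this bound may fail.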
Learning Objective:
This course has two main learning objectives.
1) Learning the foundations of differential privacy and foundation models: what they are, how they interact, and how they can be leveraged to achieve privacy-preserving machine learning.
2) Getting a glimpse into how to be a successful researcher. As part of research, you have to read papers, understand what they are about, and be able to apply what they describe, ideally to your own research ideas. Additionally, you will learn how to give a good (research) presentation, how to identify the relevant questions, ask and answer them, and how to do scientific writing.
Time:
The seminar will take place on Wednesdays, 4:10 PM to 6:00 PM, in the CISPA building (Stuhlsatzenhaus 5, 66123 Saarbrücken). Please make sure to be on time.
Rooms, Dates, and Topics:
23.10.2024 (Room 0.02): Introduction: Presentation of Seminar Topics, and "How-To" give a presentation
30.10.2024 (Room 0.02): Topic 1: Introduction to Foundation Models & The Pre-train/Adapt Paradigm
13.11.2024 (Room 0.02): Topic 2: Introduction to Differential Privacy
20.11.2024 (Room 0.02): Topic 3: Privacy Risks in Foundation Models
18.12.2024 (Room 0.02): Topic 4: Privately Pre-Training Diffusion Models
8.1.2025 (Room 0.02): Topic 5: Privately Fine-Tuning Diffusion Models
15.1.2025 (Room 0.02): Topic 6: Privately Training Large Language Models
22.1.2025 (Room 0.02): Topic 7: Other Private Language Model Adaptations
29.1.2025 (Room 0.02): Topic 8: Differential Privacy Auditing
5.2.2025 (Room 0.02): Topic 9: Problems and Open Research Directions in Privacy-Preserving Machine Learning in Foundation Models
Papers:
Topic 1: Introduction to Foundation Models & The Pre-train/Adapt Paradigm
Antoni: Diffusion Models: Denoising Diffusion Probabilistic Models (https://arxiv.org/abs/2006.11239)
Nupur: Large Language Models: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)
Summary of the course in G-Doc: Simon
Topic 2: Introduction to Differential Privacy
Pratik: Differential Privacy: Differential privacy (https://www.comp.nus.edu.sg/~tankl/cs5322/readings/dwork.pdf)
Deepali: Differential Privacy in Machine Learning: Deep learning with differential privacy (https://arxiv.org/abs/1607.00133) (see the DP-SGD sketch after this topic's entries)
Summary of the course in G-Doc: Swathi
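The second paper above introduces DP-SGD, the workhorse algorithm behind most later topics in this seminar. Below is a minimal NumPy sketch of its core step under simplifying assumptions (gradients as plain arrays, no subsampling or privacy accounting); the function name and interface are illustrative, not the API of any particular library.

import numpy as np

def dp_sgd_aggregate(per_example_grads, clip_norm, noise_multiplier, rng):
    # Clip each per-example gradient so that no single training example
    # contributes more than clip_norm (bounded sensitivity).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add Gaussian noise calibrated to the clipping norm, then average
    # over the batch as in ordinary SGD.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

# Example: a batch of 4 per-example gradients for a 3-parameter model.
rng = np.random.default_rng(0)
grads = rng.normal(size=(4, 3))
print(dp_sgd_aggregate(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))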
Topic 3: Privacy Risks in Foundation Models
Swathi: Data extraction from Language Models: Extracting training data from large language models (https://www.usenix.org/system/files/sec21-carlini-extracting.pdf)
Yashodhara: Data extraction from Diffusion Models: Extracting training data from diffusion models (https://www.usenix.org/system/files/usenixsecurity23-carlini.pdf)
Summary of the course in G-Doc: Anupam
Topic 4: Privately Pre-Training Diffusion Models
Jyotika: Pre-training using semantics: PrivImage: Differentially private synthetic image generation using diffusion models with semantic-aware pretraining (https://www.usenix.org/system/files/usenixsecurity24-li-kecen.pdf)
Xicheng: Pre-training using advanced noise composition: dp-promise: Differentially private diffusion probabilistic models for image synthesis (https://www.usenix.org/system/files/sec24fall-prepub-1157-wang-haichen.pdf)
Summary of the course in G-Doc: Deepali
Topic 5: Privately Fine-Tuning Diffusion Models
Jyotika: Low-rank fine-tuning: Differentially Private Fine-Tuning of Diffusion Models (https://arxiv.org/pdf/2406.01355)
Xicheng: DP for useful synthetic images: Differentially private diffusion models generate useful synthetic images (https://arxiv.org/pdf/2302.13861)
Summary of the course in G-Doc: Simon
Topic 6: Privately Training Large Language Models
Nupur: Private Pretraining: Large-scale differentially private BERT (https://arxiv.org/pdf/2108.01624)
Srushti: Private Fine-Tuning: Large language models can be strong differentially private learners (https://arxiv.org/pdf/2110.05679)
Summary of the course in G-Doc: Swathi
Topic 7: Other Private Language Model Adaptations
Anupam: Private Low Rank Training: Differentially private fine-tuning of language models (https://arxiv.org/pdf/2110.06500)
Simon: Private Prompting: Flocks of stochastic parrots: Differentially private prompt learning for large language models (https://proceedings.neurips.cc/paper_files/paper/2023/file/f26119b4ffe38c24d97e4c49d334b99e-Paper-Conference.pdf)
Summary of the course in G-Doc: Deepali
Topic 8: Differential Privacy Auditing
Antoni: Efficient Audits: Privacy auditing with one (1) training run (https://proceedings.neurips.cc/paper_files/paper/2023/file/9a6f6e0d6781d1cb8689192408946d73-Paper-Conference.pdf)
Pratik: Bayesian Estimation: Bayesian Estimation of Differential Privacy (https://proceedings.mlr.press/v202/zanella-beguelin23a/zanella-beguelin23a.pdf)
Summary of the course in G-Doc: Anupam
Topic 9: Problems and Open Research Directions in Privacy-Preserving Machine Learning in Foundation Models
Yashodhara: Problems in private LLMs: What does it mean for a language model to preserve privacy? (https://dl.acm.org/doi/pdf/10.1145/3531146.3534642)
Srushti: Position on privacy in the pretrain-adapt paradigm: Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining (https://openreview.net/pdf?id=ncjhi4qAPV)
Summary of the course in G-Doc: Swathi
Peer Groups:
The two students who present on the same day form the peer group for that day.
Questions:
Questions can be posed here: https://docs.google.com/document/d/1E6eKWuVaVsR_ywjWC6UwgnGEuCCy_NKnmwksEiiQlmE/edit?usp=sharing
All presentations should be uploaded here: https://drive.google.com/drive/folders/1Y_0qYuxn2nCYQdGde1TGNrTyTPe3-_G1?usp=sharing
Requirements and Deliverables:
This seminar is open to senior Bachelor's, Master's, and doctoral students. Ideally, students should have a solid background in mathematics from the base lectures and a strong interest in deep learning. Each student will present one or two topics during the seminar hours in the form of an oral presentation. In addition, each student will read the relevant papers for the other students' presentations and hand in a seminar paper at the end of the semester.