News
Currently, no news is available
Differential Privacy in the Era of Foundation Models
Abstract:
In recent years, foundation models, such as GPT, LLaMA, DALL-E, or Stable Diffusion, have transformed the field of machine learning, particularly in large-scale tasks like natural language processing and computer vision. Trained on vast datasets, these models can transfer their learned knowledge to a wide range of applications, making them remarkably powerful and versatile. However, this reliance on large-scale data also raises significant privacy concerns when sensitive data is involved.
This seminar will explore how differential privacy (DP), the leading standard for privacy protection, can be applied to foundation models to mitigate these risks. DP ensures that changes in individual data points in a model’s training data minimally affect the overall model predictions, providing a safeguard for privacy even in the most data-intensive models. We will dive into the fundamentals of both DP and foundation models, study how they intersect, and explore strategies for integrating privacy guarantees into these cutting-edge systems. Key topics will include the theory behind DP, practical privacy-preserving mechanisms, and case studies of DP implementation in advanced foundation models.
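For reference, the guarantee sketched above is usually formalized as (ε, δ)-differential privacy, introduced in the Dwork paper covered under Topic 2. A randomized mechanism M is (ε, δ)-differentially private if, for all datasets D and D' differing in a single record, and for every set of outcomes S,

\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta,

where ε bounds the worst-case privacy loss and δ is a small probability with which this bound may fail.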
Learning Objective:
This course has two main learning objectives.
1) Learning the foundations of differential privacy and foundation models: what they are, how they interact, and how they can be leveraged to achieve privacy-preserving machine learning.
2) Getting a glimpse into how to be a successful researcher. As part of research, you have to read papers, understand what they are about, and be able to apply what they describe, ideally to your own research ideas. Additionally, you will learn how to give a good (research) presentation, how to identify the relevant questions, ask and answer them, and how to do scientific writing.
Time:
The seminar will take place on Wednesdays, 4:10 PM to 6:00 PM, in the CISPA building (Stuhlsatzenhaus 5, 66123 Saarbrücken). Please make sure to be on time.
Rooms, Dates, and Topics:
23.10.2024 (Room 0.02): Introduction: Presentation of Seminar Topics, and "How-To" give a presentation
30.10.2024 (Room 0.02): Topic 1: Introduction to Foundation Models & The Pre-train/Adapt Paradigm
13.11.2024 (Room 0.02): Topic 2: Introduction to Differential Privacy
20.11.2024 (Room 0.02): Topic 3: Privacy Risks in Foundation Models
18.12.2024 (Room 0.02): Topic 4: Privately Pre-Training Diffusion Models
8.1.2025 (Room 0.02): Topic 5: Privately Fine-Tuning Diffusion Models
15.1.2025 (Room 0.02): Topic 6: Privately Training Large Language Models
22.1.2025 (Room 0.02): Topic 7: Other Private Language Model Adaptations
29.1.2025 (Room 0.02): Topic 8: Differential Privacy Auditing
5.2.2025 (Room 0.02): Topic 9: Problems and Open Research Directions in Privacy-Preserving Machine Learning in Foundation Models
Papers:
Topic 1: Introduction to Foundation Models & The Pre-train/Adapt Paradigm
Antoni: Diffusion Models: Denoising Diffusion Probabilistic Models (https://arxiv.org/abs/2006.11239)
Nupur: Large Language Models: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)
Summary of the course in G-Doc: Simon
Topic 2: Introduction to Differential Privacy
Pratik: Differential Privacy: Differential privacy (https://www.comp.nus.edu.sg/~tankl/cs5322/readings/dwork.pdf)
Deepali: Differential Privacy in Machine Learning: Deep learning with differential privacy (https://arxiv.org/abs/1607.00133) (see the DP-SGD sketch after this topic's entries)
Summary of the course in G-Doc: Swathi
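The second paper above introduces DP-SGD, the workhorse algorithm behind most later topics in this seminar. Below is a minimal NumPy sketch of its core step under simplifying assumptions (gradients as plain arrays, no subsampling or privacy accounting); the function name and interface are illustrative, not the API of any particular library.

import numpy as np

def dp_sgd_aggregate(per_example_grads, clip_norm, noise_multiplier, rng):
    # Clip each per-example gradient so that no single training example
    # contributes more than clip_norm (bounded sensitivity).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add Gaussian noise calibrated to the clipping norm, then average
    # over the batch as in ordinary SGD.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

# Example: a batch of 4 per-example gradients for a 3-parameter model.
rng = np.random.default_rng(0)
grads = rng.normal(size=(4, 3))
print(dp_sgd_aggregate(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))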
Topic 3: Privacy Risks in Foundation Models
Swathi: Data extraction from Language Models: Extracting training data from large language models (https://www.usenix.org/system/files/sec21-carlini-extracting.pdf)
Yashodhara: Data extraction from Diffusion Models: Extracting training data from diffusion models (https://www.usenix.org/system/files/usenixsecurity23-carlini.pdf)
Summary of the course in G-Doc: Anupam
Topic 4: Privately Pre-Training Diffusion Models
Jyotika: Pre-training using semantics: PrivImage: Differentially private synthetic image generation using diffusion models with semantic-aware pretraining (https://www.usenix.org/system/files/usenixsecurity24-li-kecen.pdf)
Xicheng: Pre-training using advanced noise composition: dp-promise: Differentially private diffusion probabilistic models for image synthesis (https://www.usenix.org/system/files/sec24fall-prepub-1157-wang-haichen.pdf)
Summary of the course in G-Doc: Deepali
Topic 5: Privately Fine-Tuning Diffusion Models
Jyotika: Low-rank fine-tuning: Differentially Private Fine-Tuning of Diffusion Models (https://arxiv.org/pdf/2406.01355)
Xicheng: DP for useful synthetic images: Differentially private diffusion models generate useful synthetic images (https://arxiv.org/pdf/2302.13861)
Summary of the course in G-Doc: Simon
Topic 6: Privately Training Large Language Models
Nupur: Private Pretraining: Large-scale differentially private BERT (https://arxiv.org/pdf/2108.01624)
Srushti: Private Fine-Tuning: Large language models can be strong differentially private learners (https://arxiv.org/pdf/2110.05679)
Summary of the course in G-Doc: Swathi
Topic 7: Other Private Language Model Adaptations
Anupam: Private Low Rank Training: Differentially private fine-tuning of language models (https://arxiv.org/pdf/2110.06500)
Simon: Private Prompting: Flocks of stochastic parrots: Differentially private prompt learning for large language models (https://proceedings.neurips.cc/paper_files/paper/2023/file/f26119b4ffe38c24d97e4c49d334b99e-Paper-Conference.pdf)
Summary of the course in G-Doc: Deepali
Topic 8: Differential Privacy Auditing
Antoni: Efficient Audits: Privacy auditing with one (1) training run (https://proceedings.neurips.cc/paper_files/paper/2023/file/9a6f6e0d6781d1cb8689192408946d73-Paper-Conference.pdf)
Pratik: Bayesian Estimation: Bayesian Estimation of Differential Privacy (https://proceedings.mlr.press/v202/zanella-beguelin23a/zanella-beguelin23a.pdf)
Summary of the course in G-Doc: Anupam
Topic 9: Problems and Open Research Directions in Privacy-Preserving Machine Learning in Foundation Models
Yashodhara: Problems in private LLMs: What does it mean for a language model to preserve privacy? (https://dl.acm.org/doi/pdf/10.1145/3531146.3534642)
Srushti: Position on privacy in the pretrain-adapt paradigm: Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining (https://openreview.net/pdf?id=ncjhi4qAPV)
Summary of the course in G-Doc: Swathi
Peer Groups:
The two students who present on the same day form the peer group for that day.
Questions:
Questions can be posed here: https://docs.google.com/document/d/1E6eKWuVaVsR_ywjWC6UwgnGEuCCy_NKnmwksEiiQlmE/edit?usp=sharing
All presentations should be uploaded here: https://drive.google.com/drive/folders/1Y_0qYuxn2nCYQdGde1TGNrTyTPe3-_G1?usp=sharing
Requirements and Deliverables:
This seminar is open to senior Bachelor's, Master's, and doctoral students. Ideally, students should have a solid background in mathematics from the base lectures and a strong interest in deep learning. Each student will present one or two topics during the seminar hours in the form of an oral presentation. In addition, each student will read the relevant papers for the other students' presentations and hand in a seminar paper at the end of the semester.