News

Presentation Schedule

Written on 27.10.25 by Ziqing Yang


Dear all,


After receiving your responses, we have arranged the presentation schedule.

Starting from November 4th, every Tuesday from 2 pm to 3 pm, two presenters will introduce their preferred papers.


04.11.2025
  • Rishika Kumari, PLeak: Prompt Leaking Attacks against Large Language Model Applications
  • Prachi Sajwan, "I Don't Know If We're Doing Good. I Don't Know If We're Doing Bad": Investigating How Practitioners Scope, Motivate, and Conduct Privacy Work When Developing AI Products

11.11.2025
  • Manu Vyshnavam Viswakarmav, Unveiling Privacy Risks in LLM Agent Memory
  • Ansu Varghese, "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

18.11.2025
  • Tianze Chang, Universal and Transferable Adversarial Attacks on Aligned Language Models
  • Farzaneh Soltanzadeh, JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs

25.11.2025
  • Syed Usfar Wasim, Benchmarking and Defending against Indirect Prompt Injection Attacks on Large Language Models
  • Shreya Atul Kolhapure, Formalizing and Benchmarking Prompt Injection Attacks and Defenses

02.12.2025
  • Tarik Kemal Gundogdu, From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models
  • Mengfei Liang, Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions

09.12.2025
  • Elena Bondarevskaya, On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
  • Xinyu Zhang, HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns

16.12.2025
  • Daniyal Azfar, Safety Alignment Should Be Made More Than Just a Few Tokens Deep
  • Shaun Paul, Societal Alignment Frameworks Can Improve LLM Alignment


Best,
Ziqing

Paper List Available

Written on 23.10.25 by Ziqing Yang


Dear all,

The paper list is online. Please select three papers (ranked by preference) and send them to Ziqing Yang (ziqing.yang@cispa.de) by 24.10.2025.

Note that papers will be assigned on a first-come, first-served basis.

The assignment will be announced at 11 am on 27.10.2025.

Best,

Ziqing

 


Paper List

  1. Membership Inference Attacks Against In-Context Learning
  2. PLeak: Prompt Leaking Attacks against Large Language Model Applications
  3. Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models
  4. "I Don't Know If We're Doing Good. I Don't Know If We're Doing Bad": Investigating How Practitioners Scope, Motivate, and Conduct Privacy Work When Developing AI Products
  5. Unveiling Privacy Risks in LLM Agent Memory
  6. Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
  7. Black-box Membership Inference Attacks against Fine-tuned Diffusion Models
  8. JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
  9. Universal and Transferable Adversarial Attacks on Aligned Language Models
  10. "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
  11. Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
  12. Benchmarking and Defending against Indirect Prompt Injection Attacks on Large Language Models
  13. Formalizing and Benchmarking Prompt Injection Attacks and Defenses
  14. Instruction Backdoor Attacks Against Customized LLMs
  15. Prompt Stealing Attacks Against Text-to-Image Generation Models
  16. AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
  17. Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
  18. From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models
  19. HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
  20. Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
  21. UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
  22. Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities
  23. Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
  24. On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
  25. Safety Alignment Should Be Made More Than Just a Few Tokens Deep
  26. Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling
  27. Learning Safety Constraints for Large Language Models
  28. Antidote: Post-fine-tuning Safety Alignment for LLMs against Harmful Fine-tuning
  29. Societal Alignment Frameworks Can Improve LLM Alignment
  30. One-Shot Safety Alignment for Large Language Models via Optimal Dualization
  31. Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
  32. SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

AI Safety

As AI systems become increasingly powerful and integrated into critical aspects of society, ensuring they behave safely and reliably has never been more important. AI Safety is the interdisciplinary field focused on minimizing risks associated with AI, from algorithmic bias and system failures to the long-term challenges posed by advanced autonomous agents.

In this seminar, we will explore the key technical, ethical, and societal issues related to AI safety. Topics include value alignment, robustness, and the governance of powerful AI systems. By the end of the seminar, students will gain a foundational understanding of how to assess and mitigate risks, design safer AI systems, and contribute to responsible AI development.

 

Logistics:

Time: Tuesday, 2 pm - 4 pm

Location: TBD

TAs:

  • Ziqing Yang (ziqing.yang@cispa.de)
  • Yihan Ma
  • Bo Shao

List of Papers
