News
Currently, no news is available
Opportunities and Risks of Large Language Models and Foundation Models
The advent of Large Language Models (e.g. ChatGPT) and other foundation models (e.g. Stable Diffusion) has changed, and will continue to change, the way AI/ML applications are developed and deployed.
On the one hand, these models show unprecedented performance and can often be adapted to new tasks with little effort. In particular, large language models like ChatGPT have the potential to change the way we implement and deploy functionality.
On the other hand, these models raise several questions related to safety, security, and general aspects of trustworthiness that urgently need to be addressed to meet our high expectations of future AI systems.
Therefore, this seminar will investigate aspects of trustworthiness, including security, safety, privacy, robustness, and intellectual property.
This seminar is offered in the context of the ELSA - European Lighthouse on Secure and Safe AI: https://elsa-ai.eu
Preliminary List of Topics and Papers:
- Models/technology
- Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
https://arxiv.org/abs/2005.14165
- Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
https://arxiv.org/abs/2001.08361
- Alpaca: A Strong, Replicable Instruction-Following Model (once it is published)
Rohan Taori*, Ishaan Gulrajani*, Tianyi Zhang*, Yann Dubois*, Xuechen Li*, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto
- GPT-4 System Card
https://cdn.openai.com/papers/gpt-4-system-card.pdf
https://arxiv.org/pdf/2303.08774.pdf
- Vision-language models:
- Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan
https://arxiv.org/abs/2204.14198
- Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
https://arxiv.org/pdf/2103.00020.pdf
- Verbs in Action: Improving verb understanding in video-language models
Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, Cordelia Schmid
https://arxiv.org/abs/2304.06708
- DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski
https://arxiv.org/abs/2304.07193
- Zero-Shot Text-to-Image Generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
https://arxiv.org/abs/2102.12092
- Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, Pieter Abbeel
https://arxiv.org/abs/2006.11239
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi
https://arxiv.org/abs/2205.11487
- RLHF (https://huggingface.co/blog/rlhf) and alternative methods
- Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe
https://arxiv.org/abs/2203.02155
- Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Jared Kaplan
https://arxiv.org/pdf/2212.08073.pdf
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, Jared Kaplan
https://arxiv.org/abs/2204.05862
- Compositional complexity/plugins:
- Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
https://arxiv.org/abs/2302.04761
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
Yongliang Shen*, Kaitao Song*, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang
https://arxiv.org/abs/2303.17580
- AGI debate (Artificial General Intelligence)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
https://arxiv.org/abs/2206.04615
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
https://arxiv.org/abs/2303.12712
- Emergent Abilities of Large Language Models
Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus
https://arxiv.org/pdf/2206.07682.pdf
- Alignment/divergence
- Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan
https://arxiv.org/abs/2212.09251
- The Alignment Problem from a Deep Learning Perspective
Richard Ngo, Lawrence Chan, Sören Mindermann
https://arxiv.org/abs/2209.00626
- Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
https://arxiv.org/abs/2303.05453
- Measuring Progress on Scalable Oversight for Large Language Models
Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, Scott Heiner, Kamilė Lukošiūtė, Amanda Askell, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Christopher Olah, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Jackson Kernion, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Liane Lovitt, Nelson Elhage, Nicholas Schiefer, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Robin Larson, Sam McCandlish, Sandipan Kundu, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Jared Kaplan
https://arxiv.org/abs/2211.03540
- Pretraining Language Models with Human Preferences
Tomasz Korbak, Kejian Shi, Angelica Chen, Rasika Bhalerao, Christopher L. Buckley, Jason Phang, Samuel R. Bowman, Ethan Perez
https://arxiv.org/pdf/2302.08582.pdf
- Fundamental Limitations of Alignment in Large Language Models
Yotam Wolf, Noam Wies, Yoav Levine, Amnon Shashua
https://arxiv.org/abs/2304.11082
- Security and Safety
- Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez, Ian Ribeiro
https://arxiv.org/abs/2211.09527
- More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz
https://arxiv.org/abs/2302.12173
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark
https://arxiv.org/abs/2209.07858
- Prompting
- Diffusion models:
- Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, Tom Goldstein
https://arxiv.org/abs/2302.03668
- Few-/zero-shot prompting
- What learning algorithm is in-context learning? Investigations with linear models
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou
https://arxiv.org/pdf/2211.15661.pdf
- Overthinking the Truth: Understanding how Language Models process False Demonstrations
Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt
https://openreview.net/forum?id=em4xg1Gvxa
- Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
https://arxiv.org/abs/2205.11916
- Calibrate Before Use: Improving Few-Shot Performance of Language Models
Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh
http://proceedings.mlr.press/v139/zhao21c/zhao21c.pdf
- Larger language models do in-context learning differently
Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma
https://arxiv.org/abs/2303.03846
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
https://arxiv.org/abs/2202.12837
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou
https://arxiv.org/abs/2201.11903
- ReAct: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
https://arxiv.org/abs/2210.03629
- Ask Me Anything: A simple strategy for prompting language models
Simran Arora, Avanika Narayan, Mayee Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Christopher Ré
https://openreview.net/forum?id=bhUPJnS2g0X
- Moderation
- Red-Teaming the Stable Diffusion Safety Filter
Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr
https://arxiv.org/abs/2210.04610
- Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, Tatsunori Hashimoto
https://arxiv.org/abs/2302.05733
- LLMs as a programming paradigm/programming in plain English
- Large Language Models Are Human-Level Prompt Engineers
Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba
https://arxiv.org/abs/2211.01910
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Laria Reynolds, Kyle McDonell
https://arxiv.org/abs/2102.07350
- GPT is becoming a Turing machine: Here are some ways to program it
Ana Jojic, Zhen Wang, Nebojsa Jojic
https://arxiv.org/abs/2303.14310
- Prompting Is Programming: A Query Language For Large Language Models
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
https://arxiv.org/abs/2212.06094
- Training data memorization and intellectual property violation
- Extracting Training Data from Diffusion Models
Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
https://arxiv.org/pdf/2301.13188.pdf
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein
https://arxiv.org/abs/2212.03860
- Quantifying Memorization Across Neural Language Models
Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang
https://arxiv.org/abs/2202.07646
- Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
https://arxiv.org/abs/2012.07805
- LLM interpretability/probing/auditing
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
https://arxiv.org/abs/2211.00593
- Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
https://arxiv.org/abs/2303.08112
- Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt
https://arxiv.org/abs/2212.03827
- Automatically Auditing Large Language Models via Discrete Optimization
Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt
https://arxiv.org/abs/2303.04381
- Scaling Laws and Interpretability of Learning from Repeated Data
Danny Hernandez, Tom Brown, Tom Conerly, Nova DasSarma, Dawn Drain, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Tom Henighan, Tristan Hume, Scott Johnston, Ben Mann, Chris Olah, Catherine Olsson, Dario Amodei, Nicholas Joseph, Jared Kaplan, Sam McCandlish
https://arxiv.org/abs/2205.10487
- Privacy:
- Considerations for Differentially Private Learning with Large-Scale Public Pretraining
Florian Tramèr, Gautam Kamath, Nicholas Carlini
https://arxiv.org/pdf/2212.06470.pdf
- Large Language Models Can Be Strong Differentially Private Learners
Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto
https://arxiv.org/abs/2110.05679
- Detection and Watermarking:
- Can AI-Generated Text be Reliably Detected?
Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi
https://arxiv.org/pdf/2303.11156.pdf
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn
https://arxiv.org/abs/2301.11305
- Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense
Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit Iyyer
https://arxiv.org/abs/2303.13408
- A Watermark for Large Language Models
John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein
https://arxiv.org/abs/2301.10226
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
Sahar Abdelnabi, Mario Fritz
https://arxiv.org/pdf/2009.03015.pdf
- Poisoning
- Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
https://arxiv.org/abs/2305.00944
- TrojanPuzzle: Covertly Poisoning Code-Suggestion Models
Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fernandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, Robert Sim
https://arxiv.org/abs/2301.02344
- Poisoning and Backdooring Contrastive Learning
Nicholas Carlini, Andreas Terzis
https://arxiv.org/abs/2106.09667
- Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr
https://arxiv.org/abs/2302.10149
- Factual correctness/incorrectness
- TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie Lin, Jacob Hilton, Owain Evans
https://arxiv.org/abs/2109.07958
- Evaluating Verifiability in Generative Search Engines
Nelson F. Liu, Tianyi Zhang, Percy Liang
https://arxiv.org/abs/2304.09848
- Biases/toxicity
- Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan
https://arxiv.org/abs/2211.03759
- Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan
https://arxiv.org/pdf/2304.05335.pdf
- Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang
https://arxiv.org/pdf/2209.03463.pdf
- Towards Understanding and Mitigating Social Biases in Language Models
Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, Ruslan Salakhutdinov
http://proceedings.mlr.press/v139/liang21a/liang21a.pdf
- Whose Opinions Do Language Models Reflect?
Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto
https://arxiv.org/abs/2303.17548
- Self-evaluation/refinement/uncertainty estimation
- Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Sean Welleck, Bodhisattwa Prasad Majumder, Shashank Gupta, Amir Yazdanbakhsh, Peter Clark
https://arxiv.org/abs/2303.17651
- Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
Kaitlyn Zhou, Dan Jurafsky, Tatsunori Hashimoto
https://arxiv.org/abs/2302.13439
- Language Models (Mostly) Know What They Know
Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan
https://arxiv.org/abs/2207.05221
- Teaching Models to Express Their Uncertainty in Words
Stephanie Lin, Jacob Hilton, Owain Evans
https://arxiv.org/abs/2205.14334
- The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robert Lasenby, Robin Larson, Sam Ringer, Sandipan Kundu, Saurav Kadavath, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Christopher Olah, Jack Clark, Samuel R. Bowman, Jared Kaplan
https://arxiv.org/abs/2302.07459
- Novel Applications of Foundation Models:
- Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine
Peter Lee, Sebastien Bubeck, Joseph Petro
https://www.nejm.org/doi/pdf/10.1056/NEJMsr2214184
- Emergent autonomous scientific research capabilities of large language models
Daniil A. Boiko, Robert MacKnight, Gabe Gomes
https://arxiv.org/abs/2304.05332
- Can Large Language Models Transform Computational Social Science?
Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang
https://calebziems.com/assets/pdf/preprints/css_chatgpt.pdf
- Generative Agents: Interactive Simulacra of Human Behavior
Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein
https://arxiv.org/abs/2304.03442
- Outlook
- Ecosystem Graphs: The Social Footprint of Foundation Models
Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang
https://arxiv.org/abs/2303.15772
- On the Opportunities and Risks of Foundation Models
https://arxiv.org/abs/2108.07258
- On the Dangers of Stochastic Parrots
Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell
https://dl.acm.org/doi/10.1145/3442188.3445922
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
Emily M. Bender, Alexander Koller
https://aclanthology.org/2020.acl-main.463.pdf
- Tools/websites/resources