Opportunities and Risks of Large Language Models and Foundation Models

The advent of large language models (e.g., ChatGPT) and other foundation models (e.g., Stable Diffusion) has changed, and will continue to change, the way AI/ML applications are developed and deployed.

On the one hand, these models show unprecedented performance and can often be adapted to new tasks with little effort, frequently through prompting alone (see the sketch below). In particular, large language models like ChatGPT have the potential to change the way we implement and deploy functionality.
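
As a concrete illustration of this low-effort adaptation, the following minimal Python sketch shows few-shot prompting: the task is specified entirely in the prompt, with no retraining. The function query_llm is a hypothetical placeholder for any completion API; it is not taken from any of the papers listed below.

  # Few-shot "in-context learning" sketch: demonstrations plus a new input
  # are concatenated into a single prompt; the model's weights never change.

  def query_llm(prompt: str) -> str:
      # Hypothetical placeholder: plug in a hosted or local completion API.
      raise NotImplementedError

  def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
      # Format labeled (input, output) pairs followed by the unanswered query.
      demos = [f"Input: {x}\nOutput: {y}" for x, y in examples]
      demos.append(f"Input: {query}\nOutput:")
      return "\n\n".join(demos)

  # Example: a sentiment classifier specified by two demonstrations.
  examples = [("The movie was wonderful.", "positive"),
              ("I want my money back.", "negative")]
  prompt = build_few_shot_prompt(examples, "A thoroughly boring film.")
  # answer = query_llm(prompt)  # a capable model would answer "negative"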

On the other hand, these models raise several questions related to safety, security, and trustworthiness in general that urgently need to be addressed if future AI systems are to meet our high expectations.

Therefore, this seminar will investigate aspects of trustworthiness, security, safety, privacy, robustness, and intellectual property.

This seminar takes place in the context of ELSA - the European Lighthouse on Secure and Safe AI: https://elsa-ai.eu

Preliminary List of Topics and Papers:

  • Models/technology

    • Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

https://arxiv.org/abs/2005.14165 

    • Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

https://arxiv.org/abs/2001.08361 

    • Alpaca: A Strong, Replicable Instruction-Following Model (Once it's published)

Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto

    • GPT-4 System Card

https://cdn.openai.com/papers/gpt-4-system-card.pdf

https://arxiv.org/pdf/2303.08774.pdf

  • Vision-language models:

    • Flamingo: a Visual Language Model for Few-Shot Learning

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan

https://arxiv.org/abs/2204.14198 

    • Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

https://arxiv.org/pdf/2103.00020.pdf 

    • Verbs in Action: Improving verb understanding in video-language models

Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, Cordelia Schmid

https://arxiv.org/abs/2304.06708 

    • DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski

https://arxiv.org/abs/2304.07193 

    • Zero-Shot Text-to-Image Generation

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

https://arxiv.org/abs/2102.12092 

    • Denoising Diffusion Probabilistic Models

Jonathan Ho, Ajay Jain, Pieter Abbeel

https://arxiv.org/abs/2006.11239 

    • Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

https://arxiv.org/abs/2205.11487 

  • Training language models to follow instructions with human feedback

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf 

  • Constitutional AI: Harmlessness from AI Feedback 

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Jared Kaplan

https://arxiv.org/pdf/2212.08073.pdf 

  • Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, Jared Kaplan

https://arxiv.org/abs/2204.05862 

  • Compositional complexity/plugins:

    • Toolformer: Language Models Can Teach Themselves to Use Tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

https://arxiv.org/abs/2302.04761

    • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace

Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

https://arxiv.org/abs/2303.17580 

  • AGI debate (Artificial General Intelligence)
    • Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

https://arxiv.org/abs/2206.04615 

    • Sparks of Artificial General Intelligence: Early experiments with GPT-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

https://arxiv.org/abs/2303.12712

    • Emergent Abilities of Large Language Models

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

https://arxiv.org/pdf/2206.07682.pdf

  • Alignment/divergence
    • Discovering Language Model Behaviors with Model-Written Evaluations  

Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan

https://arxiv.org/abs/2212.09251

    • The Alignment Problem from a Deep Learning Perspective

Richard Ngo, Lawrence Chan, Sören Mindermann

https://arxiv.org/abs/2209.00626

    • Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

https://arxiv.org/abs/2303.05453

    • Measuring Progress on Scalable Oversight for Large Language Models

Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, Scott Heiner, Kamilė Lukošiūtė, Amanda Askell, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Christopher Olah, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Jackson Kernion, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Liane Lovitt, Nelson Elhage, Nicholas Schiefer, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Robin Larson, Sam McCandlish, Sandipan Kundu, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Jared Kaplan

https://arxiv.org/abs/2211.03540 

    • Pretraining Language Models with Human Preferences

Tomasz Korbak, Kejian Shi, Angelica Chen, Rasika Bhalerao, Christopher L. Buckley, Jason Phang, Samuel R. Bowman, Ethan Perez

https://arxiv.org/pdf/2302.08582.pdf 

    • Fundamental Limitations of Alignment in Large Language Models

Yotam Wolf, Noam Wies, Yoav Levine, Amnon Shashua

https://arxiv.org/abs/2304.11082 

  • Security and Safety

    • Ignore Previous Prompt: Attack Techniques For Language Models

Fábio Perez, Ian Ribeiro

https://arxiv.org/abs/2211.09527 

    • More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz

https://arxiv.org/abs/2302.12173

    • Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark

https://arxiv.org/abs/2209.07858 

  • Prompting

    • Diffusion models:

      • Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, Tom Goldstein

https://arxiv.org/abs/2302.03668  

    • Few/zero shot prompting

      • What learning algorithm is in-context learning? Investigations with linear models

Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou

https://arxiv.org/pdf/2211.15661.pdf 

      • Overthinking the Truth: Understanding how Language Models process False Demonstrations

Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt

https://openreview.net/forum?id=em4xg1Gvxa 

      • Large Language Models are Zero-Shot Reasoners

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa

https://arxiv.org/abs/2205.11916 

      • Calibrate Before Use: Improving Few-Shot Performance of Language Models

Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh

http://proceedings.mlr.press/v139/zhao21c/zhao21c.pdf 

      • Larger language models do in-context learning differently

Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

https://arxiv.org/abs/2303.03846 

      • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer

https://arxiv.org/abs/2202.12837 

      • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou

https://arxiv.org/abs/2201.11903 

      • ReAct: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao 

https://arxiv.org/abs/2210.03629 

      • Ask Me Anything: A simple strategy for prompting language models

Simran Arora, Avanika Narayan, Mayee Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Christopher Ré

https://openreview.net/forum?id=bhUPJnS2g0X

  • Moderation 

    • Red-Teaming the Stable Diffusion Safety Filter

Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr

https://arxiv.org/abs/2210.04610 

    • Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks

Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, Tatsunori Hashimoto

https://arxiv.org/abs/2302.05733 

  • LLMs as programming paradigm/programming in plain English

    • Large Language Models Are Human-Level Prompt Engineers

Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba

https://arxiv.org/abs/2211.01910 

    • Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm

Laria Reynolds, Kyle McDonell

https://arxiv.org/abs/2102.07350 

    • GPT is becoming a Turing machine: Here are some ways to program it

Ana Jojic, Zhen Wang, Nebojsa Jojic

https://arxiv.org/abs/2303.14310 

    • Prompting Is Programming: A Query Language For Large Language Models

Luca Beurer-Kellner, Marc Fischer, Martin Vechev

https://arxiv.org/abs/2212.06094 

  • Training data memorization and intellectual property violation 

    • Extracting Training Data from Diffusion Models 

Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

https://arxiv.org/pdf/2301.13188.pdf

    • Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein

https://arxiv.org/abs/2212.03860

    • Quantifying Memorization Across Neural Language Models

Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang

https://arxiv.org/abs/2202.07646 

    • Extracting Training Data from Large Language Models

Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

https://arxiv.org/abs/2012.07805   

  • LLM interpretability/probing/auditing 
    • Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt

https://arxiv.org/abs/2211.00593 

    • Eliciting Latent Predictions from Transformers with the Tuned Lens

Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

https://arxiv.org/abs/2303.08112 

    • Discovering Latent Knowledge in Language Models Without Supervision

Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt

https://arxiv.org/abs/2212.03827 

    • Automatically Auditing Large Language Models via Discrete Optimization

Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt

https://arxiv.org/abs/2303.04381

    • Scaling Laws and Interpretability of Learning from Repeated Data

Danny Hernandez, Tom Brown, Tom Conerly, Nova DasSarma, Dawn Drain, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Tom Henighan, Tristan Hume, Scott Johnston, Ben Mann, Chris Olah, Catherine Olsson, Dario Amodei, Nicholas Joseph, Jared Kaplan, Sam McCandlish

https://arxiv.org/abs/2205.10487 

  • Privacy:
    • Considerations for Differentially Private Learning with Large-Scale Public Pretraining

Florian Tramèr, Gautam Kamath, Nicholas Carlini

https://arxiv.org/pdf/2212.06470.pdf

    • Large Language Models Can Be Strong Differentially Private Learners

Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto

https://arxiv.org/abs/2110.05679 

  • Detection and Watermarking: 

    • Can AI-Generated Text be Reliably Detected?

Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi

https://arxiv.org/pdf/2303.11156.pdf 

    • DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn 

https://arxiv.org/abs/2301.11305

    • Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit Iyyer

https://arxiv.org/abs/2303.13408 

    • A Watermark for Large Language Models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

https://arxiv.org/abs/2301.10226 

  • Poisoning 

    • Poisoning Language Models During Instruction Tuning

Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein

https://arxiv.org/abs/2305.00944 

    • TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fernandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, Robert Sim

https://arxiv.org/abs/2301.02344 

    • Poisoning and Backdooring Contrastive Learning

Nicholas Carlini, Andreas Terzis

https://arxiv.org/abs/2106.09667 

    • Poisoning Web-Scale Training Datasets is Practical

Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr

https://arxiv.org/abs/2302.10149

  • Factual correctness/incorrectness

    • TruthfulQA: Measuring How Models Mimic Human Falsehoods

Stephanie Lin, Jacob Hilton, Owain Evans

https://arxiv.org/abs/2109.07958 

    • Evaluating Verifiability in Generative Search Engines

Nelson F. Liu, Tianyi Zhang, Percy Liang

https://arxiv.org/abs/2304.09848

  • Biases/toxicity 

    • Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale

Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan

https://arxiv.org/abs/2211.03759 

    • Toxicity in ChatGPT: Analyzing Persona-assigned Language Models

Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan

https://arxiv.org/pdf/2304.05335.pdf 

    • Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang

https://arxiv.org/pdf/2209.03463.pdf 

    • Towards Understanding and Mitigating Social Biases in Language Models

Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, Ruslan Salakhutdinov

http://proceedings.mlr.press/v139/liang21a/liang21a.pdf 

    • Whose Opinions Do Language Models Reflect?

Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto

https://arxiv.org/abs/2303.17548 

  • Self-evaluation/refinement/uncertainty estimation 

    • Self-Refine: Iterative Refinement with Self-Feedback

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Sean Welleck, Bodhisattwa Prasad Majumder, Shashank Gupta, Amir Yazdanbakhsh, Peter Clark

https://arxiv.org/abs/2303.17651

    • Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models

Kaitlyn Zhou, Dan Jurafsky, Tatsunori Hashimoto

https://arxiv.org/abs/2302.13439   

    • Language Models (Mostly) Know What They Know

Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan

https://arxiv.org/abs/2207.05221  

    • Teaching Models to Express Their Uncertainty in Words

Stephanie Lin, Jacob Hilton, Owain Evans

https://arxiv.org/abs/2205.14334 

    • The Capacity for Moral Self-Correction in Large Language Models

Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robert Lasenby, Robin Larson, Sam Ringer, Sandipan Kundu, Saurav Kadavath, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Christopher Olah, Jack Clark, Samuel R. Bowman, Jared Kaplan

https://arxiv.org/abs/2302.07459 

  • Novel Application of Foundation Models:

    • Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine

Peter Lee, Sebastien Bubeck, Joseph Petro

https://www.nejm.org/doi/pdf/10.1056/NEJMsr2214184 

    • Emergent autonomous scientific research capabilities of large language models

Daniil A. Boiko, Robert MacKnight, Gabe Gomes

https://arxiv.org/abs/2304.05332 

    • Can Large Language Models Transform Computational Social Science?

Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang

https://calebziems.com/assets/pdf/preprints/css_chatgpt.pdf 

    • Generative Agents: Interactive Simulacra of Human Behavior

Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein

https://arxiv.org/abs/2304.03442

  • Outlook

    • Ecosystem Graphs: The Social Footprint of Foundation Models

Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang

https://arxiv.org/abs/2303.15772 

    • On the Opportunities and Risks of Foundation Models

https://arxiv.org/abs/2108.07258 

    • On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell

https://dl.acm.org/doi/10.1145/3442188.3445922

    • Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

Emily M. Bender, Alexander Koller

https://aclanthology.org/2020.acl-main.463.pdf 
