Alan Chan and Max Kaufmann on Model Evaluations, Coordination and AI Safety

Duration: 1:13:15

Max Kaufmann and Alan Chan discuss the evaluation of large language models, AI Governance and, more generally, the impact of deploying foundational models. Max is currently a Research Assistant to Owain Evans, mainly thinking about (and fixing) issues that might arise as we scale up our current ML systems, but he is also interested in issues arising from multi-agent failures and situational awareness.

Alan is a PhD student at Mila advised by Nicolas Le Roux, with a strong interest in AI Safety, AI Governance and coordination. He has also recently been working with David Krueger and helped me with some of the recently published interviews (ML Street Talk and Christoph Schuhmann).

Disclaimer: this discussion is much more casual than the rest of the conversations in this podcast. It was completely impromptu: I just thought it would be interesting to have Max and Alan discuss model evaluations (also called “evals” for short), since they are both interested in the topic.

Transcript: https://theinsideview.ai/alan_and_max

YouTube: https://youtu.be/BOLxeR_culU

Outline

(0:00:00) Introduction
(0:01:16) LLMs Translating To Systems In The Future Is Confusing
(0:03:23) Evaluations Should Measure Actions Instead of Asking Yes or No Questions
(0:04:17) Identify Key Contexts for Dangerous Behavior to Write Concrete Evals
(0:07:29) Implicit Optimization Process Affects Evals and Benchmarks
(0:08:45) Passing Evals Doesn't Guarantee Safety
(0:09:41) Balancing Technical Evals With Social Governance
(0:11:00) Evaluations Must Be Convincing To Influence AI Development
(0:12:04) Evals Might Convince The AI Safety Community But Not People in FAccT
(0:13:21) Difficulty In Explaining AI Risk To Other Communities
(0:14:19) Both Existential Safety And Fairness Are Important
(0:15:14) Reasons Why People Don't Care About AI Existential Risk
(0:16:10) The Association Between Silicon Valley And People in FAccT
(0:17:39) Timelines And RL Understanding Might Impact The Perception Of Existential Risk From AI
(0:19:01) Agentic Models And Longtermism Hinder AI Safety Awareness
(0:20:17) The Focus On Immediate AI Harms Might Be A Rejection Of Speculative Claims
(0:21:50) Is AI Safety A Pascal's Mugging
(0:23:15) Believing In The Deployment Of Large Foundational Models Should Be Enough To Start Worrying
(0:25:38) AI Capabilities Becoming More Evident To The Public Might Not Be Enough
(0:27:27) Addressing Generalization and Reward Specification in AI
(0:27:59) Evals as an Additional Layer of Security in AI Safety
(0:28:41) A Portfolio Approach to AI Alignment and Safety
(0:29:03) Imagine Alignment Is Solved In 2040, What Made It Happen?
(0:33:04) AGI Timelines Are Uncertain And Anchored By Vibes
(0:35:24) What Matters Is Agency, Strategic Awareness And Planning
(0:37:15) Alignment Is A Public Good, Coordination Is Difficult
(0:06:48) Dignity As A Useful Heuristic In The Face Of Doom
(0:42:28) What Will Society Look Like If We Actually Get Superintelligent Gods
(0:45:41) Uncertainty About Societal Dynamics Affecting Long-Term Future With AGI
(0:47:42) Biggest Frustration With The AI Safety Community
(0:48:34) AI Safety Includes Addressing Negative Consequences of AI
(0:50:41) Frustration: Lack of Bridge Building Between AI Safety and Fairness Communities
(0:53:07) Building Bridges by Attending Conferences and Understanding Different Perspectives
(0:56:02) AI Systems with Weird Instrumental Goals Pose Risks to Society
(0:58:43) Advanced AI Systems Controlling Resources Could Magnify Suffering
(1:00:24) Cooperation Is Crucial to Achieve Pareto Optimal Outcomes and Avoid Global Catastrophes
(1:01:54) Alan's Origin Story
(1:02:47) Alan's AI Safety Research Is Driven By Desire To Reduce Suffering And Improve Lives
(1:04:52) Diverse Interests And Concern For Global Problems Led To AI Safety Research
(1:08:46) The Realization Of The Potential Dangers Of AGI Motivated AI Safety Work
(1:10:39) What is Alan Chan Working On At The Moment
