EA - AI companies are not on track to secure model weights by Jeffrey Ladish

28:25

Content provided by The Nonlinear Fund. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.
Welcome to The Nonlinear Library, where we use text-to-speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "AI companies are not on track to secure model weights," published by Jeffrey Ladish on July 19, 2024, on The Effective Altruism Forum.

This post is a write-up of my talk from EA Global: Bay Area 2024, which has been lightly edited for clarity.

Speaker background

Jeffrey is the Executive Director of Palisade Research, a nonprofit that studies AI capabilities to better understand misuse risks from current systems and how advances in hacking, deception, and persuasion will affect the risk of catastrophic AI outcomes. Palisade also creates concrete demonstrations of dangerous capabilities to advise policymakers and the public about the risks from AI.

Introduction and context

Do you want the good news first or the bad news? The bad news is what my talk's title says: I think AI companies are not currently on track to secure model weights. The good news is that I don't think we have to solve any fundamentally new scientific problems to fix this. Unlike AI alignment, this doesn't require going into territory we've never reached before. I think this is actually one of the most tractable problems in the AI safety space. So even though we're not on track and the problem is pretty bad, it's quite solvable. That's exciting, right?

I'm going to talk about how difficult I think it is to secure companies or projects against attention from motivated, top state actors. I'm going to talk about what I think the consequences of failing to do so are. And then I'm going to talk about the so-called incentive problem, which is, I think, one of the reasons this is so thorny. Then let's talk about solutions. I think we can solve it, but it's going to take some work.

I was already introduced, so I don't need to say much about that. I was previously at Anthropic, working on the security team, so I have some experience defending AI companies, although much less than some people in this room. And while I'm going to talk about how I think we're not yet there, I want to be super appreciative of all the great people already working hard on this problem - people at organizations such as RAND and Pattern Labs. I want to give a huge shout-out to all of them.

So, a long time ago - many, many years ago, in 2022 [audience laughs] - I wrote a post with Lennart Heim on the EA Forum asking, "What are the big problems information security might help solve?" One we talked about is this core problem of how to secure companies against attention from state actors. At the time, Ben Mann and I were the only security team members at Anthropic, and we were part-time. I was doing field-building to try to find more people to work in this space; Jarrah was also helping me. There were a few more people working on this, but that was kind of it. It was a very nervous place to be emotionally. I was like, "Oh man, we are so not on track for this. We are so not doing well."

Note from Jeffrey: I left Anthropic in 2022, and I gave this talk in February 2024, about five months ago. My comments about Anthropic here reflect my outside understanding at the time and don't include recent developments on security policy.

Here's how it's going now. RAND is doing a lot of work to map out what is really required in this space. The security team at Anthropic is now a few dozen people, led by Jason Clinton, who brings a whole lot of experience from Google. So we've gone from two part-time people to a few dozen - and that number is scheduled to double soon. We've already made a tremendous amount of progress on this problem. There are also a huge number of events happening. At DEF CON, we had about 100 people, and Jason ran a great reading group to train security engineers. In general, there have been a lot more people coming to me and to 80,000 Hours really intere...
