LW - Claude 3.5 Sonnet by Zach Stein-Perlman

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Claude 3.5 Sonnet, published by Zach Stein-Perlman on June 20, 2024 on LessWrong.

From Anthropic's announcement: "we'll be releasing Claude 3.5 Haiku and Claude 3.5 Opus later this year." Anthropic made a mini model card. Notably: "The UK AISI also conducted pre-deployment testing of a near-final model, and shared their results with the US AI Safety Institute . . . . Additionally, METR did an initial exploration of the model's autonomy-relevant capabilities."

It seems that UK AISI only got maximally shallow access: Anthropic would presumably have said so if it had granted more, and in particular the model card describes the "internal research techniques to acquire non-refusal model responses" as internal. This is better than nothing, but it would be unsurprising if an evaluator were unable to elicit dangerous capabilities while users - with much more time and with access to future elicitation techniques - ultimately are. Recall that DeepMind, in contrast, gave "external testing groups . . . . the ability to turn down or turn off safety filters."

Anthropic CEO Dario Amodei gave Dustin Moskovitz the impression that Anthropic committed "to not meaningfully advance the frontier with a launch." (Plus Gwern and others got this impression from Anthropic too.) Perhaps Anthropic does not consider itself bound by this, which might be reasonable, but it's quite disappointing that Anthropic hasn't clarified its commitments, particularly after the confusion on this topic around the Claude 3 launch.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.