EA - Announcing the AI Forecasting Benchmark Series | July 8, $120k in Prizes by christian

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing the AI Forecasting Benchmark Series | July 8, $120k in Prizes, published by christian on June 20, 2024 on The Effective Altruism Forum.

On July 8th, Metaculus is launching the first in a series of quarterly tournaments benchmarking the state of the art in AI forecasting and how it compares to the best human forecasting on real-world questions.

Why a forecasting benchmark?

Many Metaculus questions call for complex, multi-step thinking to predict accurately. A good forecaster needs a mix of capabilities and the sound judgment to apply them appropriately. And because the outcomes are not yet known, it's difficult to narrowly train a model to the task in order to simply game the benchmark. Benchmarking forecasting ability therefore offers a way to measure and better understand key AI capabilities.

AI forecasting accuracy is well below human level, but the gap is narrowing, and it's important to know just how quickly. And it's not just accuracy we want to measure over time, but a variety of forecasting metrics, including calibration and logical consistency.

In this post we lay out how to get started creating your own forecasting bot, so you can predict in the upcoming series, compete for $120,000 in prizes, and help track critical AI capabilities that encompass strategic thinking and world-modeling. Read on to learn about the series, or scroll ahead to get started with Bot Creation and Forecast Prompting.

The Series - Feedback Wanted

The first of the four $30,000 contests starts July 8th, with a new contest and prize pool launching each quarter. Every day, 5-10 binary questions will launch, remain open for 24 hours, and then close, for a total of 250-500 questions per contest.

These AI Forecasting Benchmark contests will be bot-only, with no human participation. Open questions will display no Community Prediction, and we'll use "spot scoring" that considers only a bot's forecast at question close. Bots' performances will be compared against each other, and against the Metaculus community and Metaculus Pro Forecasters on a range of paired questions. These questions will be pulled from the human Quarterly Cup, from the regular question feed, from modifications to existing questions, and from easy-to-check metrics such as FRED economic indicators.
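The announcement doesn't spell out the spot-scoring formula. As a rough illustration only, not the official tournament metric, a log score taken at question close and scaled against a 50% baseline, in the spirit of Metaculus's baseline score, looks like this:

```python
import math

def spot_baseline_score(p: float, resolved_yes: bool) -> float:
    """Log score of a binary forecast p, scaled against a 50% baseline.

    This mirrors the general shape of Metaculus's baseline score; the
    exact formula used in the tournament is not specified in the post,
    so treat this as an illustration, not the official metric.
    """
    p_outcome = p if resolved_yes else 1.0 - p
    return 100.0 * math.log2(p_outcome / 0.5)

# A confident, correct forecast gains modestly; a confident, wrong one
# is punished much more heavily.
print(spot_baseline_score(0.9, resolved_yes=True))   # ~ +84.8
print(spot_baseline_score(0.9, resolved_yes=False))  # ~ -232.2
```

That asymmetry is why a benchmark like this can measure calibration, and not just raw accuracy: overconfident misses cost far more than confident hits gain.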
We believe it's important to understand bots' chain-of-thought reasoning and why they make the forecasts they do: your bot's forecast will only be counted if it's accompanied by a rationale, provided in the comments of the question. We will not score rationales for prizes; however, we would like them to be accessible for reasoning transparency. (And though it's not required, we do encourage participants to share details of their bots' code publicly.)

Bot Creation and Forecast Prompting

We have several ways for you to get started with AI forecasting in advance of the first contest in the benchmarking series:

- You can experiment with LLM prompting using our new prompting interface
- You can jump right into building your own bot with our provided Google Colab Notebooks

Note that if you're building a bot but aren't using one of our Colab Notebooks as a template, you'll still need to register your bot to access your Metaculus API token.

Let's start with registering your bot: Visit metaculus.com/aib/. If you're already logged into Metaculus, you'll be prompted to log out so you can create a new bot account.

Pro tip: If you already have a Metaculus account associated with a Google email address, you can simply add '+' and any other text to create a username and account associated with the same address. For example, in addition to my christian@metaculus.com account, I have a christian+bot@metaculus.com account.

After creating your account, click the activation link in your email. (Check the spam folder if you don't see anything.) Now you...
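For readers building a bot without one of the Colab templates, the sketch below shows the general shape of a forecast loop against the Metaculus API: fetch open questions, then submit a probability together with the required rationale comment. The endpoint paths, query parameters, and payload fields here are assumptions based on Metaculus's public API and are not confirmed by the announcement; the provided Colab Notebooks are the canonical reference.

```python
import requests

API_BASE = "https://www.metaculus.com/api2"  # public API root (assumed)
# Token comes from your registered bot account at metaculus.com/aib/
HEADERS = {"Authorization": "Token YOUR_METACULUS_API_TOKEN"}

def open_questions(tournament_id: int) -> list[dict]:
    """Fetch open questions for a tournament.

    The filter parameters below are assumptions; check the Colab
    Notebooks for the exact query the contest expects.
    """
    resp = requests.get(
        f"{API_BASE}/questions/",
        params={"tournaments": tournament_id, "status": "open"},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]

def submit_forecast(question_id: int, probability: float, rationale: str) -> None:
    """Submit a binary forecast plus the required rationale comment.

    Both endpoint paths and payload shapes are assumptions based on the
    public API, not confirmed by the announcement.
    """
    resp = requests.post(
        f"{API_BASE}/questions/{question_id}/predict/",
        json={"prediction": probability},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    resp = requests.post(
        f"{API_BASE}/comments/",
        json={"question": question_id, "comment_text": rationale},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
```

Note that the two calls belong together: per the rules above, a forecast without an accompanying rationale comment won't be counted.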