[QA] Measuring AI Ability to Complete Long Tasks
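The paper introduces a new metric for evaluating AI capabilities: the 50%-task-completion time horizon, meaning the time humans typically take to complete tasks that an AI model can complete with a 50% success rate. It reports rapid growth in this horizon and, by extrapolating the trend, predicts significant automation of software tasks within five years.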
https://arxiv.org/abs/2503.14499
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
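The metric described above can be illustrated with a small sketch. It assumes, as we read the paper, that each task is labeled with the time a human needs to complete it, that model success is modeled as a logistic function of log2 of that time, p(t) = sigmoid(a + b * log2 t), and that the 50% horizon is the task length where p(t) = 0.5, i.e. h = 2^(-a/b). The function names `fit_logistic` and `time_horizon_50` and the toy data are hypothetical illustrations, not the authors' code, data, or results.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs: Sequence[float], ys: Sequence[int],
                 iters: int = 100, tol: float = 1e-12) -> Tuple[float, float]:
    """Fit P(y=1 | x) = sigmoid(a + b*x) by maximum likelihood via Newton's method.

    Assumes the outcomes are not perfectly separable; otherwise the MLE diverges.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = 0.0            # gradient of the log-likelihood
        haa = hab = hbb = 0.0    # negative Hessian (positive definite)
        for x, y in zip(xs, ys):
            p = _sigmoid(a + b * x)
            r = y - p
            ga += r
            gb += r * x
            w = p * (1.0 - p)
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        if det <= 0.0:
            raise ValueError("degenerate data: cannot fit a logistic curve")
        # Newton update: (a, b) += inverse(negative Hessian) @ gradient
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


def time_horizon_50(human_minutes: Sequence[float], successes: Sequence[int]) -> float:
    """Task length (in minutes) at which the fitted success probability is 50%."""
    xs = [math.log2(m) for m in human_minutes]
    a, b = fit_logistic(xs, successes)
    if b >= 0.0:
        raise ValueError("success rate does not decline with task length")
    # sigmoid(a + b*log2(h)) = 0.5  <=>  a + b*log2(h) = 0
    return 2.0 ** (-a / b)


if __name__ == "__main__":
    # Hypothetical toy data: human completion time per task, and whether the model succeeded.
    minutes = [2, 4, 4, 8, 8, 32, 32, 64, 64, 128]
    successes = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
    print(f"50% time horizon: {time_horizon_50(minutes, successes):.2f} minutes")
```

The toy data are symmetric around 16 minutes (every success at t has a matching failure at 256/t), so the fitted horizon comes out at about 16 minutes. Fitting against log2 time reflects that task lengths span orders of magnitude; Newton's method is used only to keep the sketch dependency-free, and any standard logistic-regression routine could replace it.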