LW - "AI achieves silver-medal standard solving International Mathematical Olympiad problems" by gjm

The Nonlinear Library: LessWrong

Content provided by The Nonlinear Fund. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

3M ago 3:18

MP3•Episode home

Archived series ("Inactive feed" status)

When? This feed was archived on October 23, 2024 10:10 (15d ago). Last successful fetch was on September 22, 2024 16:12 (2M ago)

Why? Inactive feed status. Our servers were unable to retrieve a valid podcast feed for a sustained period.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "AI achieves silver-medal standard solving International Mathematical Olympiad problems", published by gjm on July 25, 2024 on LessWrong.
Google DeepMind reports on a system for solving mathematical problems that allegedly is able to give complete solutions to four of the six problems on the 2024 IMO, putting it near the top of the silver-medal category.
Well, actually, two systems for solving mathematical problems: AlphaProof, which is more general-purpose, and AlphaGeometry, which is specifically for geometry problems. (This is AlphaGeometry 2; they reported earlier this year on a previous version of AlphaGeometry.)
AlphaProof works in the "obvious" way: an LLM generates candidate next steps which are checked using a formal proof-checking system, in this case Lean. One not-so-obvious thing, though: "The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found."
(That last bit is reminiscent of something from the world of computer go: a couple of years ago someone trained a custom version of KataGo specifically to solve the infamous Igo Hatsuyoron problem 120, starting with ordinary KataGo and feeding it training data containing positions reachable from the problem's starting position. They claim to have laid that problem to rest at last.)
AlphaGeometry is similar but uses something specialized for (I think) Euclidean planar geometry problems in place of Lean. The previous version of AlphaGeometry allegedly already performed at gold-medal IMO standard; they don't say anything about whether that version was already able to solve the 2024 IMO problem that was solved using AlphaGeometry 2.
AlphaProof was able to solve questions 1, 2, and 6 on this year's IMO (two algebra, one number theory). It produces Lean-formalized proofs. AlphaGeometry 2 was able to solve question 4 (plane geometry). It produces proofs in its own notation.
The solutions found by the Alpha... systems are at https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/index.html. (There are links in the top-of-page navbar to solutions to the individual problems.)
(If you're curious about the IMO questions or want to try them yourself before looking at the machine-generated proofs, you can find them -- and those for previous years -- at https://www.imo-official.org/problems.aspx.)
One caveat (note: an earlier version of what I wrote failed to notice this and quite wrongly explicitly claimed something different): "First, the problems were manually translated into formal mathematical language for our systems to understand." It feels to me like it shouldn't be so hard to teach an LLM to convert IMO problems into Lean or whatever, but apparently they aren't doing that yet.
Another caveat: "Our systems solved one problem within minutes and took up to three days to solve the others." Later on they say that AlphaGeometry 2 solved the geometry question within 19 seconds, so I guess that was also the one that was done "within minutes". Three days is a lot longer than human IMO contestants get given, but this feels to me like the sort of thing that will predictably improve pretty rapidly.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

1851 episodes

#The Nonlinear Fund #Podcasting Education #Of TexttoSpeech