Content provided by Machine Learning Street Talk (MLST). All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Machine Learning Street Talk (MLST) or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.
Prof. Subbarao Kambhampati - LLMs don't reason, they memorize (ICML2024 2/13)

1:42:27
 
Content provided by Machine Learning Street Talk (MLST). All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Machine Learning Street Talk (MLST) or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

Prof. Subbarao Kambhampati argues that while LLMs are impressive and useful tools, especially for creative tasks, they have fundamental limitations in logical reasoning and cannot provide guarantees about the correctness of their outputs. He advocates for hybrid approaches that combine LLMs with external verification systems.
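The hybrid architecture he advocates (the LLM-Modulo framework referenced below) is essentially a generate-test loop: the LLM proposes candidate solutions, and a sound external verifier accepts or rejects them, optionally feeding critiques back into the next prompt. A minimal sketch of that loop, with toy stand-ins for the proposer and verifier (these function names and the precedence-constraint example are illustrative, not the paper's implementation):

```python
import itertools

def llm_modulo(propose, verify, max_rounds=10):
    """Generate-test loop: keep asking the proposer for candidates until
    the external verifier accepts one. Correctness guarantees come from
    the verifier, not from the proposer."""
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(feedback)
        ok, critique = verify(candidate)
        if ok:
            return candidate
        feedback = critique  # back-prompt the proposer with the critique
    return None  # no verified solution within budget

# Toy stand-ins: the "LLM" enumerates task orderings; the verifier
# checks a precedence constraint ("a" must come before "c").
guesses = itertools.permutations("abc")

def propose(feedback):
    return "".join(next(guesses))

def verify(plan):
    if plan.index("a") < plan.index("c"):
        return True, None
    return False, "'a' must precede 'c'"

plan = llm_modulo(propose, verify)
```

The key design point is the asymmetry: the proposer may be unreliable, but the loop only ever returns candidates the verifier has certified.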

MLST is sponsored by Brave:

The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval-augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

Refs

Can LLMs Really Reason and Plan?

https://cacm.acm.org/blogcacm/can-llms-really-reason-and-plan/

On the Planning Abilities of Large Language Models: A Critical Investigation

https://arxiv.org/pdf/2305.15771

Chain of Thoughtlessness? An Analysis of CoT in Planning

https://arxiv.org/pdf/2405.04776

On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks

https://arxiv.org/pdf/2402.08115

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

https://arxiv.org/pdf/2402.01817

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

https://arxiv.org/pdf/2309.13638

"Task Success" is not Enough

https://arxiv.org/abs/2402.04210

Partition function (number theory) (Srinivasa Ramanujan and G.H. Hardy's work)

https://en.wikipedia.org/wiki/Partition_function_(number_theory)
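The partition function p(n) counts the ways of writing n as a sum of positive integers; Hardy and Ramanujan's celebrated work gave its asymptotic growth. Small values can be checked with a standard dynamic program (a sketch for verification, not their asymptotic formula):

```python
def partitions(n):
    # p[k] accumulates the number of partitions of k using parts
    # considered so far (the classic coin-counting DP).
    p = [1] + [0] * n
    for part in range(1, n + 1):
        for k in range(part, n + 1):
            p[k] += p[k - part]
    return p[n]
```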

Poincaré conjecture

https://en.wikipedia.org/wiki/Poincar%C3%A9_conjecture

Gödel's incompleteness theorems

https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems

ROT13 (Rotate13, "rotate by 13 places")

https://en.wikipedia.org/wiki/ROT13
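The "rotate by 13 places" rule is simple enough to state directly in code; because 13 + 13 = 26, the cipher is its own inverse:

```python
def rot13(s):
    # Shift each ASCII letter 13 places within its case; leave
    # everything else untouched. Applying twice recovers the input.
    out = []
    for ch in s:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr(base + (ord(ch) - base + 13) % 26))
        else:
            out.append(ch)
    return "".join(out)
```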

A Mathematical Theory of Communication (C. E. Shannon)

https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

Sparks of AGI

https://arxiv.org/abs/2303.12712

Kambhampati thesis on speech recognition (1983)

https://rakaposhi.eas.asu.edu/rao-btech-thesis.pdf

PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change

https://arxiv.org/abs/2206.10498

Explainable human-AI interaction

https://link.springer.com/book/10.1007/978-3-031-03767-2

Tree of Thoughts

https://arxiv.org/abs/2305.10601

On the Measure of Intelligence (ARC Challenge)

https://arxiv.org/abs/1911.01547

Getting 50% (SoTA) on ARC-AGI with GPT-4o (Ryan Greenblatt ARC solution)

https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

Programs with Common Sense (John McCarthy) - "AI should be an advice taker program"

https://www.cs.cornell.edu/selman/cs672/readings/mccarthy-upd.pdf

Original chain of thought paper

https://arxiv.org/abs/2201.11903

ICAPS 2024 Keynote: Dale Schuurmans on "Computing and Planning with Large Generative Models" (CoT)

https://www.youtube.com/watch?v=YnMqbpdHcaY

The Hardware Lottery (Hooker)

https://arxiv.org/abs/2009.06489

A Path Towards Autonomous Machine Intelligence (JEPA/LeCun)

https://openreview.net/pdf?id=BZ5a1r-kVsf

AlphaGeometry

https://www.nature.com/articles/s41586-023-06747-5

FunSearch

https://www.nature.com/articles/s41586-023-06924-6

Emergent Abilities of Large Language Models

https://arxiv.org/abs/2206.07682

Language models are not naysayers (Negation in LLMs)

https://arxiv.org/abs/2306.08189

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

https://arxiv.org/abs/2309.12288

Embracing negative results

https://openreview.net/forum?id=3RXAiU7sss
