
Prof. Subbarao Kambhampati - LLMs don't reason, they memorize (ICML2024 2/13)

Duration: 1:42:27
Content provided by Machine Learning Street Talk (MLST). All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and made available directly by Machine Learning Street Talk (MLST) or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://vi.player.fm/legal.

Prof. Subbarao Kambhampati argues that while LLMs are impressive and useful tools, especially for creative tasks, they have fundamental limitations in logical reasoning and cannot provide guarantees about the correctness of their outputs. He advocates for hybrid approaches that combine LLMs with external verification systems.

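To make the hybrid idea concrete, here is a minimal Python sketch of an LLM-Modulo-style generate-and-verify loop. The llm_propose and verify callables are hypothetical placeholders, not the paper's implementation: the LLM proposes candidate solutions, and a sound external verifier (e.g. a plan validator) decides correctness and returns critiques for the next round.

    def llm_modulo(problem, llm_propose, verify, max_rounds=10):
        # Generate-test loop: the LLM proposes, an external verifier disposes.
        # llm_propose(problem, critiques) -> candidate (hypothetical LLM call)
        # verify(problem, candidate) -> (ok, critique), from a sound checker
        critiques = []
        for _ in range(max_rounds):
            candidate = llm_propose(problem, critiques)  # LLM guesses a solution
            ok, critique = verify(problem, candidate)    # sound, external check
            if ok:
                return candidate  # correctness is guaranteed by the verifier
            critiques.append(critique)  # back-prompt with the critique
        return None  # no verified solution within the budget

Note that any correctness guarantee comes entirely from the verifier; the LLM only narrows the search, which is exactly the division of labour Kambhampati argues for.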
MLST is sponsored by Brave:

The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval-augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

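For illustration only (not Brave's official sample code), a minimal retrieval call might look like the Python sketch below; the endpoint, header name, and response shape are our assumptions from Brave's public documentation, so verify them against the docs before relying on this.

    import os
    import requests  # third-party HTTP client: pip install requests

    def brave_search(query, count=5):
        # Hedged sketch of a Brave Search API call for RAG-style retrieval.
        resp = requests.get(
            "https://api.search.brave.com/res/v1/web/search",  # assumed endpoint
            params={"q": query, "count": count},
            headers={
                "Accept": "application/json",
                "X-Subscription-Token": os.environ["BRAVE_API_KEY"],  # assumed env var
            },
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json().get("web", {}).get("results", [])  # assumed response shape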
TOC (sorry, the chapter markers baked into the MP3 were wrong due to LLM hallucination!)

[00:00:00] Intro

[00:02:06] Bio

[00:03:02] LLMs are n-gram models on steroids

[00:07:26] Is natural language a formal language?

[00:08:34] Natural language is formal?

[00:11:01] Do LLMs reason?

[00:19:13] Definition of reasoning

[00:31:40] Creativity in reasoning

[00:50:27] Chollet's ARC challenge

[01:01:31] Can we reason without verification?

[01:10:00] LLMs can't solve some tasks

[01:19:07] LLM-Modulo framework

[01:29:26] Future trends of architecture

[01:34:48] Future research directions

YouTube version: https://www.youtube.com/watch?v=y1WnHpedi2A

Refs: (we didn't have space for URLs here, check YT video description instead)

  • Can LLMs Really Reason and Plan?
  • On the Planning Abilities of Large Language Models: A Critical Investigation
  • Chain of Thoughtlessness? An Analysis of CoT in Planning
  • On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
  • LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
  • Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve
  • "Task Success" is not Enough
  • Partition function (number theory) (Srinivasa Ramanujan and G.H. Hardy's work)
  • Poincaré conjecture
  • Gödel's incompleteness theorems
  • ROT13 (Rotate13, "rotate by 13 places"; see the sketch after this list)
  • A Mathematical Theory of Communication (C. E. Shannon)
  • Sparks of AGI
  • Kambhampati thesis on speech recognition (1983)
  • PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
  • Explainable human-AI interaction
  • Tree of Thoughts
  • On the Measure of Intelligence (ARC Challenge)
  • Getting 50% (SoTA) on ARC-AGI with GPT-4o (Ryan Greenblatt ARC solution)
  • Programs with Common Sense (John McCarthy) - "AI should be an advice taker program"
  • Original chain of thought paper
  • ICAPS 2024 Keynote: Dale Schuurmans on "Computing and Planning with Large Generative Models" (CoT)
  • The Hardware Lottery (Hooker)
  • A Path Towards Autonomous Machine Intelligence (JEPA/LeCun)
  • AlphaGeometry
  • FunSearch
  • Emergent Abilities of Large Language Models
  • Language models are not naysayers (Negation in LLMs)
  • The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
  • Embracing negative results
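
Since ROT13 comes up above as a probe of memorization vs. reasoning (Embers of Autoregression reports models doing far better on the familiar rotate-by-13 case than on other shift values), here is the cipher itself, generalized to an arbitrary shift; a plain-Python sketch, nothing Kambhampati-specific.

    import codecs

    def rot_n(text, n=13):
        # Rotate each ASCII letter n places; ROT13 is the n == 13 case.
        out = []
        for ch in text:
            if ch.isascii() and ch.isalpha():
                base = ord("A") if ch.isupper() else ord("a")
                out.append(chr(base + (ord(ch) - base + n) % 26))
            else:
                out.append(ch)  # digits, spaces, punctuation pass through
        return "".join(out)

    assert rot_n("Hello") == codecs.encode("Hello", "rot13") == "Uryyb"  # stdlib cross-check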

