Everything You Wanted to Know About LLM Post-Training, with Nathan Lambert of Allen Institute for AI
In this episode of The Cognitive Revolution, we dive deep into frontier post-training techniques for large language models with Nathan Lambert from the Allen Institute for AI. Nathan discusses the groundbreaking Tulu 3 release, which matches Meta's post-training performance using the Llama base model. We explore supervised fine-tuning, preference-based reinforcement learning, and the innovative technique of reinforcement learning from verifiable rewards. Nathan provides unprecedented insights into the practical aspects of model development, compute requirements, and data generation strategies. This technically rich conversation illuminates previously opaque aspects of LLM development, work achieved by a small team of 10-15 people. Join us for one of our most detailed and valuable discussions on state-of-the-art AI model development.
Check out Nathan Lambert's newsletter:
Be notified early when Turpentine drops new publications: https://www.turpentine.co/exclusiveaccess
SPONSORS:
Incogni: Take your personal data back with Incogni! Use code REVOLUTION at the link below and get 60% off an annual plan: https://incogni.com/revolution
Notion: Notion offers powerful workflow and automation templates, perfect for streamlining processes and laying the groundwork for AI-driven automation. With Notion AI, you can search across thousands of documents from various platforms, generating highly relevant analysis and content tailored just for you - try it for free at https://notion.com/cognitiverevolution
Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
80,000 Hours: 80,000 Hours offers free one-on-one career advising for Cognitive Revolution listeners aiming to tackle global challenges, especially in AI. They connect high-potential individuals with experts, opportunities, and personalized career plans to maximize positive impact. Apply for a free call at https://80000hours.org/cognitiverevolution to accelerate your career and contribute to solving pressing AI-related issues.
RECOMMENDED PODCAST:
Unpack Pricing - Dive into the dark arts of SaaS pricing with Metronome CEO Scott Woody and tech leaders. Learn how strategic pricing drives explosive revenue growth in today's biggest companies like Snowflake, Cockroach Labs, Dropbox and more.
Apple: https://podcasts.apple.com/us/podcast/id1765716600
Spotify: https://open.spotify.com/show/38DK3W1Fq1xxQalhDSueFg
CHAPTERS:
(00:00:00) Teaser
(00:00:59) Sponsors: Incogni
(00:02:20) About the Episode
(00:05:56) Introducing AI2
(00:09:56) Tulu: Deep Dive (Part 1)
(00:17:43) Sponsors: Shopify | Oracle Cloud Infrastructure (OCI)
(00:20:38) Open vs. Closed Recipes
(00:29:48) Compute & Value (Part 1)
(00:34:22) Sponsors: 80,000 Hours | Notion
(00:37:02) Compute & Value (Part 2)
(00:42:41) Model Weight Evolution
(00:53:16) DPO vs. PPO
(01:06:36) Project Trajectory
(01:20:39) Synthetic Data & LLM Judge
(01:27:39) Verifiable RL
(01:38:17) Advice for Practitioners
(01:44:01) Open Source vs. Closed
(01:49:18) Outro
234 episodes
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
All episodes
New in Nature: Google Agents Beat Human Doctors, Make Scientific Discoveries – With Vivek Natarajan and Anil Palepu 1:27:57

Scaling "Thinking": Gemini 2.5 Tech Lead Jack Rae on Reasoning, Long Context, & the Path to AGI 1:16:28

Reward Hacking by Reasoning Models & Loss of Control Scenarios w/ Jeffrey Ladish of Palisade Research, from FLI Podcast 1:32:17

Shortwave Rides the Tidal Wave: Inbox Agents, Hyper-Growth & Hiring AI Managers, with CEO Andrew Lee 1:51:39

Code Context is King: Augment’s AI Assistant for Professional Software Engineers, with Guy Gur-Ari 1:25:44

Unlocking Cells' Secrets: Diffusion, Deconvolution, & Discovery with Siyu He, author of Squidiff & CORAL 1:46:17

a16z on AI Voices: Call Centers, Coaches, and Companions with Olivia Moore & Anish Acharya 1:07:35

Agency over AI? Allan Dafoe on Technological Determinism & DeepMind's Safety Plans, from 80000 Hours 3:02:28

China's Tech Tightrope: Power, Regulation, and the AI Race with Angela Zhang 1:31:56

Historic AI Developments & the Emerging Shape of Superintelligence, from the Consistently Candid Podcast 1:57:36

Frontier Models for Frontier Science with Professor Derya Unutmaz, Immunologist & ChatGPT Pro Grantee 1:32:34

US-China Relations: History, Culture, and AI Competition, with Noah Smith, from Econ 102 1:09:49

The Adversarial Mind: Defeating AI Defenses with Nicholas Carlini of Google DeepMind 2:34:38

New Jersey’s AI Moonshot: Governor Phil Murphy on Partnerships, Progress, and Preparedness 55:54

Inference Scaling, Alignment Faking, Deal Making? Frontier Research with Ryan Greenblatt of Redwood Research 3:21:07