How a Moonshot Led to Google DeepMind's Veo 3
Dumi Erhan, co-lead of the Veo project at Google DeepMind, joins host Logan Kilpatrick for a deep dive into the evolution of generative video models. They discuss the journey from early research in 2018 to the launch of the state-of-the-art Veo 3 model with native audio generation. Learn about the technical hurdles in evaluating and scaling video models, the challenges of long-duration video coherence, and how user feedback is shaping the future of AI-powered video creation.
Chapters:
0:00 - Intro
0:47 - Veo project's beginnings
3:02 - Veo's origins in Google Brain
5:07 - Video prediction and robotics applications
7:45 - Early progress and evaluation challenges
10:30 - Physics-based evaluations and their limitations
12:18 - The launch of the original Veo model
14:06 - Scaling challenges for video models
16:02 - The leap from Veo 1 to Veo 2
19:40 - Veo 3’s viral audio moment
21:17 - User trends shaping Veo's roadmap
23:49 - Image-to-video vs. text-to-video complexity
26:00 - New prompting methods and user control
27:55 - Coherence in long video generation
31:03 - Genie 3 and world models
35:54 - The steerability challenge
41:59 - Capability transfer and image data's role
47:25 - Closing