
Content provided by Yannic Kilcher. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Yannic Kilcher or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://vi.player.fm/legal.

Author Interview: SayCan - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

58:31
 

#saycan #robots #ai

This is an interview with the authors Brian Ichter, Karol Hausman, and Fei Xia.

Original Paper Review Video: https://youtu.be/Ru23eWAQ6_E

Large Language Models are excellent at generating plausible plans in response to real-world problems, but without interacting with the environment, they have no ability to estimate which of these plans are feasible or appropriate. SayCan combines the semantic capabilities of language models with a bank of low-level skills, which are available to the agent as individual policies to execute. SayCan automatically finds the best policy to execute by considering a trade-off between the policy's ability to progress towards the goal, given by the language model, and the policy's probability of executing successfully, given by the respective value function. The result is a system that can generate and execute long-horizon action sequences in the real world to fulfil complex tasks.
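The selection step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes the language-model score and the value-function score are combined by multiplication, and the names `select_skill`, `plan`, `score_fn`, `value_fn`, and the `"done"` termination skill are hypothetical. In a real system the value function would depend on the robot's current observation; here the step history stands in for it.

```python
from typing import Callable, Dict, List

# Hypothetical signatures: both callables take (instruction, history, skill)
# and return a score in [0, 1].
ScoreFn = Callable[[str, List[str], str], float]


def select_skill(llm_scores: Dict[str, float], affordances: Dict[str, float]) -> str:
    """Pick the skill with the highest combined score.

    Assumption: the two scores are combined by multiplication.
    """
    combined = {skill: llm_scores[skill] * affordances[skill] for skill in llm_scores}
    return max(combined, key=combined.get)


def plan(
    instruction: str,
    skills: List[str],
    score_fn: ScoreFn,   # stand-in for the language model's usefulness score
    value_fn: ScoreFn,   # stand-in for each skill's value function
    max_steps: int = 10,
) -> List[str]:
    """Greedily build a skill sequence until the 'done' skill is selected."""
    history: List[str] = []
    for _ in range(max_steps):
        llm_scores = {s: score_fn(instruction, history, s) for s in skills}
        affordances = {s: value_fn(instruction, history, s) for s in skills}
        best = select_skill(llm_scores, affordances)
        if best == "done":  # hypothetical termination skill
            break
        history.append(best)
    return history


if __name__ == "__main__":
    # Made-up numbers: the language model prefers picking up the sponge,
    # but the value function says that is unlikely to succeed right now.
    llm = {"find sponge": 0.3, "pick up sponge": 0.6, "done": 0.1}
    can = {"find sponge": 0.9, "pick up sponge": 0.1, "done": 1.0}
    print(select_skill(llm, can))
```

With these made-up numbers, the skill the language model ranks highest loses to a skill that is both relevant and currently feasible, which is the trade-off the description refers to.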

OUTLINE:

0:00 - Introduction & Setup

3:40 - Acquiring atomic low-level skills

7:45 - How does the language model come in?

11:45 - Why are you scoring instead of generating?

15:20 - How do you deal with ambiguity in language?

20:00 - The whole system is modular

22:15 - Going over the full algorithm

23:20 - What if an action fails?

24:30 - Debunking a marketing video :)

27:25 - Experimental Results

32:50 - The insane scale of data collection

40:15 - How do you go about large-scale projects?

43:20 - Where did things go wrong?

45:15 - Where do we go from here?

52:00 - What is the largest unsolved problem in this?

53:35 - Thoughts on the Tesla Bot

55:00 - Final thoughts

Paper: https://arxiv.org/abs/2204.01691

Website: https://say-can.github.io/

Abstract:

Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.

Authors: Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan


177 episodes
