
Content provided by Real Python. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Real Python or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://vi.player.fm/legal.

Focusing on Data Science & Less on Engineering and Dependencies

1:01:20
 

How do you manage the dependencies of a large-scale data science project? How do you migrate that project from a laptop to cloud infrastructure or utilize GPUs and multiple instances in parallel? This week on the show, Savin Goyal returns to discuss the updates to the open-source framework Metaflow.

Savin briefly describes the Metaflow platform and the goal of simplifying engineering overhead for data scientists and programmers. We discuss how the platform captures snapshots of a project as you work, allowing you to go back in time or share the state of your project with another team member.

We dig into the complicated process of managing dependencies for machine learning and data science projects. Savin describes how the required external libraries can be specified within a flow with the new @pypi or @conda decorators. This allows a project to scale from a local machine to the cloud or multiple instances with all dependencies included.
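The idea of declaring dependencies next to the step that needs them can be illustrated with a small standalone sketch. This is not Metaflow's API or implementation: the `pinned` decorator, the `requirements` helper, the `train` step, and the package versions are all hypothetical names and values chosen for illustration. It only shows the general pattern of attaching a pinned environment to a function so the declaration travels with the code.

```python
from typing import Callable, Dict, List


def pinned(python: str, packages: Dict[str, str]) -> Callable:
    """Hypothetical stand-in for a step-level dependency decorator."""
    def wrap(func: Callable) -> Callable:
        # Attach the declared environment to the step function itself,
        # so it is available wherever the step is later executed.
        func.environment = {"python": python, "packages": dict(packages)}
        return func
    return wrap


def requirements(step_func: Callable) -> List[str]:
    """Render a step's pinned packages as requirements.txt-style lines."""
    pkgs = step_func.environment["packages"]
    return [f"{name}=={version}" for name, version in sorted(pkgs.items())]


# Illustrative versions only; pin whatever your project actually uses.
@pinned(python="3.11", packages={"pandas": "2.1.4", "scikit-learn": "1.3.2"})
def train() -> str:
    # A real step body would import pandas and scikit-learn here.
    return "trained"


if __name__ == "__main__":
    print(requirements(train))
```

Because the pins live on the step rather than in a separate, manually maintained file, a runner could in principle rebuild the same environment on a laptop or on a remote instance before calling the step.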

He talks about starting a new company, Outerbounds, with co-workers from Netflix. Their vision is to keep building the open-source Metaflow platform and to offer customers scalable, enterprise-grade infrastructure.

This week’s episode is brought to you by Intel.

Course Spotlight: Everyday Project Packaging With pyproject.toml

In this Code Conversation video course, you’ll learn how to package your everyday projects with pyproject.toml. Playing on the same team as the import system means you can call your project from anywhere, ensure consistent imports, and have one file that’ll work for many build systems.

Topics:

  • 00:00:00 – Introduction
  • 00:02:25 – Update on Metaflow
  • 00:04:13 – What is Outerbounds?
  • 00:07:26 – An ML platform to serve data scientists' needs
  • 00:13:02 – Dependency reproducibility via @conda and @pypi decorators
  • 00:26:18 – Sponsor: Intel
  • 00:27:10 – Storing lock files along with snapshots
  • 00:29:17 – Working alongside code and dependency management systems
  • 00:34:03 – Scaling a project from laptop to the cloud
  • 00:40:13 – Video Course Spotlight
  • 00:41:41 – Getting visibility on processes
  • 00:47:23 – Adjusting your project due to GPU availability
  • 00:52:27 – Example of jumping back into a project one year later
  • 00:55:54 – What are you excited about in the world of Python?
  • 00:57:39 – What do you want to learn next?
  • 00:59:35 – How can people follow your work online?
  • 01:00:19 – Thanks and goodbye

Show Links:

Level up your Python skills with our expert-led courses:

Support the podcast & join our community of Pythonistas


278 episodes

