Artwork

Nội dung được cung cấp bởi O'Reilly Media. Tất cả nội dung podcast bao gồm các tập, đồ họa và mô tả podcast đều được O'Reilly Media hoặc đối tác nền tảng podcast của họ tải lên và cung cấp trực tiếp. Nếu bạn cho rằng ai đó đang sử dụng tác phẩm có bản quyền của bạn mà không có sự cho phép của bạn, bạn có thể làm theo quy trình được nêu ở đây https://vi.player.fm/legal.
Player FM - Ứng dụng Podcast
Chuyển sang chế độ ngoại tuyến với ứng dụng Player FM !

Labeling, transforming, and structuring training data sets for machine learning

40:51
 
Chia sẻ
 

Manage episode 248276630 series 61203
Nội dung được cung cấp bởi O'Reilly Media. Tất cả nội dung podcast bao gồm các tập, đồ họa và mô tả podcast đều được O'Reilly Media hoặc đối tác nền tảng podcast của họ tải lên và cung cấp trực tiếp. Nếu bạn cho rằng ai đó đang sử dụng tác phẩm có bản quyền của bạn mà không có sự cho phép của bạn, bạn có thể làm theo quy trình được nêu ở đây https://vi.player.fm/legal.

In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently garnered a faculty position at the University of Washington and is currently working on a company supporting and extending the Snorkel project. Snorkel is a framework for building and managing training data. Based on our survey from earlier this year, labeled data remains a key bottleneck for organizations building machine learning applications and services.

Ratner was a guest on the podcast a little over two years ago when Snorkel was a relatively new project. Since then, Snorkel has added more features, expanded into computer vision use cases, and now boasts many users, including Google, Intel, IBM, and other organizations. Along with his thesis advisor professor Chris Ré of Stanford, Ratner and his collaborators have long championed the importance of building tools aimed squarely at helping teams build and manage training data. With today’s release of Snorkel version 0.9, we are a step closer to having a framework that enables the programmatic creation of training data sets.

Snorkel pipeline for data labeling
Snorkel pipeline for data labeling. Source: Alex Ratner, used with permission.

We had a great conversation spanning many topics, including:

  • Why he and his collaborators decided to focus on “data programming” and tools for building and managing training data.
  • A tour through Snorkel, including its target users and key components.
  • What’s in the newly released version (v 0.9) of Snorkel.
  • The number of Snorkel’s users has grown quite a bit since we last spoke, so we went through some of the common use cases for the project.
  • Data lineage, AutoML, and end-to-end automation of machine learning pipelines.
  • Holoclean and other projects focused on data quality and data programming.
  • The need for tools that can ease the transition from raw data to derived data (e.g., entities), insights, and even knowledge.

Related resources:

  continue reading

168 tập

Artwork
iconChia sẻ
 
Manage episode 248276630 series 61203
Nội dung được cung cấp bởi O'Reilly Media. Tất cả nội dung podcast bao gồm các tập, đồ họa và mô tả podcast đều được O'Reilly Media hoặc đối tác nền tảng podcast của họ tải lên và cung cấp trực tiếp. Nếu bạn cho rằng ai đó đang sử dụng tác phẩm có bản quyền của bạn mà không có sự cho phép của bạn, bạn có thể làm theo quy trình được nêu ở đây https://vi.player.fm/legal.

In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently garnered a faculty position at the University of Washington and is currently working on a company supporting and extending the Snorkel project. Snorkel is a framework for building and managing training data. Based on our survey from earlier this year, labeled data remains a key bottleneck for organizations building machine learning applications and services.

Ratner was a guest on the podcast a little over two years ago when Snorkel was a relatively new project. Since then, Snorkel has added more features, expanded into computer vision use cases, and now boasts many users, including Google, Intel, IBM, and other organizations. Along with his thesis advisor professor Chris Ré of Stanford, Ratner and his collaborators have long championed the importance of building tools aimed squarely at helping teams build and manage training data. With today’s release of Snorkel version 0.9, we are a step closer to having a framework that enables the programmatic creation of training data sets.

Snorkel pipeline for data labeling
Snorkel pipeline for data labeling. Source: Alex Ratner, used with permission.

We had a great conversation spanning many topics, including:

  • Why he and his collaborators decided to focus on “data programming” and tools for building and managing training data.
  • A tour through Snorkel, including its target users and key components.
  • What’s in the newly released version (v 0.9) of Snorkel.
  • The number of Snorkel’s users has grown quite a bit since we last spoke, so we went through some of the common use cases for the project.
  • Data lineage, AutoML, and end-to-end automation of machine learning pipelines.
  • Holoclean and other projects focused on data quality and data programming.
  • The need for tools that can ease the transition from raw data to derived data (e.g., entities), insights, and even knowledge.

Related resources:

  continue reading

168 tập

Tất cả các tập

×
 
Loading …

Chào mừng bạn đến với Player FM!

Player FM đang quét trang web để tìm các podcast chất lượng cao cho bạn thưởng thức ngay bây giờ. Đây là ứng dụng podcast tốt nhất và hoạt động trên Android, iPhone và web. Đăng ký để đồng bộ các theo dõi trên tất cả thiết bị.

 

Hướng dẫn sử dụng nhanh

Nghe chương trình này trong khi bạn khám phá
Nghe