#32 Deep tensor factorization and a pitfall for machine learning methods with Jacob Schreiber

the bioinformatics chat

Content provided by Roman Cheplyaka. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Roman Cheplyaka or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://vi.player.fm/legal.

5+ years ago · 1:15:14


In this episode, we hear from Jacob Schreiber about his algorithm, Avocado.

Avocado uses deep tensor factorization to decompose a three-dimensional tensor of epigenomic data along its three axes: cell types, assay types, and genomic loci. From the wealth of experimental data produced by projects like the Roadmap Epigenomics Consortium and ENCODE, it extracts a low-dimensional, information-rich latent representation. This representation allows you to impute genome-wide epigenomics experiments that have not yet been performed.
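To make the idea concrete, here is a minimal sketch of the forward pass of a deep tensor factorization, written with NumPy. It is not Avocado's actual code: the sizes, the single hidden layer, and the names `cell_emb`, `assay_emb`, `locus_emb`, `impute`, and `impute_track` are all assumptions made for illustration, and the weights are random rather than trained. In a real model the latent vectors and the network would be fit jointly to the experiments that have been observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_cell_types, n_assays, n_loci = 4, 3, 10
d_cell, d_assay, d_locus, d_hidden = 5, 4, 6, 8

# One latent vector per cell type, per assay, and per genomic locus.
# These are random here; a trained model would learn them from data.
cell_emb = rng.normal(size=(n_cell_types, d_cell))
assay_emb = rng.normal(size=(n_assays, d_assay))
locus_emb = rng.normal(size=(n_loci, d_locus))

# A small feed-forward network combines the three latent vectors,
# in place of the plain product used by classical factorization.
W1 = rng.normal(size=(d_cell + d_assay + d_locus, d_hidden))
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, 1))
b2 = np.zeros(1)

def impute(cell, assay, locus):
    """Predict one tensor entry from the three latent vectors."""
    x = np.concatenate([cell_emb[cell], assay_emb[assay], locus_emb[locus]])
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return float((h @ W2 + b2)[0])

def impute_track(cell, assay):
    """Predict a full track: one value per locus for a (cell type, assay) pair."""
    return np.array([impute(cell, assay, locus) for locus in range(n_loci)])
```

Calling `impute_track(0, 1)` returns one value per locus for that cell type and assay, whether or not the experiment was ever performed, which is what makes imputation of missing experiments possible once the parameters are trained.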

Jacob also talks about a pitfall he discovered when trying to predict gene expression from a mix of genomic and epigenomic data. As you increase the complexity of a machine learning model, its performance may improve for the wrong reason: instead of learning something biologically interesting, the model may simply use the nucleotide sequence to memorize each gene's average expression across the training cell types.
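One way to check for this pitfall is to compare a model against a baseline that predicts, for every gene, its average expression across the training cell types. The sketch below uses synthetic data, so the numbers it produces say nothing about real experiments; the data-generating assumptions (a strong gene-specific mean plus a weaker cell-type-specific deviation) and the names `average_baseline` and `pearson` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_cell_types = 500, 8

# Synthetic expression matrix (an assumption, not real data):
# each gene has a strong gene-specific mean, plus a weaker
# deviation that differs between cell types.
gene_mean = rng.normal(loc=0.0, scale=2.0, size=(n_genes, 1))
expression = gene_mean + rng.normal(scale=0.5, size=(n_genes, n_cell_types))

train_cells = np.arange(0, 6)  # cell types seen during training
test_cells = np.arange(6, 8)   # held-out cell types

def average_baseline(expr, train_idx):
    """Per-gene mean expression over the training cell types."""
    return expr[:, train_idx].mean(axis=1)

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# The baseline uses no epigenomic signal from the held-out cell types.
baseline = average_baseline(expression, train_cells)
scores = [pearson(baseline, expression[:, c]) for c in test_cells]
```

Because the nucleotide sequence of a gene is the same in every cell type, a sufficiently flexible model can use it as a gene identifier and reproduce this baseline. A model is only learning something cell-type-specific to the extent that it beats `scores` on the held-out cell types.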


If you enjoyed this episode, please consider supporting the podcast on Patreon.

70 episodes

#Bioinformatics #Genetics #Algorithms #Ngs #Roman Cheplyaka #Biology #Science #Natural Sciences #Sequence #Genomics