Learning to Retrieve Passages without Supervision: finally unsupervised Neural IR?
In this third episode of the Neural Information Retrieval Talks podcast, Andrew Yates and Sergi Castella discuss the paper "Learning to Retrieve Passages without Supervision" by Ori Ram et al.
Despite the massive advances in Neural Information Retrieval in the past few years, statistical models still outperform neural models when no annotations are available at all. This paper proposes a new self-supervised pretraining task for Dense Information Retrieval that manages to beat BM25 on some benchmarks without using any labels.
Paper: https://arxiv.org/abs/2112.07708
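For context, dense retrievers of the kind discussed in the episode are typically bi-encoders (Siamese/dual encoders) trained with a contrastive objective over positive passages and in-batch negatives. Below is a minimal sketch of that generic recipe in PyTorch; it is not the paper's exact Spider method, and the model name, temperature, and the way pseudo query-passage pairs are constructed are illustrative assumptions.

```python
# Hedged sketch of a generic contrastive bi-encoder for dense retrieval.
# Not the paper's exact method; model choice and hyperparameters are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")  # shared ("Siamese") encoder

def embed(texts):
    """Encode a batch of texts into dense vectors using the [CLS] token."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]  # (batch, hidden)

def contrastive_loss(queries, passages, temperature=0.05):
    """InfoNCE with in-batch negatives: passage i is the positive for
    query i; every other passage in the batch serves as a negative."""
    q = F.normalize(embed(queries), dim=-1)
    p = F.normalize(embed(passages), dim=-1)
    scores = q @ p.T / temperature           # (batch, batch) similarity matrix
    labels = torch.arange(len(queries))      # diagonal entries are the positives
    return F.cross_entropy(scores, labels)

# In a self-supervised setup, "queries" and "passages" would be pseudo
# query-passage pairs mined from raw text rather than human annotations.
loss = contrastive_loss(
    ["what is dense retrieval?"],
    ["Dense retrieval encodes queries and passages into vectors ..."],
)
```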
Timestamps:
00:00 Introduction
00:36 "Learning to Retrieve Passages Without Supervision"
02:20 Open Domain Question Answering
05:05 Related work: Families of Retrieval Models
08:30 Contrastive Learning
11:18 Siamese Networks, Bi-Encoders and Dual-Encoders
13:33 Choosing Negative Samples
17:46 Self-supervision: how to train IR models without labels
21:31 The modern recipe for SOTA Retrieval Models
23:50 Methodology: a new proposed self-supervision task
26:40 Datasets, metrics and baselines
33:50 Results: Zero-shot performance
43:07 Results: Few-shot performance
47:15 In practice, is avoiding labels relevant after all?
51:37 How would you "break" the Spider model?
53:23 How long until Neural IR models outperform BM25 out-of-the-box robustly?
54:50 Models as a service: OpenAI's text embeddings API
Contact: castella@zeta-alpha.com
All episodes
AGI vs ASI: The future of AI-supported decision making with Louis Rosenberg 54:42
EXAONE 3.0: An Expert AI for Everyone (with Hyeongu Yun) 24:57
Zeta-Alpha-E5-Mistral: Finetuning LLMs for Retrieval (with Arthur Câmara) 19:35
ColPali: Document Retrieval with Vision-Language Models only (with Manuel Faysse) 34:48
Using LLMs in Information Retrieval (w/ Ronak Pradeep) 22:15
Designing Reliable AI Systems with DSPy (w/ Omar Khattab) 59:57
The Power of Noise (w/ Florin Cuconasu) 11:45
Benchmarking IR Models (w/ Nandan Thakur) 21:55
Baking the Future of Information Retrieval Models 27:05
Hacking JIT Assembly to Build Exascale AI Infrastructure 38:04
The Promise of Language Models for Search: Generative Information Retrieval 1:07:31
Task-aware Retrieval with Instructions 1:11:13
Generating Training Data with Large Language Models w/ Special Guest Marzieh Fadaee 1:16:14
ColBERT + ColBERTv2: late interaction at a reasonable inference cost 57:30
Evaluating Extrapolation Performance of Dense Retrieval: How does DR compare to cross encoders when it comes to generalization? 58:30
Open Pre-Trained Transformer Language Models (OPT): What does it take to train GPT-3? 47:12
Few-Shot Conversational Dense Retrieval (ConvDR) w/ special guest Antonios Krasakis 1:23:11
Transformer Memory as a Differentiable Search Index: memorizing thousands of random doc ids works!? 1:01:40
Learning to Retrieve Passages without Supervision: finally unsupervised Neural IR? 59:10
The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes 54:13
Shallow Pooling for Sparse Labels: the shortcomings of MS MARCO 1:07:17