AF - Linear infra-Bayesian Bandits by Vanessa Kosoy

The Nonlinear Library: Alignment Forum


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Linear infra-Bayesian Bandits, published by Vanessa Kosoy on May 10, 2024 on The AI Alignment Forum.

Linked is my MSc thesis, where I do regret analysis for an infra-Bayesian[1] generalization of stochastic linear bandits.

The main significance that I see in this work is:

Expanding our understanding of infra-Bayesian regret bounds, and solidifying our confidence that infra-Bayesianism is a viable approach. Previously, the most interesting IB regret analysis we had was Tian et al., which deals (essentially) with episodic infra-MDPs. My work here doesn't supersede Tian et al. because it only talks about bandits (i.e. stateless infra-Bayesian laws), but it complements it because it deals with a parametric hypothesis space (i.e. it fits into the general theme in learning theory that generalization bounds should scale with the dimension of the hypothesis class).

Discovering some surprising features of infra-Bayesian learning that have no analogues in classical theory. In particular, it turns out that affine credal sets (i.e. sets that are closed under arbitrary affine combinations of distributions, not just convex combinations) have better learning-theoretic properties, and the regret bound depends on additional parameters that don't appear in classical theory (the "generalized sine" S and the "generalized condition number" R). Credal sets defined using conditional probabilities (related to Armstrong's "model splinters") turn out to be well-behaved in terms of these parameters.

In addition to the open questions in the "summary" section, there is also a natural open question of extending these results to non-crisp infradistributions[2]. (I didn't mention it in the thesis because it requires too much additional context to motivate.)

[1] I use the word "imprecise" rather than "infra-Bayesian" in the title, because the proposed algorithm achieves a regret bound which is worst-case over the hypothesis class, so it's not "Bayesian" in any non-trivial sense.

[2] In particular, I suspect that there's a flavor of homogeneous ultradistributions for which the parameter S becomes unnecessary. Specifically, an affine ultradistribution can be thought of as the result of "take an affine subspace of the affine space of signed distributions, intersect it with the space of actual (positive) distributions, then take downwards closure into contributions to make it into a homogeneous ultradistribution". But we can also consider the alternative "take an affine subspace of the affine space of signed distributions, take downwards closure into signed contributions and then intersect it with the space of actual (positive) contributions". The order matters!

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
