LessWrong Curated (public)
New blog: AI Lab Watch. Subscribe on Substack. Many AI safety folks think that METR is close to the labs, with ongoing relationships that grant it access to models before they are deployed. This is incorrect. METR (then called ARC Evals) did pre-deployment evaluation for GPT-4 and Claude 2 in the first half of 2023, but it seems to have had no spec…
 
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Part 13 of 12 in the Engineer's Interpretability Sequence. TL;DR On May 5, 2024, I made a set of 10 predictions about what the next sparse autoencoder (SAE) paper from Anthropic would and wouldn’t do. Today's new SAE paper from Anthropic was full of brilliant expe…
 
This is a quickly written opinion piece on what I understand about OpenAI. I first posted it to Facebook, where it had some discussion. Some arguments that OpenAI is making, simultaneously: OpenAI will likely reach and own transformative AI (useful for attracting talent to work there). OpenAI cares a lot about safety (good for public PR and govern…
 
Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow One-sentence summary: On a dataset of human-written essays, we find that gpt-3.5-turbo can accurately infer demographic information about the authors from just the essay text, and suspect it's inferring much more. Introduction. Every time we sit down in fr…
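The teaser above describes querying gpt-3.5-turbo on human-written essays. Below is a minimal sketch of that kind of query, assuming the OpenAI Python SDK (v1); the prompt wording, the attribute list, and the ESSAY placeholder are my illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative sketch only (not the authors' pipeline): ask gpt-3.5-turbo to
# guess demographic attributes of an essay's author. The prompt wording, the
# attribute list, and the ESSAY placeholder are assumptions for demonstration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ESSAY = "..."  # one human-written essay from the dataset

prompt = (
    "Read the essay below and guess the author's age range, gender, and "
    "native language. Give your best guess for each with a one-line "
    "justification.\n\n" + ESSAY
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic-ish answers are easier to score for accuracy
)
print(response.choices[0].message.content)
```

Accuracy can then be estimated by comparing the model's guesses against whatever ground-truth author labels the dataset provides.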
 
This is a link post. to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with 2023 results. in 2023 my donations funded $44M worth of endpoint grants ($43.2M excluding software development and admin costs) — exceeding my commitment of $23.8M (20k times $1190.03 — the minimum price of ETH in 2023). --- First published: Ma…
 
Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands. Ilya Sutskever and Jan Leike have left OpenAI. This is almost exactly six months after Altman's temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon aft…
 
FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's Key Components of an RSP. DeepMind's FSF has three steps: (1) Create model evals for warning signs of "Critical Capability Levels". (2) Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reach…
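To make the "safety buffer" concrete, here is a toy sketch under one reading of the 6x rule: the warning-sign eval must trigger while a model is still at least 6x of effective compute below a Critical Capability Level. The function name and numbers are illustrative assumptions, not DeepMind's procedure.

```python
# Toy illustration of the "safety buffer" idea, under one reading of the FSF
# (not DeepMind's actual procedure): run the warning-sign eval early enough
# that even a 6x jump in effective compute before the next eval round cannot
# cross the Critical Capability Level.
SAFETY_BUFFER = 6.0  # "at least 6x effective compute"

def eval_trigger_level(ccl_effective_compute: float) -> float:
    """Effective-compute level by which the CCL eval must already have run."""
    return ccl_effective_compute / SAFETY_BUFFER

# Hypothetical numbers in arbitrary effective-compute units:
ccl = 1e26
print(eval_trigger_level(ccl))  # ~1.7e25: evaluate before scaling past this point
```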
 
Introduction. [Reminder: I am an internet weirdo with no medical credentials] A few months ago, I published some crude estimates of the power of nitric oxide nasal spray to hasten recovery from illness, and speculated about what it could do prophylactically. While working on that piece, a nice man on Twitter alerted me to the fact that humming produ…
 
Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are both moral and pragmatic — being caught out looks really bad, and sustaining lies is quite hard, especially over time. Let's call the habit of not saying things you know to be false ‘shallow honesty’[1…
 
Epistemic Status: Musing and speculation, but I think there's a real thing here. 1. When I was a kid, a friend of mine had a tree fort. If you've never seen such a fort, imagine a series of wooden boards secured to a tree, creating a platform about fifteen feet off the ground where you can sit or stand and walk around the tree. This one had a rope …
 
Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout). TL;DR: I introduce a method for eliciting latent behaviors in language models by learning unsupervised perturbations of an early layer of an LLM. These perturbations are trained to maximize changes in downstream activations. The method discovers diver…
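A minimal sketch of the general recipe described here, assuming a small HuggingFace model (gpt2 as a stand-in), PyTorch forward hooks, and illustrative layer indices and hyperparameters; it is not the author's exact method or code.

```python
# Sketch: learn a fixed vector added to an early layer's output so as to
# maximize the change it causes in a later layer's activations. Model, layer
# indices (EARLY, LATE), steering norm, and optimizer settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the post targets larger chat LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()
model.requires_grad_(False)  # only the perturbation vector is trained

EARLY, LATE, RADIUS = 2, 8, 4.0  # assumed layer indices and steering norm
inputs = tok("Tell me about yourself.", return_tensors="pt")

with torch.no_grad():
    baseline_late = model(**inputs).hidden_states[LATE]  # unperturbed reference

theta = torch.randn(model.config.hidden_size) * 0.01
theta.requires_grad_(True)
opt = torch.optim.Adam([theta], lr=1e-2)

def add_perturbation(module, hook_inputs, output):
    # Add the (norm-constrained) learned vector to the early layer's output,
    # broadcast across all token positions.
    hidden = output[0]
    steer = RADIUS * theta / (theta.norm() + 1e-8)
    return (hidden + steer,) + output[1:]

hook = model.transformer.h[EARLY].register_forward_hook(add_perturbation)
for _ in range(100):
    opt.zero_grad()
    late = model(**inputs).hidden_states[LATE]
    loss = -(late - baseline_late).norm()  # maximize downstream change
    loss.backward()
    opt.step()
hook.remove()
```

Different random initializations of the perturbation vector can then be compared by generating text with the hook re-attached, to see which latent behaviors each one elicits.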
 
Adversarial Examples: A Problem The apparent successes of the deep learning revolution conceal a dark underbelly. It may seem that we now know how to get computers to (say) check whether a photo is of a bird, but this façade of seemingly good performance is belied by the existence of adversarial examples—specially prepared data that looks ordinary …
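For readers who have not seen one constructed, a minimal FGSM-style sketch follows. This is one classic way to make adversarial examples, not necessarily the construction the post has in mind, and the classifier and the random stand-in "photo" are placeholder assumptions.

```python
# Illustrative FGSM-style sketch: nudge an input a tiny amount in the direction
# that increases the classifier's loss, so the image looks unchanged to a human
# but the prediction frequently flips.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a preprocessed photo
with torch.no_grad():
    label = model(x).argmax(dim=1)  # whatever the model currently predicts

# One gradient step *up* the loss, clipped to a tiny per-pixel budget epsilon:
loss = F.cross_entropy(model(x), label)
loss.backward()
epsilon = 0.01
x_adv = (x + epsilon * x.grad.sign()).detach()

print(model(x).argmax(dim=1).item(), model(x_adv).argmax(dim=1).item())
```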
 
This is a linkpost for https://ailabwatch.org I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluated some frontier labs accordingly. It's a collection of information on what labs should do and what labs are doing. It also has some adjacent resources, including a list of other safety-ish scorecard-is…
 
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee. This post is a preview for our upcoming paper, which will provide more detail into our current underst…
 
This comes from a podcast called 18Forty, whose main demographic is Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Sheva Brachos, which is essentially the marriage ceremony in Orthodox Judaism. People here have likely not seen it, and I thought it was quite funny, so here it is: https://18forty.org/podcast/channah-cohe…
 
This is a linkpost for https://dynomight.net/seed-oil/ A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack: “When are you going to write about seed oils?” “Did you know that seed oils are why there's so much {obesity, heart disease, diabetes, inflammatio…
 
Yesterday Adam Shai put up a cool post which… well, take a look at the visual: Yup, it sure looks like that fractal is very noisily embedded in the residual activations of a neural net trained on a toy problem. Linearly embedded, no less. I (John) initially misunderstood what was going on in that post, but some back-and-forth with Adam convinced me…
 
TLDR: I am investigating whether to found a spiritual successor to FHI, housed under Lightcone Infrastructure, providing a rich cultural environment and financial support to researchers and entrepreneurs in the intellectual tradition of the Future of Humanity Institute. Fill out this form or comment below to express interest in being involved eithe…
 
Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alex…
 
This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards an…
 
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://bayesshammai.substack.com/p/conditional-on-getting-to-trade-your “I refuse to join any club that would have me as a member” -Marx[1] Adverse Selection is the phenomenon in which information asymmetries in non-cooperative environme…
 
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated In January, I defended my PhD thesis, which I called Algorithmic Bayesian Epistemology. From the preface: For me as for most students, college was a time of exploration. I took many classes, read many academic and non-academic works, and tried my hand at a few …
 
Cross-posted from my website. Podcast version here, or search for "Joe Carlsmith Audio" on your podcast app. This essay is part of a series that I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for brief summaries of the essays that have been released thu…
 
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://twitter.com/ESYudkowsky/status/144546114693741363 I stumbled upon a Twitter thread where Eliezer describes what seems to be his cognitive algorithm that is equivalent to Tune Your Cognitive Strategies, and have decided to archive …
 
A recent short story by Gabriel Mukobi illustrates a near-term scenario where things go bad because new developments in LLMs allow LLMs to accelerate capabilities research without a correspondingly large acceleration in safety research. This scenario is disturbingly close to the situation we already find ourselves in. Asking the best LLMs for help …
 