LW - Shard Theory - is it true for humans? by Rishika

The Nonlinear Library: LessWrong


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Shard Theory - is it true for humans?, published by Rishika on June 14, 2024 on LessWrong.
And is it a good model for value learning in AI?
TLDR
Shard theory proposes a view of value formation where experiences lead to the creation of context-based 'shards' that determine behaviour. Here, we go over psychological and neuroscientific views of learning, and find that while shard theory's emphasis on context bears similarity to types of learning such as conditioning, it does not address top-down influences that may decrease the locality of value-learning in the brain.
What's Shard Theory (and why do we care)?
In 2022, Quintin Pope and Alex Turner posted 'The shard theory of human values', where they described their view of how experiences shape the value we place on things. They give an example of a baby who enjoys drinking juice, and eventually learns that grabbing at the juice pouch, moving around to find the juice pouch, and modelling where the juice pouch might be, are all helpful steps in order to get to its reward.
'Human values', they say, 'are not e.g. an incredibly complicated, genetically hard-coded set of drives, but rather sets of contextually activated heuristics…' And since, like humans, AI is often trained with reinforcement learning, the same might apply to AI.
The original post is long (over 7,000 words) and dense, but Lawrence Chan helpfully posted a condensation of the topic in 'Shard Theory in Nine Theses: a Distillation and Critical Appraisal'. In it, he presents nine (as might be expected) main points of shard theory, ending with the last thesis: 'shard theory as a model of human values'. 'I'm personally not super well versed in neuroscience or psychology', he says, 'so I can't personally attest to [its] solidity…I'd be interested in hearing from experts in these fields on this topic.' And that's exactly what we're here to do.
A Crash Course on Human Learning
Types of learning
What is learning? A baby comes into the world and is inundated with sensory information of all kinds. From then on, it must process this information, take whatever's useful, and store it somehow for future use.
There are various places in the brain where this information is stored, each serving a different purpose. Looking at these types of storage, or memory, can help us understand what's going on:
3 types of memory
We often group memory types by the length of time we hold on to them - 'working memory' (while you do some task), 'short-term memory' (maybe a few days, unless you revise or are reminded), and 'long-term memory' (effectively forever). Let's take a closer look at long-term memory:
Types of long-term memory
We can broadly split long-term memory into 'declarative' and 'nondeclarative'. Declarative memory is stuff you can talk about (or 'declare'): what the capital of your country is, what you ate for lunch yesterday, what made you read this essay. Nondeclarative covers the rest: a grab-bag of memory types including knowing how to ride a bike, getting habituated to a scent you've been smelling all day, and being motivated to do things you were previously rewarded for (like drinking sweet juice).
For most of this essay, we'll be focusing on the last type: conditioning.
Types of conditioning
Conditioning
Sometime in the 1890s, a physiologist named Ivan Pavlov was researching salivation using dogs. He would feed the dogs with powdered meat, and insert a tube into the cheek of each dog to measure their saliva. As expected, the dogs salivated when the food was in front of them. Unexpectedly, the dogs also salivated when they heard the footsteps of his assistant (who brought them their food).
Fascinated by this, Pavlov started to play a metronome whenever he gave the dogs their food. After a while, sure enough, the dogs would salivate whenever the metronome played, even if ...
