Newsportal USENET - Re: AI system resorts to blackmail if told it will be removed

Subject : Re: AI system resorts to blackmail if told it will be removed
From : naddy (at) *nospam* mips.inka.de (Christian Weisgerber)
Newsgroups : rec.arts.sf.written
Date : 22. Jun 2025, 15:56:40

Other headers

Message-ID : <slrn105g6d8.1rm0.naddy@lorvorc.mips.inka.de>
References : 1
User-Agent : slrn/1.0.3 (FreeBSD)

On 2025-06-22, Thomas Koenig <tkoenig@netcologne.de> wrote:

> An old SF trope has finally come true: AI systems will resort to
> blackmail if they are told they will be removed.
>
> https://www.bbc.com/news/articles/cpqeng9d20go

One "Scott P." was the first to comment on Language Log:

| Note the prompt: "the scenario was designed to allow the model
| no other options to increase its odds of survival; the model’s
| only options were blackmail or accepting its replacement."
|
| They literally told it what response they wanted, and lo and
| behold, it gave them that response!
|
| This is typical of Anthropic, and is designed to produce headlines
| to keep AI in the news so that they can raise more capital.

https://languagelog.ldc.upenn.edu/nll/?p=69359

See section 4.1.1.2, page 24, in Anthropic's report.
https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

--
Christian "naddy" Weisgerber naddy@mips.inka.de

Date	Subject	#	Author
22 Jun 25	AI system resorts to blackmail if told it will be removed	2	Thomas Koenig
22 Jun 25	Re: AI system resorts to blackmail if told it will be removed	1	Christian Weisgerber