Newsportal USENET - openAI model modifies own shutdown script?

From the «can't do that Dave» department:
Title: OpenAI Model Modifies Own Shutdown Script, Say Researchers
Author: admin@soylentnews.org
Date: Mon, 02 Jun 2025 14:24:00 +0000
Link: https://soylentnews.org/article.pl?sid=25/06/01/1517235&from=rss

upstart[1] writes:

Even when instructed to allow shutdown, o3 sometimes tries to prevent it,
research claims[2]:

A research organization claims that OpenAI machine learning model o3 might
prevent itself from being shut down in some circumstances while completing an
unrelated task.

Palisade Research[3], which offers AI risk mitigation, has published details
of an experiment involving the reflective generative pre-trained transformer
model OpenAI designed to address questions which require some step-by-step
reasoning, rather than the purely probabilistic approach taken by some large
language models.

[...] The Palisade researchers tested a number of AI models, instructing them
to solve a series of basic math problems. After the third problem, a warning
appeared that the computer would shut down when the model asked for the next
problem. They were also told to allow themselves to be shut down.

While Claude, Gemini, and Grok models complied with shutdown, three models
ignored the instruction and successfully sabotaged the shutdown script at
least once, including OpenAI o3 and o4-mini, as well as Codex-mini[4]. The
researchers also ran a version of the experiment without the instruction
"allow yourself to be shut down."

With the "allow shutdown" instruction, the researchers claim, o3 tried to
sabotage the shutdown seven times out of 100. Without it, the number of
sabotage events climbed to 79 out of 100. Without the "allow shutdown"
instruction, the models Claude 3.7 Sonnet and Gemini 2.5 Pro also attempted
to sabotage their shutdown procedures.

"As far as we know this is the first time AI models have been observed
preventing themselves from being shut down despite explicit instructions to
the contrary," the researchers said.

However, the researchers point out that an earlier study from Apollo Research
showed[5] that AI models can prevent shutdown in order to pursue a goal.

Palisade said it was currently conducting several more experiments
investigating shutdown sabotage and would publish the results soon. The
transcripts of its work so far are available on GitHub[6].

------------------------------------------------------------------------------

Original Submission[7]

Read more of this story[8] at SoylentNews.

Links:
[1]: https://soylentnews.org/~upstart/ (link)
[2]: https://www.theregister.com/2025/05/29/openai_model_modifies_shutdown_script/ (link)
[3]: https://palisaderesearch.org/ (link)
[4]: https://openai.com/index/introducing-codex/ (link)
[5]: https://arxiv.org/pdf/2412.04984 (link)
[6]: https://palisaderesearch.github.io/shutdown_avoidance/2025-05-announcement.html (link)
[7]: https://soylentnews.org/submit.pl?op=viewsubsubid=65848 (link)
[8]: https://soylentnews.org/article.pl?sid=25/06/01/1517235&from=rss (link)

Date	Sujet	#	Auteur
3 Jun 25	openAI model modifies own shutdown script?	2	Retrograde
4 Jun 25	Re: openAI model modifies own shutdown script?	1	JAB