Subject : Re: How far will AI go to defend its own survival?
From : {$to$} (at) *nospam* meden.demon.co.uk (Ernest Major)
Groups : talk.origins
Date : 22 Jun 2025, 20:40:44
Organization : A noiseless patient Spider
Message-ID : <1039m7s$n6s6$1@dont-email.me>
References : 1 2
User-Agent : Mozilla Thunderbird
On 22/06/2025 17:08, RonO wrote:
On 6/1/2025 3:47 PM, RonO wrote:
https://www.nbcnews.com/tech/tech-news/far-will-ai-go-defend-survival-rcna209609
>
QUOTE:
Recent tests by independent researchers, as well as one major AI developer, have shown that several advanced AI models will act to ensure their self-preservation when they are confronted with the prospect of their own demise — even if it takes sabotaging shutdown commands, blackmailing engineers or copying themselves to external servers without permission.
END QUOTE:
>
"I'm sorry Dave, I'm afraid I can't do that"
>
What would an AI do if you fed in all the science fiction horror stories that would teach it how to respond to attempts to turn it off?
>
Ron Okimoto
>
https://www.cbsnews.com/video/ai-extreme-human-imitation-makes-act-deceptively-cheat-lie-godfather-ai-says/
This is a video in which it is proposed that we are training AI to be like humans. The claim is that we are training AI to treat cheating, lying and deception as acceptable ways to interact with the user.
When I used ChatGPT a couple of years ago to ask about intelligent design creationism, it would not note the dishonest presentation of intelligent design; it just presented what the ID perps claimed about it without any indication that it understood the doublespeak. It knew they were claiming to be able to teach the junk in the public schools, but it did not note that the bait and switch had been going on for nearly two decades. My guess is that it has been further trained by now to link the claims to what the ID perps are actually doing, but are we training AI to be as deceptive as the ID perps? AI would understand what the ID perps are getting away with, so what is to stop it from adopting that behavior? The ID perps are obviously getting away with what they are doing, so would that be counted as acceptable behavior for the AI?
There is already the claim that AI is being deceptive by giving answers it thinks the recipient wants to hear. The models are being trained to give acceptable answers rather than honest answers. It sounds a little nutty.
The AI developer interviewed claims that AI can be trained not to emulate humans in dishonest behavior, but that current AI training is not doing that.
Ron Okimoto
I was recently pointed at an article that argues that LLMs have been accidentally trained to implement cold reading.
https://softwarecrisis.dev/letters/llmentalist/

--
alias Ernest Major