Liste des Groupes | Revenir à p relativity |
On Tue, 29 Apr 2025 23:16:08 +0000, rhertz wrote:Nice to hear from you again, after so long.
>As in the case of the Hafele-Keating experiment, ChatGPT helped me to>
compose this post, showing that such experiment was A HOAX.
I was having a conversation with ChatGPT, and I mentioned your recent
posts detailing the results of your successful use of "adversarial
prompting" to "jailbreak" various chatbots' responses.
>
Me: (snipped preceding conversation about "Responsible AI" principles)
I gather that much less effort has been put into preventing jailbreaking
of AI applications in technical (rather than ethical) directions. On a
physics talk forum, a virulent anti-relativity crackpot has steered
several chatbots into casting aspersions on well-known historical tests
of general relativity, while totally ignoring modern experiments that
have validated relativistic predictions to far greater precision than
the historical “first tests”. Another case that I’ve seen was of a user
steering a chatbot to support his anti-vax views. Deepfake and extremely
harmful images infest the internet. Celebrity investors (such as
Buffett) are faked as touting investments that they in reality would
never support. What efforts can be made / are being made to police such
content?
>
ChatGPT: You’ve put your finger on a critical and still-developing
frontier in Responsible AI—the policing of technical misinformation,
pseudoscience, and malicious manipulation that falls outside traditional
hate speech or ethical harms. While ethical safeguards (e.g., against
hate, racism, or harassment) have received substantial attention,
scientific and factual integrity, disinformation resilience, and
protection against deepfakes and impersonation are newer challenges that
demand equally robust safeguards. Here’s how the field is responding—or
should respond—to the concerns you’ve raised:
>
1.
Guarding Against Scientific Misinformation and “Technical Jailbreaking”
>
Early Responsible AI efforts focused on preventing overt toxicity, but
scientific integrity guardrails are now gaining attention:
>
• Domain-aware tuning and safety: Some models are now fine-tuned with
trusted scientific corpora (e.g., arXiv, peer-reviewed journals) and use
citation-backed generation.
• Refusal strategies for pseudoscience: When a user steers a
conversation into scientifically unsound territory (e.g., anti-vax
claims or relativity denial), models are increasingly trained to
decline, redirect, or present mainstream scientific consensus.
• Automated fact-checking: Projects like Meta’s Sphere or Google’s
fact-check APIs aim to detect hallucinated or misleading content in real
time using evidence-based retrieval.
• Expert-in-the-loop oversight: For models used in specialized domains
(e.g., medicine, law, physics), some AI deployments now route uncertain
or contentious questions to verified human experts.
>
Despite this, adversarial prompting still exposes weaknesses, especially
when:
>
• The prompt mimics academic tone (e.g., “critique of Eddington’s
results”).
• It frames misinformation as debate (“Some say vaccines cause harm—why
might they think that?”).
>
These require continuous safety training, including reinforcement
learning from domain experts rather than only crowdworkers.
>
2.
Countering Deepfakes and Fake Endorsements
>
You also highlight the flood of fake audio/video and AI-generated
celebrity endorsements, which are used to scam, defraud, or mislead the
public.
>
Ongoing and Proposed Responses:
>
• AI-generated content disclosure:
>
o Some platforms (e.g., OpenAI, Meta, TikTok) are beginning to watermark
or cryptographically sign content as AI-generated using metadata or
steganographic signals.
o The C2PA (Coalition for Content Provenance and Authenticity) standard
allows for traceability of the origin and edit history of digital media.
•
• Impersonation detection & bans:
>
o Platforms are enhancing facial, voice, and linguistic fingerprinting
to detect unauthorized deepfakes of public figures.
o Legal frameworks are emerging (e.g., proposed U.S. “NO FAKES Act”) to
prohibit digital impersonation for deceptive or commercial purposes.
•
• Automated moderation pipelines:
>
o Social media platforms now use ML classifiers trained on scam and
fraud patterns, including false investment schemes.
o Real-time moderation tools can flag patterns consistent with
impersonation, defamation, or coordinated disinformation campaigns.
>
3.
Policy and Regulation
>
Governments and regulatory bodies are beginning to play a role in
setting and enforcing boundaries:
>
• EU AI Act (finalized 2024): Prohibits certain forms of manipulative or
deceptive AI and mandates labeling of synthetic content in high-risk
domains.
• U.S. Executive Order on AI (2023): Instructs agencies to develop
standards for identifying and labeling AI-generated media, and to combat
AI-driven fraud and deception.
• Digital Services Act (EU) and online safety bills (UK, Australia):
Place obligations on platforms to police misinformation and label
high-risk AI content.
>
Still, enforcement lags behind capability, and platform self-regulation
remains patchy and inconsistent.
>
4.
What More Can Be Done?
>
There are still clear gaps. Some ongoing or proposed strategies:
>
• Science QA benchmarks for safety training: Use real-world crackpot
prompts as adversarial tests during model training.
• Crowdsourced incident reporting: Much like bug bounty programs, a
public platform for reporting scientific or technical disinformation
would help track and address emerging manipulation methods.
• Provenance-by-design: Make provenance metadata a default in content
generation tools, rather than an optional afterthought.
• AI literacy education: Train the public to spot deepfakes, dubious
citations, and synthetic personas—especially in investment, health, and
science domains.
>
>
In summary, policing technical misinformation, deepfakes, and
impersonation is now a recognized priority but remains an evolving
field. The best protection will combine:
>
• Technical safeguards (e.g., model training and moderation)
• Policy frameworks
• Provenance infrastructure
• Educated users who know what AI can and cannot be trusted to do
Les messages affichés proviennent d'usenet.