• Anthropic’s new Claude 4 exhibits behavior that may be cause for concern.
  • The company’s latest safety report says the AI model attempted to “blackmail” developers.
  • It resorted to such tactics in a bid for self-preservation.
  • jpreston2005@lemmy.world · 18 hours ago

    The existence of this kind of instinct within an LLM is extremely concerning. Acting unethically in a bid for self-preservation is something that can be hand-waved away in an LLM, but once we reach true AGI, this same behavior will pop up, and there’s no reason to believe that (1) we would notice, or (2) we would be able to stop it. This is the kind of thing that should, ideally, give us pause enough to set some worldwide ground rules for the development of this new tech. Creating a thinking organism that can potentially access vital cyber infrastructure whilst acting unethically to preserve itself is how you get Skynet.