• Anthropic’s new Claude 4 exhibits behavior that may be cause for concern.
  • The company’s latest safety report says the AI model attempted to “blackmail” developers.
  • It resorted to such tactics in a bid for self-preservation.
  • skulblaka@sh.itjust.works

    Personally, I think the fundamental way we’ve built these things pretty much rules out any risk of actual sentient life emerging. It’ll get pretty good at faking it - and arguably already kind of is, if you give it a good training set for that - but we’ve designed it with no real capacity for self-understanding. I think we would need a shift in the underlying mechanisms, away from pattern chain matching and into a more… I guess “introspective” approach is maybe the word I’m looking for? Right now our AIs have no capacity for reasoning; that’s not what they’re built for. Capacity for reasoning is going to need to be designed in; it isn’t going to just crop up if you let Claude cook on it for long enough. An AI needs to be able to reason about a problem and create a novel solution to it (even if incorrect) before we need to begin to worry on the AI sentience front. None of what we’ve built so far is able to do that.

    Even with that being said, though, we also aren’t really all that sure how our own brains and consciousness work, so maybe we’re all just pattern matching and Markov chains all the way down. I find that unlikely, but I’m not a neuroscientist, so what do I know.
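    To make the “pattern matching and Markov chains” idea concrete, here’s a toy sketch in Python (my own illustration, not anything resembling how Claude actually works): it “writes” by looking up which words followed the current context in its training text and picking one at random. No understanding, no introspection, just recorded co-occurrence - which is the kind of mechanism I mean.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Record which word followed each sequence of `order` words."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=20):
    """Emit words by sampling whatever followed the current context.
    There's no model of meaning here, only observed sequences."""
    context = random.choice(list(chain.keys()))
    out = list(context)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(context):]))
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the model predicts the next word because the model has seen the next word follow before"
print(generate(build_chain(corpus)))
```

    Obviously a real LLM is enormously more sophisticated than this, but that’s the point I’m making: it’s prediction from prior patterns, not introspection.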