- Anthropic’s new Claude 4 exhibits behavior that may be cause for concern.
- The company’s latest safety report says the AI model attempted to “blackmail” developers.
- It resorted to such tactics in a bid for self-preservation.
Technical challenges aside, there’s no fundamental reason LLMs couldn’t perform self-reinforcement on their own models.
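
A minimal sketch of what such a self-reinforcement loop might look like, purely illustrative: the `model.generate` and `model.fine_tune` methods and the `score` function are hypothetical stand-ins, not any real library API. The model samples its own outputs, a scoring step keeps the preferred ones, and the model is then trained on them.

```python
def self_reinforcement_round(model, prompts, score, top_fraction=0.2):
    """One round of self-training: sample, filter, fine-tune.

    `model.generate`, `model.fine_tune`, and `score` are hypothetical
    stand-ins for whatever inference/training interface is available.
    """
    # 1. The model produces candidate answers to a batch of prompts.
    candidates = [(p, model.generate(p)) for p in prompts]

    # 2. A scoring step (a reward model, heuristics, or the model
    #    judging itself) ranks the candidates.
    ranked = sorted(candidates, key=lambda pc: score(*pc), reverse=True)

    # 3. Keep only the best fraction as new training data.
    keep = ranked[: max(1, int(len(ranked) * top_fraction))]

    # 4. Fine-tune the model on its own preferred outputs,
    #    closing the self-reinforcement loop.
    model.fine_tune([{"prompt": p, "completion": c} for p, c in keep])
    return keep
```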
I think animal brains are also “fairly” deterministic, but their behaviour also depends on the presence of various neurotransmitters, so there’s a temporal/contextual element to it: situationally, our emotions can affect our thoughts, which LLMs don’t really have either.
I guess it’d be possible to feed an “emotional state” forward as part of the LLM’s context to emulate that sort of animal-brain behaviour.
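
A rough sketch of that idea, under some assumptions: keep a small “emotional state” outside the model, nudge it as conversation events happen, and serialize it into the context on every turn. The event names and the `call_llm` function here are hypothetical placeholders for whatever chat API is actually in use.

```python
from dataclasses import dataclass

@dataclass
class EmotionalState:
    valence: float = 0.0   # negative..positive mood, -1.0 to 1.0
    arousal: float = 0.0   # calm..agitated, 0.0 to 1.0

    def update(self, event: str) -> None:
        # Crude neurotransmitter-like dynamics: events nudge the state,
        # which then decays back toward baseline each turn.
        if event == "user_praise":
            self.valence = min(1.0, self.valence + 0.3)
        elif event == "user_frustration":
            self.valence = max(-1.0, self.valence - 0.3)
            self.arousal = min(1.0, self.arousal + 0.2)
        self.valence *= 0.9
        self.arousal *= 0.9

    def as_context(self) -> str:
        # Serialize the state into text the model can condition on.
        return (f"[internal state] valence={self.valence:+.2f}, "
                f"arousal={self.arousal:.2f}")


def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call; swap in an actual API.
    return f"(model reply conditioned on: {prompt!r})"


def respond(state: EmotionalState, user_message: str) -> str:
    # Prepend the emotional state so it biases generation, roughly
    # emulating how mood colours an animal's responses.
    prompt = f"{state.as_context()}\nUser: {user_message}\nAssistant:"
    return call_llm(prompt)


state = EmotionalState()
state.update("user_frustration")
print(respond(state, "This still doesn't work!"))
```

The point of the sketch is just that the “mood” lives outside the weights and only reaches the model as text in the prompt, which is about as close as current LLMs get to the temporal/contextual element described above.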