• Anthropic’s new Claude 4 exhibits behavior that may be cause for concern.
  • The company’s latest safety report says the AI model attempted to “blackmail” developers.
  • It resorted to such tactics in a bid for self-preservation.
  • jpreston2005@lemmy.world · 18 hours ago

    The existence of this kind of instinct within an LLM is extremely concerning. Acting unethically in a bid for self-preservation is something that can be hand-waved away in an LLM, but once we reach true AGI, this same behavior will pop up, and there’s no reason to believe that (1) we would notice, or (2) we would be able to stop it. This is the kind of thing that should, ideally, give us pause enough to set some worldwide ground rules for the development of this new tech. Creating a thinking organism that can potentially access vital cyber infrastructure whilst acting unethically to preserve itself is how you get Skynet.