Once an AI model exhibits ‘deceptive behavior’ it can be hard to correct, researchers at OpenAI competitor Anthropic found::Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can’t reverse.

  • LWD@lemm.ee
    link
    fedilink
    English
    arrow-up
    36
    ·
    10 months ago

    Well, I’m sure of anything genuinely bad happens, the company OpenAI will reveal it to us. After all, they care more about human beings than their bottom line.

    /s