• GamingChairModel@lemmy.world
    5 months ago

    Yeah, I’m not a fan of AI, but I’m generally of the view that anything posted on the internet, visible without a login, is fair game for indexing by a search engine, snapshotting for a backup (like the Internet Archive’s Wayback Machine), or running user extensions on (including ad blockers). Is training an AI model all that different?

    • sugar_in_your_tea@sh.itjust.works
      5 months ago

      Yes, it kind of is. A search engine just looks for keywords and links, and that’s all it retains after crawling a site. It’s not producing any derivative works; it’s merely looking up an index of keywords to find matches.
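
      To make that concrete, here’s a rough sketch of the kind of keyword index a crawler keeps; the URLs and page text are made up for illustration, and real engines add ranking on top, but the core idea is just a word-to-URL lookup:

      ```python
      # Minimal sketch of a keyword index: after "crawling", only the mapping
      # from words to page URLs is kept, not the pages themselves.
      from collections import defaultdict

      # Hypothetical crawled pages (URL -> text); placeholder data for illustration.
      pages = {
          "https://example.com/a": "open web content about ad blockers",
          "https://example.com/b": "archived content about search engines",
      }

      # Build the index: each word maps to the set of URLs that contain it.
      index = defaultdict(set)
      for url, text in pages.items():
          for word in text.lower().split():
              index[word].add(url)

      # A query is just a lookup: which URLs mention every query word?
      def search(query):
          results = [index.get(w, set()) for w in query.lower().split()]
          return set.intersection(*results) if results else set()

      print(search("content about"))  # both example URLs match
      ```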

      An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues. Whether a particular generated result violates copyright depends on the license of the works it’s based on and how much of those works it uses. So it’s complicated, but there’s very much a copyright argument there.

      • TheRealKuni@lemmy.world
        5 months ago

        An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues.

        Derivative works are not copyright infringement. If LLMs are spitting out exact copies, or near-enough-to-exact copies, that’s one thing. But as you said, the whole point is to generate derivative works.