• FaceDeer@kbin.social
    ↑ 5 · ↓ 29 · 9 months ago

    The more accessible training data there is, the easier it is for new AI projects to enter the field and the less dominant those “giant corporations” become.

    The free labour was already freely given. If someone doesn’t want to have shitposted on Reddit for free then maybe they shouldn’t have shitposted on Reddit for free.

    • Nurse_Robot@lemmy.world
      English · ↑ 21 · ↓ 2 · 9 months ago

      “if you didn’t want me to steal your intellectual property, you shouldn’t have thought of it in the first place”

      • Fungah@lemmy.world
        English · ↑ 6 · ↓ 1 · 9 months ago

        So, for an example of what the other user was talking about: I’m just some guy, and for my first foray into programming / machine learning (I kind of just threw myself into the deep end) I modified StyleGAN3 and trained it on about 500 GB of porn that I scraped off Reddit.

        Now, I stopped the training after about a week (it was going to take about a solid month on my RTX 2080 Ti) when I found out Stable Diffusion existed, but I learned a LOT from that experience.

        I couldn’t do that now. Arguably none of that was how any of that should be done but whatever.

      • QuaternionsRock@lemmy.world
        English · ↑ 7 · ↓ 4 · edited 9 months ago

        No, you shouldn’t have posted it to Reddit, where you were required to give them a perpetual license to use your IP in any way they see fit.

        For the record, I’m here because Reddit pissed me off when they axed the free API, and I’m pissed at myself for not expecting it. That’s what I get for accepting their terms and conditions, I guess.

        Edit: I also don’t accept the idea that using my content for training data is “fair use” when it is used to train proprietary models, especially ones in which the end user is allowed to prompt it to plagiarize or otherwise imitate my content.

      • FaceDeer@kbin.social
        ↑ 2 · ↓ 16 · 9 months ago

        I’m not sure what you mean here. Nothing’s being stolen. Even if you think there needs to be permission for training an AI off of data, Reddit has that permission.

        • Nurse_Robot@lemmy.world
          English · ↑ 10 · ↓ 1 · 9 months ago

          I assume you’re more of a moron than a troll, which is disappointing. Regardless, you’re not worth my time, as I don’t think any argument could convince you to have an open mind and be willing to change. Good luck out there!