I see talk here and there about how any company or individual can easily use anything we post on Lemmy however they want. This could include AI training, behavior analysis, or user profiling. With the recent news of Reddit data being sold and licensed for AI training, I thought this would be a great time to preemptively discuss how we feel about this topic and brainstorm ways to discourage unwanted use of the content we post.

I’ve seen some users add a license to the end of each of their comments. One idea: add a feature to Lemmy that lets each user choose a content license that applies to everything they post. For example, one user might reserve no rights over their content (as with CC0) because they don’t care how their data is used. Another user might not want companies profiting off their posts, so they’d choose a more restrictive license.

I’m eager to hear everyone’s thoughts on the whole topic, so to kick things off:

  1. Do you care how your public data and posted content is used? Why or why not?
  2. What do you think of choosing a content license for your Lemmy account? Does this contradict the FOSS model?
  3. Should Lemmy have features to protect user data/content in this way, or should that be left up to the user to figure out on their own?

Data is becoming an increasingly valuable commodity in the digital world. Hopefully these big-picture conversations can help us see what we value as a community and be more prepared for the future.

  • Faresh@lemmy.ml
    8 months ago

    (I’m not a lawyer, nor have I studied law.)

    I’ve seen some users add a license to the end of each of their comments. One idea: add a feature to Lemmy that lets each user choose a content license that applies to everything they post. For example, one user might reserve no rights over their content (as with CC0) because they don’t care how their data is used. Another user might not want companies profiting off their posts, so they’d choose a more restrictive license.

    I don’t think licensing your content prevents it from being used in AI models. Services such as Copilot were trained on data that included GPL-licensed source code without having to comply with the terms the GPL imposes on modifying or copying that code. This isn’t limited to restrictive licenses like the GPL, either: under permissive licenses such as MIT, they would still have to credit the authors of the original work. It seems that, for now, copyright law doesn’t apply to output generated by AI models, and that their developers don’t need to comply with the licenses of the training data (or at least, AFAIK, they haven’t been penalized for violating copyright law yet).

    And even without a license, companies can’t use your work without your permission (unless the use constitutes fair use). When you license a work, you are simply giving other people permission to do things with it that they would otherwise not be allowed to do.