I see talk here and there about how any company or individual can easily use anything we post on Lemmy however they want. This could include AI training, behavior analysis, or user profiling. With the recent news of Reddit data being sold and licensed for AI training, I thought this would be a great time to preemptively discuss how we feel about this topic and brainstorm ways to discourage unwanted use of the content we post.
I’ve seen some users add a license to the end of each of their comments. One idea might be this: add a feature to Lemmy where each user can choose a content license that applies to everything they post. For example, one user might choose to reserve no rights over their content (like CC0) because they don’t care how their data is used. Another user might not want companies profiting off their posts, so they’d choose a more restrictive license.
I’m eager to hear everyone’s thoughts on the whole topic, so to kick things off:
- Do you care how your public data and posted content is used? Why or why not?
- What do you think of choosing a content license for your Lemmy account? Does this contradict the FOSS model?
- Should Lemmy have features to protect user data/content in this way, or should that be left up to the user to figure out on their own?
Data is becoming an increasingly valuable commodity in the digital world. Hopefully these big-picture conversations can help us see what we value as a community and be more prepared for the future.
My personal idea of freedom would be to at least make it illegal for Google, OpenAI, and other giant profit-oriented corpos to use my stuff (they would probably still do it, but I want them to have to break the law to do it).
I mean, if you put a license on your posts that requires profit sharing and prohibits use without credit or use in proprietary formats, you could at least sue. Interestingly, some lemmings already do this under all their posts. It’s no big deal to have a client like Voyager add a signature under your posts and comments stating the applicable license.
The more interesting questions for me are: would Google then exclude our content from search? Would we hinder our growth unnecessarily? And how could we stay findable without ending up in some proprietary LLM?