• Fake4000@lemmy.world (OP)
    139 upvotes, 1 downvote · 5 months ago

    Shit move from Reddit. Glad I jumped ship to Lemmy.

    Honestly, Lemmy has fewer users than Reddit, yet you still get more engagement.

  • Boozilla@lemmy.world
    100 upvotes, 1 downvote · 5 months ago

    I don’t miss the dipshits, pun spammers, and smug power mods of reddit at all. I do miss their niche subs and smarter users. Like it or not, they do have some brainy folks peppered among the shit posters.

    We have some good folks here, too. Just need more of them.

    It’s a shame reddit has been dialing up the shit faucet slowly enough that most of their users don’t notice how awful it is now. They’ve grown accustomed to the poor quality of the content and weaponized greed of the owners.

    • Fake4000@lemmy.world (OP)
      46 upvotes · 5 months ago

      In all honesty, when I joined Reddit right after Digg went to shit, it was amazing. Reddit was great: third-party apps were welcome, the interface was straightforward, and there was none of that NFT or gold shit.

      It just went downhill.

      • NotSteve_@lemmy.ca
        13 upvotes, 1 downvote · 5 months ago

        At that point, they were also open source, which was super cool. I always wanted that profile badge you got for submitting a merged PR.

        Reddit really went downhill fast after ~2015. I think Lemmy will get there eventually. I remember Reddit being a lot smaller back then as well. It took a while to get to the point where niche communities could thrive, and I do believe we’ll see that happen here as well (even if it takes a decade or so).

      • OmanMkII@aussie.zone
        4 upvotes · 5 months ago

        I joined maybe 6 years ago. There was a bit of shit talking, and most posts had a troll answer as the top-voted comment for some reason, but it was usually easy to scroll straight past it and find some really insightful comments. There was a lot of good stuff around Reddit, but slowly the absurd number of awards, NFT avatars, reposts, and ads every third post started to corrupt it. It was simple enough to switch to a third-party app for quite a while, but the garbage slowly took over.

        Even if they hadn’t pulled third-party apps, it was getting pretty close to a point where it wasn’t worth scrolling past the bullshit.

    • deweydecibel@lemmy.world
      10 upvotes, 1 downvote · 5 months ago (edited)

      “smug power mods of reddit at all.”

      Oh they’re here too. They’re not causing too much drama because there’s not enough going on, but they’re here. Some of them are admins of certain instances.

      The ones that aren’t here yet will eventually find their way here as Lemmy continues to grow. And the most concerning thing about that is how many more tools Lemmy is providing them to fuck with users.

      At least on Reddit, mods couldn’t see votes. Lemmy actually just made it easier for them.

    • Ragnarok314159@sopuli.xyz
      7 upvotes · 5 months ago

      I left Reddit. I had over 600k karma after a few years of answering all kinds of questions, from veteran help to complex engineering.

      Fuck Reddit. Will never go back. It’s a shell of what it was only a few years ago.

  • pixxelkick@lemmy.world
    56 upvotes, 3 downvotes · 5 months ago (edited)

    1. Called this a while back: this is why Reddit has such a high valuation.

    2. Poisoning your data won’t do anything but give them more data. Do you seriously think Reddit’s servers don’t track every edit you make to posts? You’d literally just be providing training data of original human text vs. poisoned text. They’d still have your original post, plus a copy of every time you edited it.

    3. Whoever buys Reddit will have sole access to one of the larger (I don’t think the largest, though) pools of text training data on the internet, with full licensed usage of it. I expect someone like Google, FB, MS, OpenAI, etc. would pay big $$$ for that.

    “But can’t people already scrape it?”

    1. Well, yes, but it’s legally dubious at best in some places.

    2. Scraping data off Reddit only gets you the current versions of posts (which means you can get poisoned data, and can’t see deleted content), and it’s extremely slow… If you own the server, you have first-class access to all posts in a database, including the originals, the diffs of every time someone edited a post, and all the deleted posts too.
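
    As a minimal sketch of the point about diffs, assume (hypothetically) that each edit is stored as a line-level delta from Python's difflib.ndiff instead of overwriting the post. Reddit's real storage scheme isn't public and the comment text below is invented; the sketch only shows that a stored delta still contains the original.

```python
import difflib

# Invented example: the original comment and a later "poisoned" edit of it.
original = ["To fix it, reset the breaker.\n", "Then test the outlet.\n"]
poisoned = ["To fix it, paint the breaker blue.\n", "Then test the outlet.\n"]

# Hypothetically, the site stores the edit as a delta instead of overwriting the post.
delta = list(difflib.ndiff(original, poisoned))

# Both versions can be rebuilt from that one delta:
# which=1 returns the first (original) text, which=2 the edited one.
recovered_original = list(difflib.restore(delta, 1))
recovered_edit = list(difflib.restore(delta, 2))

print("".join(recovered_original), end="")
```

    So the "poisoned" edit adds a labeled pair (before/after) rather than erasing anything.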

    Suppose you wanted to train an AI to detect posts that need to be flagged for moderation. If you scrape Reddit data, you can’t find the deleted posts that got moderated…

    But if you have the raw original data, you 100% would have a list of every post that got deleted by mods, and even the mod message explaining why it was deleted.

    Surely you can see the value of such data, which only Reddit’s owners are currently privy to…
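
    A rough illustration of why that matters: a sketch that turns removed posts plus mod reasons into labeled training examples. The record layout (body, removed, removal_reason) is hypothetical, not Reddit's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    # Hypothetical record layout; field names are illustrative, not Reddit's schema.
    body: str
    removed: bool
    removal_reason: Optional[str] = None


def moderation_examples(posts):
    """Yield (text, label) pairs: the mod's reason for removed posts, "ok" otherwise."""
    for post in posts:
        if post.removed:
            yield post.body, post.removal_reason or "removed (no reason given)"
        else:
            yield post.body, "ok"


posts = [
    Post("Great write-up, thanks!", removed=False),
    Post("BUY CHEAP WATCHES HERE", removed=True, removal_reason="spam"),
    Post("You're an idiot", removed=True, removal_reason="Rule 1: be civil"),
]

# The server owner gets labeled examples, including everything mods removed.
dataset = list(moderation_examples(posts))

# A scraper only ever sees what is still public, with no labels at all.
scraped = [post.body for post in posts if not post.removed]

print(dataset)
print(scraped)
```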

    • Buddahriffic@lemmy.world
      9 upvotes, 1 downvote · 5 months ago

      They’ve also got vote counts and breakdowns of who is making those votes. This data will be worth more for AI training than any similar volume of data, other than maybe the contents of Wikipedia. That’s assuming they didn’t have it set up to delete the vote breakdowns when they archived threads.

      Why are those breakdowns worth so much? Because they can be used to build profiles on each voter (including those who only had lurker accounts to vote with), so they can build AIs that know how to speak with the MAGA cult, Republicans who aren’t MAGA, liberals, moderates, centrists, socialists, communists, anarchists. Not only that, they’ll be able to look at how sentiments about various things changed over time with each of these groups, watch people move from one to another as their opinions evolved, see how someone pretends to be a member of whatever group (assuming they voted honestly and posted under their fake persona).
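
      A toy sketch of what building profiles from vote breakdowns could look like, assuming each user is reduced to a vector of net votes per community and users are compared with cosine similarity. The users, communities, and votes are invented, and real profiling would be far more involved.

```python
import math
from collections import defaultdict

# Invented (user, community, vote) records: +1 is an upvote, -1 a downvote.
votes = [
    ("alice", "community_a", +1), ("alice", "community_b", -1),
    ("bob",   "community_a", +1), ("bob",   "community_b", -1),
    ("carol", "community_a", -1), ("carol", "community_b", +1),
]


def build_profiles(records):
    """Sum each user's votes per community into a sparse profile dict."""
    profiles = defaultdict(dict)
    for user, community, vote in records:
        profiles[user][community] = profiles[user].get(community, 0) + vote
    return profiles


def cosine(p, q):
    """Cosine similarity of two sparse profiles: 1.0 = same pattern, -1.0 = opposite."""
    dot = sum(value * q.get(key, 0) for key, value in p.items())
    norm = math.sqrt(sum(v * v for v in p.values()) * sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


profiles = build_profiles(votes)
print(cosine(profiles["alice"], profiles["bob"]))    # alice and bob vote alike
print(cosine(profiles["alice"], profiles["carol"]))  # alice and carol vote oppositely
```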

      Oh, and all of that data is also available through the fediverse, where it’s free to train on for anyone who sets up a server. That makes me question whether the fediverse is a good thing, because even changing federation to opt-in instead of opt-out only controls whether your server accepts data from another. The data is always shared.

      Open and private are on opposite sides of a spectrum. You can’t have both; the best you can do is settle for something in the middle.

      • Breezy@lemmy.world
        2 upvotes · 5 months ago

        What if Reddit also kept all deleted comments and posts? I’m sure there are shitloads of things people type out just to delete, thinking all the while they’ll never see the light of day.

        • Buddahriffic@lemmy.world
          4 upvotes · 5 months ago

          I’d be surprised if they don’t keep all of that. There were a number of sites for looking at deleted posts. They’d just go and grab everything and compare what was still there with what wasn’t and highlight the stuff that wasn’t there anymore.
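
          Roughly how those sites worked, as a sketch: take an earlier snapshot of a thread and a later one, then flag comments whose text vanished or was replaced by a placeholder. The snapshot format (comment ID mapped to text) is an assumption for illustration, as are the exact placeholder strings.

```python
# Assumed placeholder strings shown in place of deleted or removed comments.
PLACEHOLDERS = {"[deleted]", "[removed]"}


def find_vanished(before, after):
    """Return {comment_id: original_text} for comments that disappeared or were blanked."""
    vanished = {}
    for comment_id, text in before.items():
        later = after.get(comment_id)
        if later is None or later in PLACEHOLDERS:
            vanished[comment_id] = text
    return vanished


before = {"c1": "First!", "c2": "Something I regret saying", "c3": "Useful answer"}
after = {"c1": "First!", "c2": "[deleted]", "c3": "Useful answer"}

for cid, text in find_vanished(before, after).items():
    print(cid, "->", text)
```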

          The same is possible here, though the mod log reduces the need for it. But for anyone watching for posts that people later decided they didn’t want anyone to see, deleting a post highlights it instead of hiding it.

          • Breezy@lemmy.world
            2 upvotes · 5 months ago (edited)

            I think that site was unddit, but yes, those were posted and later deleted. I’m talking about typing out a post or comment and never posting it, just backing out of the page or hitting cancel. I’m not sure whether any of that is stored on the site or just locally.

            • Buddahriffic@lemmy.world
              2 upvotes · 5 months ago

              Oh, yeah, I’ve wondered the same myself. Hell, that might have been a motivation for removing the API access.

            • sacredfire@programming.dev
              1 upvote · 5 months ago

              You would be able to tell by monitoring the network tab of the browser’s developer tools. If POST requests are being made while you are typing a comment (which they probably are, though I’m too lazy to go check), they are most likely saving work-in-progress records for comments.

        • pixxelkick@lemmy.world
          1 upvote · 5 months ago

          They definitely do; it’s common for such systems to never actually delete anything, because storage is cheap. A post is likely just flagged deleted=true, and backend queries filter it out with WHERE [post].Deleted = False.

          So it looks deleted to the consumer, but it’s all saved and squirreled away on the backend.

          It’s good to keep all this shit for both legal reasons (if someone posts illegal stuff and then deletes it, you can still give it to the feds) and auditing (mods can’t just delete stuff to cover it up; the original still exists and admins can see it).
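
          A minimal sketch of that "flagged, not deleted" pattern using Python's built-in sqlite3 module. The table and column names are hypothetical; nobody outside Reddit knows its real schema.

```python
import sqlite3

# Hypothetical schema: a "deleted" flag instead of actually removing rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO post (body) VALUES (?)", [("keep me",), ("delete me",)])

# A user "deletes" post 2: only the flag changes, the text stays in the table.
conn.execute("UPDATE post SET deleted = 1 WHERE id = ?", (2,))

# Public-facing query: flagged rows are filtered out.
visible = conn.execute("SELECT body FROM post WHERE deleted = 0 ORDER BY id").fetchall()

# Backend/admin query: everything is still there.
everything = conn.execute("SELECT body, deleted FROM post ORDER BY id").fetchall()

print(visible)
print(everything)
```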

      • pixxelkick@lemmy.world
        1 upvote · 5 months ago

        “Which makes me question whether the fediverse is a good thing”

        I’d argue it’s good, because it means open-source AI has a fighting chance, with FOSS data to train on, without needing to fork over a morbillion dollars to Reddit’s owners.

        Whatever the Reddit data can be used to train, FOSS researchers can repeat with Lemmy data and release free models that average Joes can run on their own, without having to subscribe to shit like Microsoft Copilot and friends to stay relevant.

    • Dettweiler@lemm.ee
      4 upvotes · 5 months ago

      Regarding the editing part: sure, they can track your edit history. But at scale, most edits are going to be corrections. Determining whether an edit was meant to poison the text would likely require manual review and flagging. There’s no way they’re going to sift through every edit on individual accounts to determine this, so it’s still worthwhile.

      • T156@lemmy.world
        1 upvote · 5 months ago

        Although they could sidestep the issue a bit by simply comparing the changes between edits. Huge changes could just be discarded, while minor ones are fine.
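
        That idea can be sketched with Python's difflib.SequenceMatcher: score how similar an edit is to the previous version and keep only small edits. The 0.6 threshold is an arbitrary assumption that would need tuning; this is an illustration, not anything Reddit is known to run.

```python
from difflib import SequenceMatcher


def keep_edit(previous, edited, threshold=0.6):
    """Keep an edit only if it is mostly similar to the previous version.

    threshold is an arbitrary, hypothetical cutoff: ratio() is 1.0 for identical
    strings and falls toward 0.0 as they diverge.
    """
    return SequenceMatcher(None, previous, edited).ratio() >= threshold


typo_fix = keep_edit("Teh bolt takes 25 Nm.", "The bolt takes 25 Nm.")
rewrite = keep_edit("The bolt takes 25 Nm.", "Purple monkeys dishwasher quantum toast.")
print(typo_fix, rewrite)
```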

    • Milk_Sheikh@lemm.ee
      3 upvotes · 5 months ago

      sigh

      So the old trick of “search term +reddit” won’t work anymore, huh?

      I’ve already made a habit of adding date limiters to web searches so results come from before LLMs were made public… The SEO ‘optimization’ game of before was bearable, but LLM spam ruins so many search results with regurgitated garbage or teaspoon-deep information.

      • Nelots@lemm.ee
        3 upvotes · 5 months ago (edited)

        “search term +reddit”

        Tossing site:reddit.com in front of any search guarantees all results come from Reddit, if that’s what you’re looking for.

      • Dettweiler@lemm.ee
        1 upvote · 5 months ago

        During the peak of the great purge, it was quickly becoming pointless. A lot of results were bringing up deleted posts. It took a while for search engines to catch up and start filtering a lot of those results out.

    • Falcon@lemmy.world
      1 upvote · 5 months ago

      With respect to point 2, it would stop others from scraping the content to train more open models. This would essentially give Reddit exclusive access to the training data.

    • afraid_of_zombies@lemmy.world
      1 upvote · 5 months ago

      Sounds like something a bunch of governments would be interested in. As you pointed out, you get to see why human mods made certain decisions. That could give you an edge in manipulation.

      • deweydecibel@lemmy.world
        16 upvotes · 5 months ago (edited)

        And ya know what? Frankly, if AI is going to harvest all this shit, I’d rather fuckers like spez couldn’t get rich off it in the process. Granted, I’m not happy the tech bros running these AI companies are getting rich with these fucking things, but I can at least take solace that, for Lemmy at least, there isn’t some asshole middleman making bank off the work and words of users they never paid a dime to.

        Genuinely, why do Spez and Reddit deserve to make money off anything we posted? Why does any social media site? They make the site, pay for the servers, maintain the apps, sure, and they can get compensation for that; I don’t see a problem there. But why does any social media company deserve to get rich when the only thing that makes their platform valuable is the people who post to it? Reddit didn’t even have paid mods; the community did all the work on the content of that site. Why in the fuck do we tolerate these assholes making a profit off it like this?

        • 👍Maximum Derek👍@discuss.tchncs.de
          3 upvotes · 5 months ago

          If the EU (or any other government) decides that AI companies can’t legally train their models on information they don’t own or license (I don’t know how that would work legally, but they talk about it), then the company Reddit has sold access to could argue to lawmakers that it has a license for all the content on Reddit. I don’t know that it would hold up, but I suspect it’s part of the company’s perceived value in this Reddit deal.

    • OmanMkII@aussie.zone
      12 upvotes, 2 downvotes · 5 months ago

      I was curious whether a robots.txt equivalent exists for AI training data, and there were some solid points here:

      If I go to your writing, I read it & learn from it. Your writing influences my future writing. We’ve been okay with this as long as it’s not a blatant forgery.

      If a computer goes to your writing, it reads it & learns from it. Your writing influences its future writing. It seems we are not okay with this, even if it isn’t blatant forgery.

      [AI at the moment is] different because the company is re-using your material to create a product they are going to sell. I’m not sure if I believe that is so different than a human employee doing the same thing.

      https://news.ycombinator.com/item?id=34324208

      I still think we should have the ability to opt out like we do with search engines and webcrawlers, but if the algorithm works ideally and learns but does not recycle content, is it truly any different from a factory of workers pumping out clones of popular series on Amazon? I honestly don’t know the answer to that.

      • Appoxo@lemmy.dbzer0.com
        6 upvotes · 5 months ago

        AFAIK the OpenAI bot may choose to ignore it? At least, that’s what another user claimed.

        • JohnEdwa@sopuli.xyz
          10 upvotes · 5 months ago

          Robots.txt has always been ignored by some bots. It’s just a guideline, originally meant to prevent excessive bandwidth usage by search-indexing bots, and it’s entirely voluntary.
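
          For reference, OpenAI's crawler identifies itself as GPTBot, so a site can ask it to stay out with a robots.txt rule. As noted above, honoring that is up to the crawler: a well-behaved bot checks before fetching, which Python's standard urllib.robotparser can do. The example.com URL and the "SomeSearchBot" agent are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks GPTBot everywhere but lets other crawlers in.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Compliance happens on the crawler's side: nothing stops a bot from skipping this check.
for agent in ("GPTBot", "SomeSearchBot"):
    allowed = parser.can_fetch(agent, "https://example.com/r/some_thread")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```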

          Archive.org bot for example has completely ignored it since 2017.

      • deweydecibel@lemmy.world
        5 upvotes · 5 months ago (edited)

        The problem is not the technology, the problem is the businesses and the people behind them.

        These tools were made with the explicit purpose of taking content they did not create, repurposing it, and creating a product. Throw all these conversations about intelligence and learning out the fucking window; what matters is what the thing does, and why it was created to do that thing.

        Until we reach a point where there is some sort of AI out there that has any semblance of free will, that can choose not to learn when fed certain information and choose not to respond to input without being programmed not to respond, we are not talking about intelligence, we are talking about a tool. No matter how they dress it up.

        Stop arguing about this on their terms, because they’re gaslighting the fuck out of you.

      • Mossy Feathers (They/Them)@pawb.social
        3 upvotes, 1 downvote · 5 months ago

        This is kinda my take on it. However, the way I see it is that the AI isn’t intelligent enough yet to truly create something original. As such, right now AI is closer to being a tool than a being. Because of that, it somewhat bothers me that I’m being used to teach a tool. If I thought that companies like OpenAI were truly trying to create beings and not tools, then I’d feel differently.

        It’s kinda nuanced, but a being can voluntarily determine whether or not something is copyright infringing, understand why that might be an issue, and then decide whether or not to continue writing based on that. A tool can’t really do that. You can try to add filters to a tool to avoid writing copyrighted text, but they will have flaws and holes in them. A being who understands what it’s writing, and what makes something plagiarism vs. reference vs. homage/inspiration/whatever, is less likely to have those issues.

    • rar@discuss.online
      4 upvotes, 1 downvote · 5 months ago

      It’s all federated, so it would be strange if the bots didn’t scrape anything off it.

  • JigglypuffSeenFromAbove@lemmy.world
    20 upvotes · 5 months ago

    Slightly unrelated question, but is there an easy way to delete all my Reddit posts and comments? I used the Nuke add-on in the past, but it doesn’t work anymore.

    I wanna delete my Reddit account, but I’d prefer to erase my history before doing that.

  • db2@lemmy.world
    13 upvotes, 1 downvote · 5 months ago

    Greedy little pigboy Steve couldn’t resist. Every day they seem to do something that reaffirms leaving was the best plan.

  • Wolpertinger@sh.itjust.works
    10 upvotes · 5 months ago

    So it seems I need to run any comments I make on Reddit through ChatGPT before posting. I heard AI training on AI leads to a poisoned data set.

    • Fishbone@lemmy.world
      1 upvote · 5 months ago

      For text, AI training AI wouldn’t do much to give data sets a little poison ivy rubdown, because at the end of the day, the message is still moderated by a non-bot. I think a better way would be to write more unconventionally but heavily contextually, so that if specific texts are ripped out and tossed into the bot blender, they’ll make no sense without the context alongside them.

      Slang, edge-case wording, and verbing non-verbs would likely do a lot of heavy lifting in that department.

    • General_Effort@lemmy.world
      1 upvote · 5 months ago

      Yeah, I heard that, too. Consider that people who don’t like tech may not have very reliable knowledge of tech. Regardless, OAI would appreciate your business.

  • xantoxis@lemmy.world
    9 upvotes · 5 months ago

    Damn. I keep meaning to use one of those things that deletes all your reddit data. I doubt it’ll actually do anything (reddit has no ethical framework so they won’t think twice about indexing “deleted” data) but I still need to do that.

    • ipkpjersi@lemmy.ml
      6 upvotes · 5 months ago

      I’d bet a year of my salary that it only deletes it from public view so people can no longer get helped from Reddit’s Google search results, but a copy (or more than one copy) is still retained on their internal servers.

      • Dettweiler@lemm.ee
        9 upvotes · 5 months ago

        The trick is to turn everything into randomized garbage and then delete it later. A lot of those purge services offer that feature. It just swaps the words with others, so on the surface it looks like properly written text, but it makes absolutely no sense.

        Aside from removing your content that they’re profiting from, it also feeds AI scrapers pure garbage in the event that your content is restored.
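
        No specific tool is named here, so this is a hypothetical sketch of the word-swapping idea: replace each word with a random one from a word list while keeping spacing, punctuation, and capitalization. The tiny FILLER vocabulary is made up; a real tool would presumably use a much larger list.

```python
import random
import re

# Hypothetical filler vocabulary; a real tool would use a much larger list.
FILLER = ["river", "quietly", "lamp", "seventeen", "borrow", "orange", "under", "gently"]


def scramble(text, rng=random):
    """Replace every word with a random filler word, keeping spacing and punctuation."""
    def swap(match):
        word = rng.choice(FILLER)
        # Preserve capitalization so the result still looks like normal prose.
        return word.capitalize() if match.group(0)[0].isupper() else word
    return re.sub(r"[A-Za-z']+", swap, text)


rng = random.Random(42)  # seeded only so the demo is repeatable
print(scramble("Check the fuse box first, then reset the breaker.", rng))
```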

        • Crackhappy@lemmy.world
          1 upvote · 5 months ago

          Yep. Over the course of a month I did that to all of my posts and comments, then deleted it all a week later before deleting my account.

      • HonorIsDead@lemmy.world
        4 upvotes · 5 months ago

        Maybe I’m misremembering, but weren’t they restoring stuff users deleted during the API protest?

        • philodendron@lemdro.id
          4 upvotes · 5 months ago

          They were. One user got so upset he live-streamed himself individually deleting every post and comment he’d ever made. Reddit restored it all right after.

  • Everythingispenguins@lemmy.world
    9 upvotes · 5 months ago

    I am willing to bet the most active subreddits that are not too bot infested are the NSFW ones. Reddit AI is going to be creepy and horny.

    • FaceDeer@kbin.social
      2 upvotes, 2 downvotes · 5 months ago

      AI trainers do a lot of work filtering and reformatting the training data. Often that’s the most expensive part. There’s a lot of synthetic data used these days too, reprocessed by other AIs.
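
      To make "filtering" concrete, a toy sketch of the kind of cleanup described: dropping deleted/removed placeholders, very short comments, and exact duplicates. The rules and the min_words cutoff are assumptions for illustration, not any lab's actual pipeline.

```python
def clean_corpus(comments, min_words=4):
    """Drop deleted/removed placeholders, very short comments, and exact duplicates."""
    seen = set()
    kept = []
    for text in comments:
        normalized = " ".join(text.split()).lower()
        if normalized in {"[deleted]", "[removed]"}:
            continue  # nothing left to learn from
        if len(normalized.split()) < min_words:
            continue  # too short to be useful
        if normalized in seen:
            continue  # exact duplicate after normalizing case and whitespace
        seen.add(normalized)
        kept.append(text)
    return kept


raw = [
    "[deleted]",
    "This.",
    "Check the fuse box before calling an electrician.",
    "check the fuse box  before calling an electrician.",
    "Reset the breaker, then test the outlet again.",
]
print(clean_corpus(raw))
```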

  • wise_pancake@lemmy.ca
    9 upvotes · 5 months ago (edited)

    We should have been posting factually incorrect information instead of deleting posts this whole time.

    Although I think Reddit does a good job of posting factually incorrect information on its own.

  • MxM111@kbin.social
    7 upvotes · 5 months ago

    I don’t mind giving my content for AI training. But with my approval and for free.

        • FaceDeer@kbin.social
          1 upvote, 4 downvotes · 5 months ago

          “But with my approval and for free” are new conditions that weren’t present when you originally published it on Reddit.

          • MxM111@kbin.social
            2 upvotes · 5 months ago

            Yes, but I did not mean retroactively. Nor did I mean only on Reddit, by the way. However, making money from already published content is not what I consented to when I joined Reddit some 15 years ago.

            • FaceDeer@kbin.social
              1 upvote, 2 downvotes · 5 months ago

              From the current Reddit User Agreement:

              You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

              When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

              I found a historical version from 10 years ago and that version already had this:

              you agree that by posting messages, uploading files, inputting data, or engaging in any other form of communication with or through the Website, you grant us a royalty-free, perpetual, non-exclusive, unrestricted, worldwide license to use, reproduce, modify, adapt, translate, enhance, transmit, distribute, publicly perform, display, or sublicense any such communication in any medium (now in existence or hereinafter developed) and for any purpose, including commercial purposes, and to authorize others to do so.

              Haven’t dug up anything earlier than this, do you know of any?

              Basically, you gave Reddit your approval long ago.

    • HACKthePRISONS@kolektiva.social
      7 upvotes · 5 months ago

      spez says that’s how he got Reddit off the ground in the first place: faking content and engagement (well, genuinely engaging with his account(s), but essentially shouting into the void and hoping enough people heard and wanted to stick around).

      with a RedditUserBot trained on reddit users, you might be able to fake another decade of growth.