• Corkyskog@sh.itjust.works
    arrow-up
    19
    ·
    9 months ago

    I was told Reddit has already been scraped for AI and all sorts of other uses. There is very little new value left to sell.

    • Ross_audio@lemmy.world
      arrow-up
      10
      ·
      9 months ago

      Except AI models may end up having to start again with licences or public domain data.

      They are currently breaking the law and delaying legal action as long as possible in the hopes they can repeat the trick with a new data set.

      • besbin@lemmy.ml
        arrow-up
        3
        ·
        9 months ago

        Whatever already exists won’t be thrown away regardless of the ruling. It’s like throwing away all the gold already dug up just because it was dug up by slave labor. The law and legal actions are mostly just a moat around the pile of gold already dug up. Sure, AI companies will have to pay more for new data from other sources. However, that would be peanuts compared to how much they would have to pay starting from zero.

        • Ross_audio@lemmy.world
          arrow-up
          1
          ·
          9 months ago

          If there’s a risk of a massive fine or court case every time what already exists gets used, they’ll throw it away.

          The game now is to delay the legal process long enough until they’ve built the replacement.

          Then they can afford to throw the essentially faulty model away.

            • Ross_audio@lemmy.world
              arrow-up
              1
              ·
              9 months ago

              It’s clear from the output that it breaks copyright.

              We don’t have to look inside the black box to demand to see the input which caused that output.

              To be clear, a machine is not responsible for itself. This machine was trained to break copyright.

              • fine_sandy_bottom@discuss.tchncs.de
                arrow-up
                1
                ·
                9 months ago

                Generally if someone is clearly in breach of copyright the rights holder will apply to a court to issue an injunction to order that company to cease their activities until a case can be resolved.

                Given that has not happened, it seems that from a court’s perspective, it’s not a clear breach of copyright.

                • Ross_audio@lemmy.world
                  arrow-up
                  1
                  ·
                  9 months ago

                  The rights holder first considers the size of the payout vs. the cost of legal fees.

                  Just because they haven’t been sued directly for this doesn’t mean it isn’t infringement.

      • diffuselight@lemmy.world
        arrow-up
        2
        ·
        9 months ago

        No, they’ll train on laundered model output. Like every llama.

        The investment thesis that the data is valuable is bonkers. It’s not. Not only has it already been exfiltrated and can be laundered in a dozen ways, Reddit also won’t be able to effectively assert copyright.

        Look at Facebook. It’s full of reposted Quora content now, with AI images and AI-laundered text.

        Reddit is dead

      • Tak@lemmy.ml
        arrow-up
        2
        ·
        9 months ago

        Corporations break the law all the time and typically it’s just an operational expense.

          • Tak@lemmy.ml
            arrow-up
            1
            ·
            9 months ago

            I don’t understand what you’re saying because I never said they were.

            • Ross_audio@lemmy.world
              arrow-up
              1
              ·
              9 months ago

              My point is that corporations often see a fine as a cost of business because the fines are issued by a regulatory system that has no teeth.

              If you’re in a lawsuit against another corporation, they are going after damages in civil court, and the award is likely to be high enough to stop the behaviour.

    • nicetriangle@kbin.social
      arrow-up
      6
      ·
      9 months ago

      Yeah that was kinda my understanding too. And regardless of my feelings on it, I think rulings are mostly gonna go in AI’s favor.