cross-posted from: https://lemmy.world/post/1246165

Two authors sued OpenAI, accusing the company of violating copyright law. They say OpenAI used their work to train ChatGPT without their consent.

  • _Rho_@lemmy.world
    English · 6 upvotes · edited · 1 year ago

    How can they prove this though? I don’t think they’d have any way to. Unless OpenAI straight up admits it. But like the article mentions, the data could still have been obtained legally.

    • phoneymouse@lemmy.world
      English · 5 upvotes · 1 year ago

      Ask ChatGPT to summarize Sarah Silverman’s book. Ask it to give you a few quotes from it.

      How else would it be able to do that unless it had been trained using the book as an input?

      • RGB3x3@lemmy.world
        English · 4 upvotes · 1 year ago

        It could have parsed it from some webpage it found, like a book review. It doesn’t necessarily have to be from the book itself.

        There are other ways of getting that info than actually ingesting the original material.

      • _Rho_@lemmy.world
        English · 2 upvotes · 1 year ago

        Hmm. That’s a fair point. Lol.

        I suppose it’s possible that it was trained on articles and such that quote/summarize the book. But what you’re saying makes sense.

        • Moskus@lemmy.world
          English · 3 upvotes · 1 year ago

          ChatGPT could have read 1000 other summaries of the book; it doesn’t have to read the actual book to make a summary. It can just rewrite the old ones.

  • iMike@lemmy.world
    English · 4 upvotes · edited · 1 year ago

    Soo, if I read a book without asking the author first, he can sue me for reading the book?

    • Moskus@lemmy.world
      English · 2 upvotes · 1 year ago

      Yes, apparently we do. It’s like there’s a correct way of reading a book, and if you read that book to improve your English you are doing it wrong.

      This is going to be interesting. We’ll end up having to sign an EULA before reading soon…

  • berkeleyblue@lemmy.world
    English · 5 upvotes · 1 downvote · 1 year ago

    Look, my problem with all of this is: AI doesn’t steal copyrighted work, not really. It’s more like someone reading a book and being inspired to use it for a project he has. We humans do that all the time; AI is just faster at it. So why should we treat software differently than every other person on the planet? What’s next? Are we suing people for playing songs that might have been inspired by another song? That’s just not how things work.

  • theachievers@lemmy.world
    English · 2 upvotes · 1 year ago

    Have no fear, citizens! The American Judicial system will adjudicate this conflict with characteristic speed and wisdom! Expect everything to be a kind of malevolent higgledy-piggledy for 30 years. After that, there’ll be some sort of tacit understanding of a gentleman’s agreement which will be used as a rule of thumb for certain non-monetized works which may be certified for limited un-scraping status. It’s win-win!

  • Michal@discuss.tchncs.de
    English · 1 upvote · edited · 1 year ago

    I still haven’t grasped how a human learning from a book is different from a machine learning from a book.

    If I read your book and then use the knowledge to answer someone’s question, is it not the same if a machine does it?

    Does ChatGPT plagiarize the book word for word? Was it trained on an illegally obtained copy?

    Still, if I get knowledge from an unlicensed copy of a book and use that knowledge, at what point is the law broken?

    • habanhero@lemmy.world
      English · 7 upvotes · 1 year ago

      You as an individual are probably fine, but ChatGPT is a large-scale system being used commercially and for profit. Very different scenarios.

      Sarah Silverman also launched lawsuits against OpenAI and Meta and was able to show that the dataset used to train one of the models (can’t remember if it’s LLaMA or GPT) contained an illegally obtained version of her book.

  • burrp@burrp.xyz
    English · 2 upvotes · 1 downvote · 1 year ago

    How many reviews of their works have been posted online? You don’t need the source text.

  • b1ab@lem.monster
    English · 1 upvote · 1 year ago

    Just because it’s publicly available on the internet does not mean it is in the public domain or not covered by copyright. Attribution may end up being what is needed: a works cited list. I see licensing of works being ingested as a future moneymaker.

    • b1ab@lem.monster
      English · 2 upvotes · 1 year ago

      Even more interesting is how derivative works will fit the model. Fun stuff ahead.