Could the reddit API changes have to do with ChatGPT rather than third party apps?

gotofritz@beehaw.org · edited 2 years ago

Could the reddit API changes have to do with ChatGPT rather than third party apps?

Kris@lemmy.world · 2 years ago

Yes, but nothing’s stopping scraping of Reddit content from the front end

gotofritz@beehaw.org · 2 years ago

Technically not (well, they can make it harder), but they can sue them for doing it

jpv@beehaw.org · 2 years ago

Sure, but they could do the same thing with an API: make scraping for LLMs against the TOS, but not personal use. I really do think (as the OP says) it’s two birds with one stone.

𞋴𝛂𝛋𝛆@lemmy.world · 2 years ago

The value of LLMs has changed drastically in favor of open source since the Meta weights leak. The proprietary model looks pretty much wrecked now, at least as far as I understand the leaked internal memo from a Google researcher last month.

https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

MarPan@lemmy.world · 2 years ago

This is a fascinating read, thank you very much for sharing.

gotofritz@beehaw.org · 2 years ago

Oh, I’m not saying they are doing the right thing or that it was the correct decision. Just speculating whether LLMs are what kicked off the whole thing

𞋴𝛂𝛋𝛆@lemmy.world · 2 years ago

I’m saying the premise that LLMs have anything to do with it is either an incompetent failure to keep up with LLM developments, or a pack of lies.

gotofritz@beehaw.org · edited 2 years ago

I disagree, it’s still too early and a bit presumptuous to make such conclusive statements

schmurian@beehaw.org · 2 years ago

Honestly, I think so. It looks like all of big tech has collected enough data from us that they can now create AI models from it. Like a snapshot of humanity for some years

Schelleberg@feddit.de · 2 years ago

I’m very sure that this is the case. Reddit is pissed they gave away all the content as training data for free while struggling to monetize their platform adequately.

But I suspect the damage is already done. There are projects like “Orca” from Microsoft that skip a big part of the learning process from source data by using ChatGPT and GPT-4.

They missed the timing but are too stubborn to admit it and are doubling down

SterlingVapor@lemmy.world · 2 years ago

What’s more, GPT-4 is near the upper bound of what you can collect on the web in that way. They basically took everywhere you’d look for information and grabbed it along with as much structure as they could… There’s plenty more information on the Internet, but its structure and quality are much lower. It’s mostly data-poor, unstructured interactions between humans

Moving forward, everyone is talking about synthetic data sets - you can’t go bigger without some system to generate (or refine) training data - and if you have to generate the data anyways, you’re not going to pay much for a dataset that is just decent

So yeah, Reddit most definitely missed the timing.

I think Elon’s claims that he’s made Twitter profitable (despite a lot of evidence to the contrary) are also creating pressure for the other social networks to chase overly aggressive monetization schemes

z2k_@lemmy.nz · 2 years ago

Yes, but IMO it would be easy to separate LLM and 3rd party apps, since 3rd party apps have users sign in independently. They chose to also target 3rd party apps and take them down.

Klinkertinlegs@beehaw.org · 2 years ago

I think the LLM wave hit, they saw dollar signs, and they made a change without thinking it through. Then they were backed into a corner between money and avoiding outrage, and greed won.

CookieJarObserver@feddit.de · edited 2 years ago

deleted by creator

Hyperz@beehaw.org · 2 years ago

And lots of proxies.

CookieJarObserver@feddit.de · edited 2 years ago

deleted by creator

gotofritz@beehaw.org · 2 years ago

IF the owners of the data agree, or, if they disagree, until they take you to court. Getty Images are taking the creators of Dall-E to court, and some tech company is taking MS to court for Copilot

CookieJarObserver@feddit.de · edited 2 years ago

deleted by creator

Wintermute@lemmy.villa-straylight.social · 2 years ago

What “law” says that? That’s not how copyright works at all. If you don’t have an explicit license to use content you don’t own, you can’t legally use it.

CookieJarObserver@feddit.de · edited 2 years ago

deleted by creator

Wintermute@lemmy.villa-straylight.social · 2 years ago

Is there an English translation available? If it’s true, that’s a hell of a departure from international copyright agreements that I wasn’t aware of.

CookieJarObserver@feddit.de · edited 2 years ago

deleted by creator

gotofritz@beehaw.org · 2 years ago

Interesting. Do you have a link to the specifics of the law you are talking about?

EvilColeslaw@beehaw.org · edited 2 years ago

I think this is the main reason for the insane prices, but it could have easily been avoided. They don’t need to have one price class for every type of use of their Data API. They could have easily had one rate for LLM and other AI training uses and another for third party client applications. I feel like at some point they realized they’d rather just kill the third parties while they’re at it and this seemed like the logical moment.

gotofritz@beehaw.org · 2 years ago

Yeah, one of the other answers to the AMA was “we are not profitable yet, unlike the 3rd party app devs…” - that is something that wouldn’t sit well with any investor I know

Pasketti@lemmy.jerick.xyz · 2 years ago

It makes a lot of sense, but with the way organizations such as Internet Archive are saving webpages from Reddit, wouldn’t it be feasible to train your models off of those sites to circumvent any API charges?

gotofritz@beehaw.org · 2 years ago

Depends on the TnCs

hendrik@lemmy.ml · 2 years ago

This is mainly just being used as a pretext.

IggyTheSmidge@lemmy.blahaj.zone · 2 years ago

I think that was definitely the impetus - I first read about the changes in this article back in April: https://www.theregister.com/2023/04/18/reddit_charging_ai_api/

The closing statement is interesting:

The spokesperson we talked to also wanted to make clear the Data API was still freely accessible for appropriate use cases through the Reddit developer platform; hopefully app developers and other small-scale operators won’t have any surprises ahead this summer.

I suspect they ran the numbers and started seeing dollar signs - they don’t care about the third-party apps (which don’t make them any money directly), they’re just trying to cash in on Microsoft etc.

I have a sneaking suspicion they’re going to end up back-pedalling, but it will be too little, too late.

dawnerd@lemm.ee · 2 years ago

No. Data scrapers will still scrape the site for as long as Reddit wants to be indexed by search engines. IMO charging for API access is fine when reasonable. Lying about why you’re doing it isn’t.

noodlebread@lemmy.starlightkel.xyz · 2 years ago

This contains a good explanation of why it’s clear this is really about wanting the 3rd party apps to stop existing.

https://www.youtube.com/watch?v=U06rCBIKM5M

gotofritz@beehaw.org · 2 years ago

It’s 13 minutes of rehashing the same points everyone has been making to death. And it doesn’t even mention LLMs

iMeddles@fedia.io · 2 years ago

Charging for their API is a reasonable answer to the LLM data scrapers. The amount they’re charging and the speed of the changes are not reasonable, however, IMO.

JohnDClay@sh.itjust.works · 2 years ago

The original announcement said they were making exceptions for applications that gave back to Reddit. I and many others hoped that was basically everyone who wasn’t AI scraping. But it seems like they got greedy while they were at it and decided to kill everything

SkyNTP@lemmy.ml · 2 years ago

Reddit’s business model was not founded on selling LLM data. Reddit got greedy and decided to change their business model to cash in on an unexpected revenue stream. What was also unexpected (to Reddit) is that you cannot cater to social media users and monetize their data for LLM training effectively at the same time. And now Reddit will have neither, and will die just like all other businesses that adopt enshittification as a core operating procedure.

Let this be a lesson to them and all that follow: do not let your greed make you blind to the consequences of your actions.

gotofritz@beehaw.org · 2 years ago

Does it matter what Reddit’s business model was founded on? Businesses respond to changing conditions all the time and pivot.

“They got greedy” seems a really naive way of looking at it. They are a business; that’s what businesses are all about. Additionally, they are a business which is NOT profitable, and they need to change things to survive now that the era of low interest rates has come to an end. The real issue is that they are so inept, IMHO

I find the word “enshittification” so cringe
