On the contrary. It explicitly states that it is.
To quote - "It found that most of the internet is translated, as 57.1 percent of the sentences in the corpus were multi-way parallel in at least three languages. "
So in other words, the majority of web content is AI translations of other content. As it's often poorly translated, or entirely mistranslated, it qualifies as “AI-generated garbage” - hence the headline.
Technically, yes, but I think I, and a lot of other readers, thought it was talking about original content from AIs, as opposed to translated content.
I have noticed those sites with answers to commonly searched questions, which look very convincing and have AI-generated “authors” as well as topic-specific URLs, but then sometimes lose the plot of the question halfway through. I almost fall for them, and I’m a crusty internet person, so I can only imagine how many people are just totally swallowing the info.
‘Original content’ from AI is just regurgitated content with adjustments.
How many adjustments does it take before it’s new content. If the answer is a lot, are humans ever original either?
Humans are frequently unoriginal, which is why they get caught copying existing things with adjustments. But they also make genuinely new things based on existing ones: work that adds something significantly different, even while reusing parts of what came before, in a way that is original.
The Thing, Predator, and Alien all feature otherworldly beings who hunt humans, but would you consider them regurgitated content with adjustments?
The Thing is about fear of other people, with an alien monster.
Predator is about macho men being outclassed, with an alien monster.
Alien is about sexual assault, with an alien monster.
AI won’t accidentally create anything comparable, because these three movies aren’t even the output of a single human. Hell, even books are not the output of a single person. They have editors and reviewers and collaboration that involves sharing knowledge and is shaped by experiences that AI won’t stumble upon by accident. AI will create the direct-to-video knockoffs that just copy existing media for profit, because AI is like an executive who always tries to make what is already proven to work, because that is seen as reliable.
Alright, that’s a weaker claim (that is, less of an extraordinary claim) than I was expecting. LLMs aren’t quite as good as a human at conceptual originality yet, and I can’t prove they will catch up, especially if thematic subtext is the measure.
I guess I’ll just say my original point stands then. There’s a difference between something made from a prompt by ChatGPT, and something produced from a roughly equivalent text by a translation tool.
You do not generate text when you translate it. The two words have different meanings.
If you translate a sentence once using a computer, it’s probably a translation. If you translate a translation, you are using a computer to regenerate computer-generated content, even if it started with a human seed in the first translation. The two words only have different meanings in specific contexts. They CAN mean the same thing, but don’t necessarily, or even often.
In this case, though, the article does suggest that AI is taking AI content and rewriting it, or translating from “English” to “English” a bunch of times, which is both translation and generation.
English to English would be rewording and would totally fall under AI-generated garbage, but the article doesn’t seem to mention this. It’s entirely about English to other languages, mostly in Africa and the global south.
Although taking articles and translating them is using AI, I don’t think that’s what most people associate with “AI-generated garbage”, hence the clickbait.
It’s an interesting article, I just think the headline is misleading.
Translate is not generate. It’s intentionally misleading.