- cross-posted to:
- becomeme@sh.itjust.works
The big AI models are running out of training data (and it turns out most of the training data was produced by fools and the intentionally obtuse), so this might mark the end of rapid model advancement.
I fail to see how synthetic data is good if it makes AI that's used to justify job cuts “better”.
Synthetic data is basically a fancy way of saying ‘I’m properly formatting data and reinforcing the AI’s good outputs’. Rearranging words, fixing/adding tags, that sort of thing. It's generated with various tools that usually have an LLM or VLM plugged in, though some are as simple as a regex script.
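For a rough idea of the simplest end of that spectrum, here's a minimal Python sketch of a regex-based cleanup pass. It assumes a hypothetical dataset where each caption is a comma-separated tag string; the function name `clean_tags` and the normalization rules are illustrative assumptions, not the behavior of any particular tool.

```python
import re


def clean_tags(raw: str) -> str:
    """Normalize a comma-separated tag string (hypothetical dataset format)."""
    # Split on commas, collapse runs of whitespace/underscores to one space,
    # then trim and lowercase each tag
    tags = [re.sub(r"[\s_]+", " ", t).strip().lower() for t in raw.split(",")]
    # Drop empty entries and duplicates while keeping the original order
    seen = set()
    out = []
    for t in tags:
        if t and t not in seen:
            seen.add(t)
            out.append(t)
    return ", ".join(out)


print(clean_tags("Cat,  black_cat , cat,,Sitting   on_Mat"))
# prints: cat, black cat, sitting on mat
```

Nothing here makes the model smarter on its own; it just makes the existing data more consistent, which is the point being made above.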