The big AI models are running out of training data (and it turns out most of the training data was produced by fools and the intentionally obtuse), so this might mark the end of rapid model advancement

    • lurkerlady [she/her]@hexbear.net
      link
      fedilink
      English
      arrow-up
      9
      ·
      edit-2
      5 months ago

      Synthetic data is basically a fancy way of saying ‘I’m properly formatting data and reinforcing the ai’s good outputs’. Rearranging words, fixing / adding tags, that sort of thing. This is generated with various tools that usually have an LLM or VLM plugged in, though some are as simple as a regex script.