• archomrade [he/him]
    1 day ago

    None of the flagship models publish their training data because they’re all trained on less-than-legal datasets.

    It’s a little like complaining that Jellyfin doesn’t ship any media with its code - not only would that be illegal, it’s implied that you’re responsible for obtaining your own.

    If you’re someone who can and does compile and retrain your own 64B-parameter LLMs, you almost certainly have your own dataset for that purpose (in fact, Hugging Face hosts many).

    • lurch (he/him)@sh.itjust.works
      19 hours ago

      Still doesn’t magically make it open source, though.

      Debian would probably split the package into a non-free part and an open-source part for this reason.