cross-posted from: https://lemmy.intai.tech/post/72919

Parameter count:

GPT-4 is more than 10x the size of GPT-3. We believe it has a total of ~1.8 trillion parameters across 120 layers. Mixture Of Experts - Confirmed.

OpenAI was able to keep costs reasonable by using a mixture of experts (MoE) model. They use 16 experts within the model, each with ~111B parameters for the MLP. Two of these experts are routed to per forward pass.
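As a rough back-of-the-envelope check on the figures above, here is a minimal Python sketch. The numbers are the ones quoted in this post (unverified claims, not official ones). The `top_k_route` function is a hypothetical illustration of generic top-k gating, not OpenAI's actual router, which the post does not describe.

```python
import math
import random

# Figures quoted in the post (unverified claims, not official numbers).
NUM_EXPERTS = 16
PARAMS_PER_EXPERT = 111_000_000_000   # ~111B MLP parameters per expert
EXPERTS_PER_TOKEN = 2                 # experts routed to per forward pass

# Parameters held by all expert MLPs vs. those actually used in one pass.
expert_params_total = NUM_EXPERTS * PARAMS_PER_EXPERT
expert_params_active = EXPERTS_PER_TOKEN * PARAMS_PER_EXPERT
active_fraction = EXPERTS_PER_TOKEN / NUM_EXPERTS


def top_k_route(router_logits, k=EXPERTS_PER_TOKEN):
    """Pick the k highest-scoring experts and softmax their scores.

    Generic top-k gating, assumed here for illustration only.
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    print(f"all expert MLP params:      {expert_params_total / 1e12:.3f}T")
    print(f"active expert MLP params:   {expert_params_active / 1e9:.0f}B")
    print(f"share of experts per pass:  {active_fraction:.1%}")

    # Made-up router scores for one token, just to exercise the routing.
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    print(top_k_route(logits))
```

So 16 × 111B accounts for about 1.776T of the claimed ~1.8T total, and only 2 × 111B = 222B of those expert parameters are touched per forward pass. The post does not break down the remaining ~24B.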

Related Article: https://lemmy.intai.tech/post/72922

  • Chickenstalker@lemmy.world
    22 upvotes · 13 downvotes · 1 year ago

    People who shit on AI “hallucinations” are the same kind of people who called the Wright Brothers’ Flyer jank.

    • Nugget_in_biscuit@lemmy.ml
      22 upvotes · 1 year ago

      Yeah, but the original Wright Flyer was extremely janky. It took decades before planes were safe enough for the general public to fly on them. I doubt it’s going to take decades for LLMs to get really good, but it’s undeniable that the current generation of these systems is somewhat lacking in quality.

      • Maple@lemmy.world
        4 upvotes · 1 year ago

        Yeah, I agree. Just because it’s bad now doesn’t mean it won’t be good later. The only way to fix the problems something has is by recognizing them and working on them. Imagine if people didn’t point out the AI hallucinations? We’d never get anywhere with LLMs. Not shitting on the first guy, I get where he’s coming from: it’s a damn cool time to be alive, and LLMs are incredible and can only get better from here (that is, of course, if they don’t keep slapping F*****g censors on them!). But it’s important for us to recognise the flaws in the system so it and we can grow.

        • grabyourmotherskeys@lemmy.world
          2 upvotes · 1 year ago

          And to curb enthusiasm for integrating stuff like this into production systems.

          There are currently lawsuits about copyright violations in training. They are either going to get settled for tons of money or the models will be retrained without that data source. This could have a significant effect on the business model, the business itself, and the models in use, especially over the long term.

          There’s a lot to figure out before this is a stable product.

      • kakes@sh.itjust.works
        1 upvote · 1 year ago

        I’m just stoked they exist at all, honestly, never minding any degree of quality. Been living my best geek life lately.

    • damnYouSun@sh.itjust.works
      2 upvotes · 1 year ago

      Weren’t there some newspapers that said, about two weeks before the Flyer, that it would take one million years before humans could fly?

      Hell, we have gone from hunter-gatherers to a technologically advanced society in less time than that. The moral of the story is that journalists are idiots and should be ignored.