• AdrianTheFrog@lemmy.world · 2 days ago

      the lighter distilled models you can run easily; the original looks like it needs at least 260 GB of RAM
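
      a rough sketch of what running one of the distilled models can look like, assuming ollama is installed and serving locally, and that the distill is published under a tag like deepseek-r1:14b (the exact tag is an assumption; check ollama list for what you actually have):

          import ollama  # pip install ollama; talks to a locally running ollama server

          # "deepseek-r1:14b" is an assumed tag for one of the distilled models
          response = ollama.generate(model="deepseek-r1:14b", prompt="Why is the sky blue?")
          print(response["response"])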

      this video gets a semi-usable experience with a $5500 CPU: https://www.youtube.com/watch?v=o1sN1lB76EA

      you could get a Thelio Astra to run it for around $6900 total and probably get similar performance; still cheaper than the base model Mac Pro lol

      for better speed you could probably buy a bunch of old Tesla GPUs on eBay; that might work

      • Pup Biru@aussie.zone · 11 hours ago

        you don’t actually need to fit the whole model in RAM at once: the 70B, for example, “requires” something like 120 GB of VRAM, but I’m running it on my 64 GB M1 MacBook Pro - it just runs a bit slower (still very usable; I reckon about a word per 300 ms)
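
        the same trick works beyond Apple’s unified memory: with llama.cpp you can offload only as many layers as fit in VRAM and leave the rest memory-mapped in system RAM. a hedged sketch with the llama-cpp-python bindings (the GGUF filename and layer count are assumptions for illustration):

            from llama_cpp import Llama  # pip install llama-cpp-python

            llm = Llama(
                model_path="DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf",  # hypothetical local file
                n_gpu_layers=30,  # offload only the layers that fit in VRAM; the rest stay in RAM
                use_mmap=True,    # default: weights are memory-mapped, paged in by the OS on demand
                n_ctx=4096,
            )
            out = llm("Q: Why is the sky blue? A:", max_tokens=64)
            print(out["choices"][0]["text"])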

      • theneverfox@pawb.social · 1 day ago

        True, but who cares about the base models? Usefulness is what matters - the 8B model is pretty useful, better than the free tier of anything I’ve tried

        Maybe the paid models are better… Just like adaptive cruise control, I refuse to rely on it until I can rely on it. I’m driving, I know the top models still need me to drive them, so I’m happy with what I have… Why rely on something that could be taken away?

        • AdrianTheFrog@lemmy.world · 19 hours ago

          I was trying the 14B model (Q4_K_M quantized) on my 3060 recently, and while it’s clearly stupider than ChatGPT (I tried asking it some things from old ChatGPT chats), it’s much faster (20 tokens per second) and at least doesn’t suddenly get dumber once OpenAI decides you’ve had enough 4o time today on the free plan and switches the rest of your chat to whatever older model there was
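
          a quick way to measure a tokens-per-second figure like that yourself - a sketch assuming the model was pulled through ollama (the tag is an assumption; eval_count and eval_duration are the fields ollama’s generate API reports, with the duration in nanoseconds):

              import ollama

              r = ollama.generate(model="deepseek-r1:14b",
                                  prompt="Explain quantization in one paragraph.")
              # eval_count = tokens generated, eval_duration = generation time in nanoseconds
              print(r["eval_count"] / (r["eval_duration"] / 1e9), "tokens/s")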