• AdrianTheFrog@lemmy.world · 2 days ago

      the lighter distilled models you can run easily; the original looks like it needs at least 260 GB of RAM
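
      a rough sketch of what running one of the distilled models can look like, assuming ollama is installed and serving locally, and that the distill is published under a tag like deepseek-r1:14b (the exact tag is an assumption; check ollama list for what you actually have):

          import ollama  # pip install ollama; talks to a locally running ollama server

          # "deepseek-r1:14b" is an assumed tag for one of the distilled models
          response = ollama.generate(model="deepseek-r1:14b", prompt="Why is the sky blue?")
          print(response["response"])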

      this video gets a semi-usable experience with a $5500 CPU: https://www.youtube.com/watch?v=o1sN1lB76EA

      you could get a Thelio Astra to run it for around $6900 total and probably get similar performance; still cheaper than the base model Mac Pro lol

      for better speed you could probably buy a bunch of old Tesla GPUs on eBay; that might work

      • Pup Biru@aussie.zone · 11 hours ago

        you don’t actually need to fit the whole model in RAM at once: the 70B, for example, “requires” something like 120 GB of VRAM, but I’m running it on my 64 GB M1 MacBook Pro - it just runs a bit slower (still very usable; I reckon about a word per 300 ms)
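
        the same trick works beyond Apple’s unified memory: with llama.cpp you can offload only as many layers as fit in VRAM and leave the rest memory-mapped in system RAM. a hedged sketch with the llama-cpp-python bindings (the GGUF filename and layer count are assumptions for illustration):

            from llama_cpp import Llama  # pip install llama-cpp-python

            llm = Llama(
                model_path="DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf",  # hypothetical local file
                n_gpu_layers=30,  # offload only the layers that fit in VRAM; the rest stay in RAM
                use_mmap=True,    # default: weights are memory-mapped, paged in by the OS on demand
                n_ctx=4096,
            )
            out = llm("Q: Why is the sky blue? A:", max_tokens=64)
            print(out["choices"][0]["text"])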

      • theneverfox@pawb.social · 1 day ago

        True, but who cares about the base models? Usefulness is what matters - the 8B model is pretty useful, better than the free tier of anything I’ve tried

        Maybe the paid models are better… Just like adaptive cruise control, I refuse to rely on it until I can rely on it. I’m driving, I know the top models still need me to drive them, so I’m happy with what I have… Why rely on something that could be taken away?

        • AdrianTheFrog@lemmy.world · 19 hours ago

          I was trying the 14B model (Q4_K_M quantized) on my 3060 recently, and while it’s clearly stupider than ChatGPT (I tried asking it some things from old ChatGPT chats), it’s much faster (20 tokens per second) and at least doesn’t suddenly get dumber once OpenAI decides you’ve had enough 4o time today on the free plan and switches the rest of your chat to whatever older model there was
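
          a quick way to measure a tokens-per-second figure like that yourself - a sketch assuming the model was pulled through ollama (the tag is an assumption; eval_count and eval_duration are the fields ollama’s generate API reports, with the duration in nanoseconds):

              import ollama

              r = ollama.generate(model="deepseek-r1:14b",
                                  prompt="Explain quantization in one paragraph.")
              # eval_count = tokens generated, eval_duration = generation time in nanoseconds
              print(r["eval_count"] / (r["eval_duration"] / 1e9), "tokens/s")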