Proton's biased article on Deepseek

JOMusic@lemmy.ml · 23 hours ago

Dyf_Tfh@lemmy.sdf.org · edit-2 21 hours ago

Those are not DeepSeek R1. They are separate models, like Llama 3 from Meta or Qwen from Alibaba, that were “distilled” by DeepSeek.

Distillation is a common way to make a smaller model smarter by training it on a larger model's outputs.
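
For anyone unsure what that means in practice, here is a minimal sketch of the textbook logit-matching form of distillation: the student is penalized for diverging from the teacher's temperature-softened output distribution. The function names and the temperature value are illustrative assumptions, not anything taken from DeepSeek's code. As far as I know, the R1 distills were made by fine-tuning the smaller models on R1-generated samples rather than by matching logits directly, so treat this as the general idea only.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Zero when the student reproduces the teacher's distribution,
    positive otherwise. The temperature of 2.0 is an arbitrary example.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]      # made-up logits for one token position
good_student = [4.0, 1.0, 0.5]
bad_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, bad_student))
```

In real training this loss would be averaged over many token positions and minimized by gradient descent on the student's weights.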

Ollama should never have labelled them deepseek:8B/32B. Way too many people misunderstood that.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 19 hours ago

I’m running deepseek-r1:14b-qwen-distill-fp16 locally and it produces really good results I find. Like yeah it’s a reduced version of the online one, but it’s still far better than anything else I’ve tried running locally.
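
For a rough sense of what the 14b fp16 tag needs, here is a back-of-the-envelope, weights-only estimate. It assumes a nominal 14e9 parameters and 2 bytes per fp16 weight, and it ignores the KV cache and runtime overhead, so real usage will be higher. `weights_gib` is a hypothetical helper, not part of any tool.

```python
def weights_gib(num_params, bytes_per_param):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

NOMINAL_PARAMS = 14e9  # assumption: "14b" taken at face value

fp16 = weights_gib(NOMINAL_PARAMS, 2.0)   # 16-bit weights
q4 = weights_gib(NOMINAL_PARAMS, 0.5)     # hypothetical 4-bit quantization

print(f"fp16:  {fp16:.1f} GiB")
print(f"4-bit: {q4:.1f} GiB")
```

Under these assumptions the fp16 weights come to roughly 26 GiB, about four times what a 4-bit quantization of the same model would take.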

morrowind@lemmy.ml · edit-2 33 minutes ago

Have you compared it with the regular Qwen? That was also very good.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 7 hours ago

The main difference is speed and memory usage. Qwen is a full-sized, high-parameter model while qwen-distill is a smaller model created using knowledge distillation to mimic qwen’s outputs. If you have the resources to run qwen fast then I’d just go with that.

morrowind@lemmy.ml · 31 minutes ago

I think you’re confusing the two. I’m talking about the regular Qwen before it was fine-tuned by DeepSeek, not the regular DeepSeek.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 2 minutes ago

I haven’t actually used that one, but doesn’t the same point apply here too? The whole point of DeepSeek's distillation is to make the runtime requirements smaller.