Cross-posted from: https://lemmy.world/post/1474932

Hi there.

I want to run LLMs locally on my server (for better privacy), and I was wondering:

  1. Could I use Intel Arc or AMD GPUs? These are often less expensive, and AMD has open-source drivers, which is something I like.
  2. Would a PCIe 3.0 x4 slot be enough? (It's a physical x16 slot running at x4 speed.) This is an important consideration.
  3. Would 8 GB of GPU memory (I believe it's called VRAM?) be enough?

I'm looking at language models to train on my Reddit and Lemmy content, with the aim of making one write like me (and maybe even better than me? Who knows). I don't yet know which models I will train or how I will do it (I certainly won't be writing anything from scratch), but with the explosion of FOSS AI models, I was wondering whether something like this would be possible within the hardware constraints I mentioned above.

Does the speed of the connection between the GPU and the CPU really matter in such applications?
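For a rough sense of scale: when the whole model fits in VRAM, the PCIe link is mostly busy copying the weights onto the card at load time, and comparatively little data crosses it per generated token after that. The sketch below is a back-of-the-envelope calculation using the PCIe 3.0/4.0 raw rates (8 and 16 GT/s per lane) and 128b/130b encoding; the 4 GB model size is an assumption for illustration, and real throughput will be below these theoretical peaks.

```python
# Rough estimate of how long it takes to copy model weights from system RAM
# to the GPU over PCIe. These are theoretical peaks; real throughput is lower.

GT_PER_LANE = {3: 8.0, 4: 16.0}  # raw transfer rate per lane (GT/s) for PCIe 3.0 / 4.0
ENCODING = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 and later


def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s (10**9 bytes per second)."""
    bits_per_second = GT_PER_LANE[gen] * 1e9 * ENCODING * lanes
    return bits_per_second / 8 / 1e9


def transfer_seconds(size_gb: float, gen: int, lanes: int) -> float:
    """Time to move size_gb gigabytes across the link at the theoretical peak."""
    return size_gb / pcie_bandwidth_gbs(gen, lanes)


if __name__ == "__main__":
    # Hypothetical 4 GB model (e.g. a quantized 7B model) over a few link configurations.
    for gen, lanes in [(3, 4), (3, 16), (4, 16)]:
        bw = pcie_bandwidth_gbs(gen, lanes)
        t = transfer_seconds(4.0, gen, lanes)
        print(f"PCIe {gen}.0 x{lanes}: {bw:.2f} GB/s, 4 GB model in ~{t:.2f} s")
```

Under these assumptions an x4 Gen 3 link costs on the order of a second per model load compared with a fraction of a second on x16, which is why VRAM capacity tends to matter more than link width once everything fits on the card.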

Thanks!

  • socphoenix
    2 years ago

    I've used an image upscaling model on a 3060 Ti with 8 GB of VRAM. It works OK, but it does limit how much it can process at one time. As long as you're OK with letting it run longer, I'd imagine it would work for a text model as well.

  • surrendertogravity@wayfarershaven.eu
    2 years ago

    I'm no expert at running local LLMs, but I did download one (the 7B Vicuna model recommended by the LocalLLM subreddit wiki) and tried my hand at training a LoRA on some structured data I have.

    Based on my experience, the VRAM available to you is going to be way more of a bottleneck than PCIe speeds.

    I could barely fit a 7B model in the 10 GB of VRAM on my 3080, so 8 GB might be impossible or very tight. IMO, to get good results with local models you really need a large amount of VRAM and should be using 13B or larger models.
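    The weights alone take roughly (number of parameters) × (bytes per parameter), which is why 8-bit or 4-bit quantized versions are what people usually try on 8 to 10 GB cards. A minimal sketch of that arithmetic follows; the 1.5 GiB overhead default is a made-up placeholder, and real usage also depends on context length, batch size, and the inference framework.

```python
# Approximate VRAM needed just to hold a model's weights at different precisions.
# Activations, KV cache, driver context and framework overhead are NOT modeled
# beyond a single guessed constant, so real usage is higher.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(n_params: float, precision: str) -> float:
    """Size of the weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 2**30


def fits(n_params: float, precision: str, vram_gib: float, overhead_gib: float = 1.5) -> bool:
    """Crude check: weights plus a hypothetical fixed overhead must fit in VRAM."""
    return weight_gib(n_params, precision) + overhead_gib <= vram_gib


if __name__ == "__main__":
    for n_params, label in [(7e9, "7B"), (13e9, "13B")]:
        for precision in BYTES_PER_PARAM:
            size = weight_gib(n_params, precision)
            print(f"{label} {precision}: {size:.1f} GiB weights, "
                  f"fits 8 GiB: {fits(n_params, precision, 8)}, "
                  f"fits 10 GiB: {fits(n_params, precision, 10)}")
```

Under these assumptions a 7B model in fp16 needs about 13 GiB for the weights alone, which is consistent with a 10 GB card only managing it in some reduced-precision form.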

    Additionally, when you're training a LoRA, both the model and the training data get loaded into VRAM. My training dataset wasn't very large, and even so, I kept running into VRAM limits during training.
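    For context on why LoRA fits on consumer cards at all: only small low-rank adapter matrices are trained while the base weights stay frozen, but the frozen model still has to sit in VRAM alongside the activations for each training batch. A minimal sketch counting the extra trainable parameters follows; the layer count, hidden size, choice of adapted matrices, and rank are hypothetical LLaMA-7B-style values, not the commenter's actual setup.

```python
# Count the extra trainable parameters LoRA adds. For a frozen weight matrix
# W of shape (d_out, d_in), LoRA trains B (d_out x r) and A (r x d_in) and
# uses W + B @ A (times a scaling factor), so each adapted matrix adds
# r * (d_out + d_in) parameters.


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for one adapted matrix."""
    return rank * (d_out + d_in)


def total_lora_params(n_layers: int, matrices: list[tuple[int, int]], rank: int) -> int:
    """Trainable parameters when the same matrices are adapted in every layer."""
    per_layer = sum(lora_params(d_out, d_in, rank) for d_out, d_in in matrices)
    return n_layers * per_layer


if __name__ == "__main__":
    # Hypothetical shapes: 32 layers, hidden size 4096, adapting only the
    # query and value projections (each 4096 x 4096), rank 8.
    trainable = total_lora_params(32, [(4096, 4096), (4096, 4096)], rank=8)
    frozen = 7e9
    print(f"trainable LoRA params: {trainable:,} ({trainable / frozen:.3%} of 7B)")
```

With these assumed shapes the adapters are only a few million parameters, so the memory pressure comes mostly from keeping the 7B frozen weights and the per-batch activations on the card.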

    In the end, I concluded that in its current state, running a local LLM is an interesting exercise but only works really well on enthusiast-level hardware with loads of VRAM (4090s, etc.).

      • surrendertogravity@wayfarershaven.eu
        2 years ago

        Yup; hopefully there are some advances in the training space, but I’d guess that having large quantities of VRAM is always going to be necessary in some capacity for training specifically.