
What is the best GPU for running DeepSeek LLM locally?

0
Topic starter

I’m looking to move my DeepSeek coding workflow offline for privacy, but I’m stuck on hardware. I'm considering an RTX 4090 or maybe a dual 3090 setup for that 48GB VRAM. Given the 67B model's size, what’s the best GPU configuration to maintain fast inference speeds without hitting major bottlenecks?


10 Answers
12

Sooo I totally get the struggle! I remember when I first tried loading a massive model on a single card and it literally just crashed my whole system... it was so frustrating!! If you're looking for the best bang for your buck, the dual NVIDIA GeForce RTX 3090 24GB setup is lowkey the way to go for DeepSeek. Even though a single NVIDIA GeForce RTX 4090 24GB is the faster card, you just CAN'T beat having 48GB of VRAM when you're running those heavy coding models.

Here are my quick tips for staying under budget:
- Use 4-bit or 5-bit quantization (GGUF or EXL2) to keep things snappy.
- Make sure your PSU, something like an EVGA SuperNOVA 1000 G5 1000W, can actually handle the power draw of two cards.

Honestly, having that extra VRAM headroom makes everything sooo much smoother. I LOVE being able to run long context windows without getting OOM errors. gl with the build!!
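
For anyone who wants to sanity-check the VRAM math, here's a rough back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight) plus a flat allowance for KV cache and runtime buffers. The 4 GB `overhead_gb` default is my own placeholder, not a measured number, and real GGUF/EXL2 files use slightly more bits per weight than their nominal label, so treat the results as optimistic.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if weights plus a flat overhead allowance fit in the given VRAM.

    overhead_gb is an assumed placeholder for KV cache and runtime buffers;
    it grows with context length, so raise it for long contexts.
    """
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for bits in (4, 5, 8, 16):
        size = estimate_weight_gb(67, bits)
        print(f"67B @ {bits}-bit: ~{size:.1f} GB weights, "
              f"fits 24GB: {fits(67, bits, 24)}, fits 48GB: {fits(67, bits, 48)}")
```

By this estimate a 67B model at 4-bit is about 33.5 GB of weights, which is why a single 24GB card can't hold it but a 48GB pair can.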


11

Before I give advice, can you clarify if you're running quantized or full weights? Honestly, I tried dual NVIDIA GeForce RTX 3090 24GB cards and the speed was kinda disappointing due to PCIe bottlenecks. Also, what's your power supply? Running two cards, or even a single NVIDIA GeForce RTX 4090 24GB, is basically like running a space heater lol. Highkey a headache to cool.


3

yo, i feel u on the privacy thing. honestly, trying to run a huge model like DeepSeek is basically a full-time job lol. i'm still kinda new to this too, but i've been messing around with local stuff lately and it's a bit of a headache. i think for your situation, you really gotta prioritize that total VRAM over everything else if you want it to be fast.

i would suggest just going with NVIDIA. they're basically the standard and i've had way less trouble with their drivers. i mean, you might want to consider a dual-card setup instead of just one high-end card. it's usually way more cost-effective for getting that 48GB you're looking for. just make sure to check your power supply tho... i almost fried mine cuz i didn't realize how much juice they pull together!! it's SO much better once you get it running tho, gl!
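
if it helps, here's a rough sketch of the power supply math. everything in it is an assumption to swap for your own parts: 350 W is the nominal board power of an RTX 3090, the 250 W for the rest of the system is a guess, and keeping sustained load under 80% of the PSU rating is just a common rule of thumb. it also ignores short power spikes, so it's a floor, not a guarantee.

```python
def recommended_psu_watts(gpu_watts: float, n_gpus: int,
                          rest_of_system_watts: float,
                          max_load_fraction: float = 0.8) -> float:
    """PSU rating needed so sustained load stays under max_load_fraction.

    All inputs are nominal figures; transient spikes are not modeled.
    """
    sustained = gpu_watts * n_gpus + rest_of_system_watts
    return sustained / max_load_fraction


if __name__ == "__main__":
    # Assumed figures: two RTX 3090s at 350 W each, 250 W for CPU, drives, fans.
    need = recommended_psu_watts(gpu_watts=350, n_gpus=2, rest_of_system_watts=250)
    print(f"Suggested PSU rating: ~{need:.0f} W")
```

with those placeholder numbers the sustained draw is 950 W, so a 1000 W unit would be running with almost no margin. power-limiting the cards is one way people bring that down.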


2

Tbh I totally agree that 48GB is the magic number if you want to run the bigger DeepSeek models without it crawling. I've been doing some market research since I'm trying to build my own rig, and it's wild how much NVIDIA GeForce RTX 3090 24GB prices have stayed up. Like, why is the AMD Radeon RX 7900 XTX so much cheaper for the same VRAM? I guess it's basically cuz of the software support. I read that setting up ROCm for AMD is a huge pain compared to CUDA, so even though the hardware is cheaper, ur paying for the headache.

Has anyone looked at the Apple Mac Studio with the M2 Ultra? I've seen some people say that even though it's not a "GPU" in the same way, that unified memory lets you run massive models way easier than trying to link two cards. But then again, it costs a fortune.

Anyway, it just feels like we're all stuck paying the "NVIDIA tax" because their ecosystem is so much more mature. Is it even worth trying to go non-NVIDIA right now or is that just asking for trouble?


2

> Given the 67B model's size, what's the best GPU configuration to maintain fast inference speeds without hitting major bottlenecks?

tbh in my experience over the years, chasing the absolute best setup is kinda a trap because hardware stuff changes so fast. i've tried many different configs and honestly i'd suggest just checking out the local llm subreddits. they have mega-threads specifically for this deepseek model size and everything. i also saw a really solid video on youtube about this just the other day... i think if you search deepseek local hardware comparison it pops right up. it breaks down the whole vram vs speed trade-off way better than a quick forum post can. better to do a search and see the actual benchmarks yourself before you commit to a build, ya know? anyway, definitely worth a look there first.


2

Gonna try this over the weekend. Will report back if it works!


1

Bump - same question here


1

Seconded!


1

I've messed with this for a long time and unfortunately, the consumer-grade stuff usually falls short for models this big. A setup i tried with two high-end cards had a bottleneck at the bus level that was just brutal.

  • My old enterprise-grade workstation actually handled the 67B model much better than the new gaming rig did, mostly because the data transfer between chips was more stable.
  • Latency on the consumer setup i built last month was honestly depressing... getting maybe 2 or 3 tokens per second is useless for a coding workflow.
  • Cooling was the other big failure. The cards i got would hit 90 degrees almost instantly and then the performance would just tank.

Tbh, unless you move toward professional-grade hardware with better cooling and higher bandwidth, you're gonna be fighting the hardware more than you're actually coding. It's kinda frustrating how much the speed drops off.
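
For a rough sense of where numbers like 2 or 3 tokens per second can come from, here's a simple bandwidth-bound estimate. It assumes single-user decoding has to read every weight once per token, so the ceiling is memory bandwidth divided by model size. The 936 GB/s figure is the RTX 3090's spec-sheet bandwidth, the 80 GB/s system-RAM figure is a placeholder i picked, and 33.5 GB assumes a 67B model at 4-bit. With a layer split across two cards the cards take turns, so i'm using one card's bandwidth. Real throughput lands below these ceilings.

```python
def max_tokens_per_second(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every weight is read once per token."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights_gb = 33.5  # assumed: 67B parameters at 4 bits per weight

    # Assumed bandwidth figures; substitute the numbers for your own hardware.
    scenarios = {
        "weights fully in 3090 VRAM (936 GB/s)": 936.0,
        "weights in system RAM (placeholder 80 GB/s)": 80.0,
    }
    for name, bandwidth in scenarios.items():
        ceiling = max_tokens_per_second(weights_gb, bandwidth)
        print(f"{name}: at most ~{ceiling:.1f} tokens/s")
```

The in-VRAM ceiling works out to roughly 28 tokens/s, while the system-RAM case lands in the 2 to 3 range. That's only one possible explanation for speeds that low (layers spilling out of VRAM), not a diagnosis of any particular build.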


1

Same boat, watching this

