
What is the best GPU for running DeepSeek-V3 locally?

Topic starter

I'm really hyped about the DeepSeek-V3 release, but looking at the specs, it seems like an absolute beast to run locally. Since it’s a 671B MoE model, I know my current mid-range setup won’t even come close. I’m trying to figure out what hardware is actually necessary to get decent tokens per second without needing an enterprise-grade server. I’m considering a multi-GPU setup, maybe pooling a few used RTX 3090s or 4090s for their 24GB of VRAM each, but I'm worried about the total VRAM needed for 4-bit or 8-bit quantization. Does anyone have experience benchmarking this yet? What’s the most cost-effective GPU configuration to handle those massive VRAM requirements while keeping the generation speed usable?


7 Answers

Oh man, I've been obsessing over this too!! Honestly, your best bet is definitely pooling used cards:
* Option A: NVIDIA GeForce RTX 3090 24GB (~$700)
* Option B: NVIDIA GeForce RTX 4090 24GB (~$1,800)

The 3090 wins on value, right? Pros? Same VRAM, way cheaper. Cons? Slightly lower memory bandwidth (936 GB/s vs. 1,008 GB/s) and slower compute. Ngl, just go with used 3090s... they're amazing for this! gl
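To put numbers on the value argument, here's a minimal Python sketch. It assumes the rough prices quoted above (not live market data), the cards' published memory bandwidths, and a hypothetical 400 GB pooled-VRAM target, roughly what a 4-bit DeepSeek-V3 needs once overhead is included. `CARDS` and `cost_for_vram` are illustrative names, not from any library.

```python
import math

# Rough prices from the post above (assumed, not live market data) and
# published memory bandwidths for each card.
CARDS = {
    "RTX 3090": {"price_usd": 700, "vram_gb": 24, "bandwidth_gbs": 936},
    "RTX 4090": {"price_usd": 1800, "vram_gb": 24, "bandwidth_gbs": 1008},
}


def cost_for_vram(card: dict, target_gb: float) -> tuple:
    """Return (cards needed, total cost in USD) to pool at least target_gb of VRAM."""
    n_cards = math.ceil(target_gb / card["vram_gb"])
    return n_cards, n_cards * card["price_usd"]


if __name__ == "__main__":
    target_gb = 400  # hypothetical target: ~4-bit DeepSeek-V3 plus overhead
    for name, card in CARDS.items():
        per_gb = card["price_usd"] / card["vram_gb"]
        n_cards, total = cost_for_vram(card, target_gb)
        print(f"{name}: ${per_gb:.2f}/GB VRAM, {card['bandwidth_gbs']} GB/s, "
              f"{n_cards} cards = ${total:,} for {target_gb} GB")
```

Since both cards have 24GB, the card count is identical and the price gap scales straight into the total, while the 4090's bandwidth edge is under 10%.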



Sooo, I've been building LLM rigs since the early GPT-2 days, and honestly, DeepSeek-V3 is a total monster compared to what we're used to. To even get this running at a usable speed without a $40k server, you're gonna have to get creative with quantization. At 4-bit, the weights alone are about 335GB (671B × 0.5 bytes), so with KV cache and overhead you're looking at nearly 400GB of VRAM. Unless you've got a massive rack, that's like 16-17 NVIDIA GeForce RTX 3090 24GB cards!! Here's what I recommend for a more "budget" pro build:
* Grab used NVIDIA GeForce RTX 3090 24GB cards. The NVIDIA GeForce RTX 4090 24GB is faster, but for MoE, VRAM capacity is king and the 3090 is basically half the price.
* Look into 1.5-bit or 2-bit EXL2 quants. At 2 bits per weight that's about 168GB of weights (1.5-bit is closer to 126GB), and with cache and overhead it'll fit in about 200-240GB, which brings the GPU count down to a "reasonable" 10 cards.
* Make sure your power supply can handle the transient spikes... seriously, you'll trip breakers with that many cards lol.

I would suggest looking at a used Supermicro AS-4124GS-TNR server chassis. It handles the PCIe lanes way better than any consumer motherboard ever could. Just be careful with the heat, it's basically a space heater. gl!
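The VRAM figures in this answer come from simple arithmetic: parameters × bits per weight ÷ 8 gives bytes. Here's a minimal sketch, assuming 1 GB = 10^9 bytes, 24GB per card, and a flat 20% allowance for KV cache, activations and runtime buffers (a hypothetical number; real overhead depends on context length and the inference engine). Real quant formats often mix bit widths, so treat the bit values as averages.

```python
import math

TOTAL_PARAMS = 671e9  # DeepSeek-V3 total parameters (every expert must be stored)
VRAM_PER_GPU_GB = 24  # RTX 3090 / 4090
OVERHEAD = 1.20  # ASSUMPTION: ~20% extra for KV cache, activations, buffers


def weight_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Size of the quantized weights alone in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def gpus_needed(total_gb: float, vram_per_gpu_gb: float = VRAM_PER_GPU_GB) -> int:
    """Smallest number of cards whose pooled VRAM covers total_gb."""
    return math.ceil(total_gb / vram_per_gpu_gb)


if __name__ == "__main__":
    for bits in (8, 4, 2, 1.5):
        weights = weight_gb(bits)
        total = weights * OVERHEAD
        print(f"{bits}-bit: {weights:.0f} GB weights, ~{total:.0f} GB with overhead, "
              f"{gpus_needed(total)} x {VRAM_PER_GPU_GB} GB cards")
```

Under these assumptions the 4-bit case lands at about 400GB and 17 cards, in line with the estimate above, and 8-bit (DeepSeek-V3's native FP8 weights) roughly doubles that.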



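Similar situation here - I've tried many rigs over the years, including an MoE setup last summer. MoE models are basically massive libraries: only a fraction of the model is active for each token (about 37B of DeepSeek-V3's 671B parameters), but any expert can get picked, so all of them need to sit in fast memory. Whatever doesn't fit in VRAM gets offloaded to system RAM and drags the speed way down. I think this 671B beast needs like 400GB+ at 4-bit? Honestly, I ran into huge PCIe bottlenecks. IIRC, it'll be slow without high-bandwidth interconnects. Basically, VRAM is only half the battle. Tbh I'm still learning too!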
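The "VRAM is only half the battle" point can be made concrete with a back-of-envelope ceiling. At batch size 1, generating a token means reading every active weight once, so tokens/s is capped at roughly memory bandwidth ÷ bytes of active weights. This minimal sketch assumes ~37B active parameters per token, layers split across GPUs as a pipeline (cards take turns, so the effective bandwidth is one card's), experts spread evenly so `offload_frac` of each token's weights come from system RAM, and a hypothetical ~90 GB/s host bandwidth (about dual-channel DDR5-5600). It ignores compute, KV cache reads and interconnect latency, so real numbers will be lower.

```python
ACTIVE_PARAMS = 37e9  # DeepSeek-V3 activates ~37B of its 671B parameters per token


def decode_ceiling_tok_s(bits_per_weight: float,
                         gpu_bw_gbs: float,
                         offload_frac: float = 0.0,
                         host_bw_gbs: float = 90.0) -> float:
    """Upper bound on batch-1 tokens/s when decoding is purely memory-bound.

    offload_frac is the share of each token's active weights read from system
    RAM instead of VRAM (ASSUMPTION: experts spread evenly across both).
    host_bw_gbs defaults to a hypothetical ~90 GB/s (dual-channel DDR5-5600).
    """
    bytes_per_token = ACTIVE_PARAMS * bits_per_weight / 8
    gpu_seconds = bytes_per_token * (1 - offload_frac) / (gpu_bw_gbs * 1e9)
    host_seconds = bytes_per_token * offload_frac / (host_bw_gbs * 1e9)
    return 1 / (gpu_seconds + host_seconds)


if __name__ == "__main__":
    # 4-bit weights on RTX 3090s (936 GB/s), with increasing offload to system RAM
    for frac in (0.0, 0.25, 0.5):
        print(f"offload {frac:.0%}: <= {decode_ceiling_tok_s(4, 936, frac):.1f} tok/s")
```

In this model, moving just a quarter of the weights to system RAM cuts the ceiling from about 50 to about 15 tok/s. Interconnect traffic isn't modeled: with a pipeline split only small activation tensors cross PCIe per token, while tensor-parallel splits exchange data at every layer and lean much harder on the links.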



Would love to know this too



I have been tracking the hardware market for a while, and honestly, you have to be pretty cautious when looking at brand ecosystems for a beast like DeepSeek-V3. It is not just about the card specs, it is about the long-term reliability of the whole setup.

  • Watch out for the used market right now. Since everyone is scrambling for high VRAM cards, prices are inflated and many of those older consumer cards have been through the wringer. If a card dies mid-inference because it was previously pushed too hard, your whole multi-GPU array is basically dead in the water.
  • Think about the power overhead. Running a massive stack of discrete GPUs basically turns your room into a sauna and might trip your breakers. Idk if your home wiring can even handle the massive wattage these DIY rigs pull during peak inference.
  • Be careful with the brand choice. While some alternatives offer better value on paper, the software ecosystem is still heavily biased toward NVIDIA and CUDA. You might find yourself stuck in dependency hell trying to get specific MoE optimizations to work on other vendors' drivers and toolchains.
  • Maybe look into the unified memory workstation market. It is way more stable for these massive weights, even if the entry price is steep. It avoids the PCIe bottleneck issues that often plague those multi-GPU consumer builds.
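On the power point, a quick budget shows why breakers become a problem. This is a minimal sketch assuming NVIDIA's reference board power (350 W for the 3090, 450 W for the 4090), a hypothetical 500 W for CPU, motherboard, fans and drives, ~90% PSU efficiency, and a common US 120 V / 15 A circuit loaded to 80% for continuous use. It ignores short transient spikes, so treat it as a floor.

```python
import math

BOARD_POWER_W = {"RTX 3090": 350, "RTX 4090": 450}  # NVIDIA reference board power


def circuits_needed(n_gpus: int, gpu_w: float,
                    system_w: float = 500,        # ASSUMPTION: CPU, board, fans, drives
                    psu_efficiency: float = 0.9,  # ASSUMPTION: ~90% efficient PSUs
                    circuit_v: float = 120,       # common US household circuit
                    circuit_a: float = 15,
                    continuous_derate: float = 0.8) -> tuple:
    """Return (wall draw in W, number of circuits needed) at sustained full load."""
    wall_w = (n_gpus * gpu_w + system_w) / psu_efficiency
    usable_w = circuit_v * circuit_a * continuous_derate  # 1440 W per 15 A circuit
    return wall_w, math.ceil(wall_w / usable_w)


if __name__ == "__main__":
    for n in (10, 17):
        wall_w, circuits = circuits_needed(n, BOARD_POWER_W["RTX 3090"])
        print(f"{n} x RTX 3090: ~{wall_w:,.0f} W at the wall, {circuits} x 15 A circuits")
```

Power-limiting the cards (many multi-GPU builders do this with `nvidia-smi -pl`) can be modeled by passing a lower `gpu_w`.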



I totally agree about the used market being a total gamble right now. Plus, finding that many matching cards from one seller is nearly impossible anyway. Honestly though, I have to disagree with the idea of building a 16-GPU rack at home. It sounds cool in theory, but the infrastructure needed is basically enterprise-level. For a DIY enthusiast, you are better off looking at high-RAM unified memory workstations instead. Even if the tokens per second are lower, you avoid the nightmare of:

  • Setting up custom cooling for a dozen cards
  • Dealing with massive power draw on home circuits
  • Debugging PCIe riser issues constantly

DeepSeek-V3 is just so massive that trying to squeeze it onto consumer boards feels like a losing battle. Sometimes the best DIY move is knowing when a model is just too big for local hardware and using a professional service instead, you know?



Came here to say the same thing lol. Great minds think alike, I guess.

