What is the best GPU for running DeepSeek-V3 locally?

DeepSeek Forum

Last Post by BeefWellington 1 month ago

7 Posts

8 Users

0 Reactions

477 Views

17/02/2026 2:00 am

Topic starter

Robertencug

(@robertencug)

Active Member

7 Posts
2 5 0

I'm really hyped about the DeepSeek-V3 release, but looking at the specs, it seems like an absolute beast to run locally. Since it’s a 671B MoE model, I know my current mid-range setup won’t even come close. I’m trying to figure out what hardware is actually necessary to get decent tokens per second without needing an enterprise-grade server. I’m considering a multi-GPU setup, maybe pooling a few used RTX 3090s or 4090s for their 24GB of VRAM each, but I'm worried about the total VRAM needed for 4-bit or 8-bit quantization. Does anyone have experience benchmarking this yet? What’s the most cost-effective GPU configuration to handle those massive VRAM requirements while keeping the generation speed usable?

Topic Tags

DeepSeek Hardware

7 Answers

17/02/2026 2:01 am

WembleyRoar

(@wembleyroar)

Active Member

7 Posts
1 6 0

Oh man, I've been obsessing over this too!! Honestly, your best bet is definitely pooling used cards:

* Option A: NVIDIA GeForce RTX 3090 24GB (~$700)
* Option B: NVIDIA GeForce RTX 4090 24GB (~$1,800)

The 3090 wins on value, right? Pros? Same VRAM, way cheaper. Cons? Lower memory bandwidth. Ngl, just go with used 3090s... they're amazing for this! gl
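To put a number on the value comparison above, here is a minimal sketch that works out dollars per gigabyte of VRAM. The prices are the rough used-market figures quoted in this post, not live market data, and the names `CARDS` and `usd_per_gb` are made up for the example.

```python
# Rough price-per-gigabyte comparison using the prices quoted in the post.
# These are approximate used-market figures, not live data.
CARDS = {
    "RTX 3090": {"vram_gb": 24, "price_usd": 700},
    "RTX 4090": {"vram_gb": 24, "price_usd": 1800},
}


def usd_per_gb(card: dict) -> float:
    """Dollars paid per gigabyte of VRAM for one card."""
    return card["price_usd"] / card["vram_gb"]


for name, card in CARDS.items():
    print(f"{name}: ${usd_per_gb(card):.2f} per GB of VRAM")
```

Since both cards carry the same 24GB, the gap per gigabyte is just the price gap, which is why capacity-hungry builds tend to lean toward the cheaper card.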

17/02/2026 2:25 am

MaxBic

(@maxbic)

Active Member

6 Posts
0 6 0

Sooo, I've been building LLM rigs since the early GPT-2 days, and honestly, DeepSeek-V3 is a total monster compared to what we're used to. To even get this running at a usable speed without a $40k server, you're gonna have to get creative with quantization. At 4-bit, you're looking at nearly 400GB of VRAM. Unless you've got a massive rack, that's like 16-17 NVIDIA GeForce RTX 3090 24GB cards!! Here's what I recommend for a more "budget" pro build:

* Grab used NVIDIA GeForce RTX 3090 24GB cards. The NVIDIA GeForce RTX 4090 24GB is faster, but for MoE, VRAM capacity is king and the 3090 is basically half the price.
* Look into 1.5-bit or 2-bit EXL2 quants. They'll fit in about 200-240GB, which brings the GPU count down to a "reasonable" 10 cards.
* Make sure your power supply can handle the transient spikes... seriously, you'll trip breakers with that many cards lol.

I would suggest looking at a used Supermicro AS-4124GS-TNR server chassis. It handles the PCIe lanes way better than any consumer motherboard ever could. Just be careful with the heat, it's basically a space heater. gl!
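The VRAM and card-count figures above can be sanity-checked with some quick arithmetic. This is only a sketch: it counts the weights alone (parameters times bits per weight), so the results come out lower than the totals quoted in the post, which presumably also allow for KV cache, activations, and quantization metadata. The helper names are made up for illustration.

```python
import math

TOTAL_PARAMS = 671e9  # total parameter count quoted in the thread
CARD_VRAM_GB = 24     # RTX 3090 / RTX 4090


def weights_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Size of the weights alone in GB (1 GB = 1e9 bytes), ignoring runtime overhead."""
    return params * bits_per_weight / 8 / 1e9


def cards_needed(total_gb: float, card_gb: float = CARD_VRAM_GB) -> int:
    """Smallest number of cards whose combined VRAM covers total_gb."""
    return math.ceil(total_gb / card_gb)


# Weights-only lower bounds at a few bit widths.
for bits in (8, 4, 2):
    gb = weights_gb(bits)
    print(f"{bits}-bit: ~{gb:.1f} GB of weights -> at least {cards_needed(gb)} cards")

# The totals quoted in the post, which include some overhead on top of the weights.
for quoted_gb in (400, 240):
    print(f"{quoted_gb} GB total -> {cards_needed(quoted_gb)} cards")
```

Treat the weights-only numbers as a floor. Real usage also depends on context length and the inference backend, so leave headroom on top.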

17/02/2026 2:15 am

sxmxuzlsdk

(@sxmxuzlsdk)

Active Member

7 Posts
0 7 0

Similar situation here - I've tried many rigs over the years, including a MoE setup last summer. MoE models are basically massive libraries where the whole thing has to fit in VRAM to work. I think this 671B beast needs like 400GB+? Honestly, I ran into huge PCIe bottlenecks. IIRC, it'll be slow without high-bandwidth interconnects. Basically, VRAM is only half the battle. Tbh I'm still learning too!
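A back-of-the-envelope way to see why bandwidth matters as much as capacity: during generation, every active weight has to be read at least once per token, so the link those weights travel over caps the speed. The sketch below rests on several assumptions that are not from this thread: roughly 37B parameters activated per token (DeepSeek-V3's published figure), 4-bit weights, about 936 GB/s for RTX 3090 VRAM, and about 32 GB/s for a PCIe 4.0 x16 link. It ignores compute, KV cache reads, and synchronization between cards, so these are ceilings and not predictions.

```python
ACTIVE_PARAMS = 37e9  # assumption: parameters activated per token in DeepSeek-V3
BITS_PER_WEIGHT = 4   # assumption: 4-bit quantized weights


def max_tokens_per_s(bandwidth_gb_s: float,
                     active_params: float = ACTIVE_PARAMS,
                     bits: float = BITS_PER_WEIGHT) -> float:
    """Upper bound on decode speed if each active weight is read once per token."""
    bytes_per_token = active_params * bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Approximate peak bandwidths (assumed figures, see the paragraph above).
LINKS_GB_S = {
    "weights resident in RTX 3090 VRAM": 936,
    "weights streamed over PCIe 4.0 x16": 32,
}

for name, bandwidth in LINKS_GB_S.items():
    print(f"{name}: at most {max_tokens_per_s(bandwidth):.1f} tokens/s")
```

In a simple layer-split setup the cards take turns on each token, so it is the single-card bandwidth that sets this ceiling, and anything that spills out of VRAM and has to cross PCIe drags it down sharply.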

27/02/2026 8:00 am

CockneySparrow

(@cockneysparrow)

Active Member

9 Posts
4 5 0

Would love to know this too

22/02/2026 4:47 am

jgyttvirmg

(@jgyttvirmg)

Active Member

4 Posts
1 3 0

I have been tracking the hardware market for a while, and honestly, you have to be pretty cautious when looking at brand ecosystems for a beast like DeepSeek-V3. It is not just about the card specs, it is about the long-term reliability of the whole setup.

Watch out for the used market right now. Since everyone is scrambling for high VRAM cards, prices are inflated and many of those older consumer cards have been through the wringer. If a card dies mid-inference because it was previously pushed too hard, your whole multi-GPU array is basically dead in the water.

Think about the power overhead. Running a massive stack of discrete GPUs basically turns your room into a sauna and might trip your breakers. Idk if your home wiring can even handle the massive wattage these DIY rigs pull during peak inference.

Be careful with the brand choice. While some alternatives offer better value on paper, the software ecosystem is still heavily biased. You might find yourself stuck in dependency hell trying to get specific MoE optimizations to work on non-standard drivers.

Maybe look into the unified memory workstation market. It is way more stable for these massive weights, even if the entry price is steep. It avoids the PCIe bottleneck issues that often plague those multi-GPU consumer builds.
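For the power overhead point in particular, a rough budget check helps before buying anything. The sketch below is illustrative only and assumes a 120V / 15A household circuit, the common 80% rule of thumb for continuous loads, 350W per RTX 3090 (its rated board power), and a hypothetical 300W allowance for the rest of the system. It ignores PSU efficiency losses and transient spikes, both of which make things worse.

```python
import math

GPU_WATTS = 350          # RTX 3090 rated board power
SYSTEM_OVERHEAD_W = 300  # hypothetical allowance for CPU, fans, drives
CIRCUIT_VOLTS = 120      # assumption: US-style household circuit
CIRCUIT_AMPS = 15        # assumption: standard 15A breaker
CONTINUOUS_FACTOR = 0.8  # rule of thumb for continuous loads


def rig_watts(num_gpus: int) -> int:
    """Estimated sustained draw of the whole rig at full GPU load."""
    return num_gpus * GPU_WATTS + SYSTEM_OVERHEAD_W


def circuit_budget_watts() -> float:
    """Continuous power one circuit can safely supply under the assumptions above."""
    return CIRCUIT_VOLTS * CIRCUIT_AMPS * CONTINUOUS_FACTOR


def circuits_needed(num_gpus: int) -> int:
    """Number of separate circuits needed to stay within the continuous budget."""
    return math.ceil(rig_watts(num_gpus) / circuit_budget_watts())


for n in (4, 10, 17):
    print(f"{n} GPUs: ~{rig_watts(n)} W -> {circuits_needed(n)} circuit(s) "
          f"at {circuit_budget_watts():.0f} W each")
```

Swap in your own voltage, breaker rating, and per-card power limit. Lowering the GPU power limit changes the picture a lot, but the breaker math works the same way.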

23/02/2026 8:47 am

Armandphafe

(@armandphafe)

Active Member

8 Posts
0 8 0

I totally agree about the used market being a total gamble right now. Plus, finding that many matching cards from one seller is nearly impossible anyway. Honestly though, I have to disagree with the idea of building a 16-GPU rack at home. It sounds cool in theory, but the infrastructure needed is basically enterprise-level. For a DIY enthusiast, you are better off looking at high-RAM unified memory workstations instead. Even if the tokens per second are lower, you avoid the nightmare of:

* Setting up custom cooling for a dozen cards
* Dealing with massive power draw on home circuits
* Debugging PCIe riser issues constantly

DeepSeek-V3 is just so massive that trying to squeeze it onto consumer boards feels like a losing battle. Sometimes the best DIY move is knowing when a model is just too big for local hardware and using a professional service instead, you know?

12/04/2026 1:15 am

BeefWellington

(@beefwellington)

Active Member

13 Posts
1 12 0

Came here to say the same thing lol. Great minds think alike I guess.
