What is the best GPU for running DeepSeek-V3 locally?

Question

Hey everyone! I'm trying to figure out which GPU can actually handle DeepSeek-V3 at home. I'm specifically worried about:

VRAM requirements for 4-bit quants

Multi-GPU scaling vs single pro cards

I want to stop paying for API calls for privacy reasons. What is the best GPU setup to run this smoothly?

Fxarthouh · Accepted Answer

Honestly, DeepSeek-V3 is a beast (671B total parameters), so you're gonna need massive VRAM. For a home setup, picking up used NVIDIA GeForce RTX 3090 24GB GDDR6X cards is usually the most cost-effective move. You get 24GB per card for way less than pro gear. Just a heads up, you'll need a whole stack of them in a multi-GPU rig to run a 4-bit quant smoothly (at 24GB apiece, that's more than a dozen cards for the weights alone), but per GB of VRAM it's basically the best bang for your buck.

AngleseyIsland · Answer

I stumbled upon this while looking for V3 benchmarks myself. I've spent way too long trying to fit these huge MoE models into home builds lately. Tbh, the math for DeepSeek-V3 is pretty brutal. At 4-bit, you're looking at around 350GB to 400GB just for the weights and KV cache. Even with a bunch of consumer cards, you'll run out of room fast. I actually started looking into NVIDIA RTX 6000 Ada Generation 48GB GDDR6 units for the better VRAM density, but the cost is definitely wild. Another route I've seen some folks take is the Apple Mac Studio M2 Ultra 192GB Unified Memory, though you'd be forced into a more aggressive quantization like 2-bit to fit it all. If you go the multi-GPU route on a PC, just make sure your motherboard (something like the ASUS Pro WS WRX90E-SAGE SE WIFI) can actually handle all those PCIe lanes. Bottlenecking is a real vibe killer when you're waiting for tokens...
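
If you want to sanity-check the numbers in this thread yourself, here's a rough back-of-envelope sketch in Python. It only assumes the published 671B total parameter count. The 40GB allowance for KV cache and runtime buffers is a placeholder I made up for illustration, and real "4-bit" quant formats usually land somewhat above 4.0 bits per weight, so treat the results as a lower bound and not a benchmark.

```python
import math

TOTAL_PARAMS = 671e9  # DeepSeek-V3 total parameters (all experts must be resident)

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

def cards_needed(total_gb: float, vram_per_card_gb: float) -> int:
    """Minimum card count by raw capacity; ignores per-card overhead and uneven sharding."""
    return math.ceil(total_gb / vram_per_card_gb)

# ASSUMPTION: illustrative allowance for KV cache + buffers, not a measured value.
OVERHEAD_GB = 40

weights_4bit = weight_gb(TOTAL_PARAMS, 4)
weights_2bit = weight_gb(TOTAL_PARAMS, 2)
total_4bit = weights_4bit + OVERHEAD_GB

print(f"4-bit weights: {weights_4bit:.1f} GB")
print(f"2-bit weights: {weights_2bit:.1f} GB")
print(f"4-bit + overhead: {total_4bit:.1f} GB")
print(f"24GB cards needed: {cards_needed(total_4bit, 24)}")
print(f"48GB cards needed: {cards_needed(total_4bit, 48)}")
```

Swap in your own overhead number and bits-per-weight for whichever quant you actually plan to download. Capacity is only half the story, too: the more cards you split across, the more those PCIe lanes matter.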

psyusqdkdi · Answer

Facts.