What is the best GPU for running DeepSeek-V3 locally?

Question

I’m really hyped to try out DeepSeek-V3, but I’m hitting a wall trying to figure out the hardware needed for a smooth local setup. Since it’s a massive MoE model, I know VRAM is going to be the biggest bottleneck. I’m currently looking at an RTX 4090, but I’m worried 24GB won’t be enough even with heavy 4-bit quantization. Should I be considering a multi-GPU build with a few 3090s, or is a pro-grade card like the A100 the only real way to avoid constant OOM errors? I mainly want to use it for heavy coding tasks and logic reasoning. What’s the most cost-effective GPU setup you’d recommend to actually get this beast running?

AccsMarket.net · Accepted Answer

Ok so, before I dive into the hardware list, quick question: are you planning to run the full 671B model or a specific distilled version? DeepSeek-V3 is a massive MoE, and even with 4-bit quants you're looking at needing something like 400GB+ of VRAM to run it well. Ngl, if you want it cost-effective, NVIDIA GeForce RTX 3090 24GB cards in a multi-GPU setup are the way to go because they're way cheaper than an NVIDIA GeForce RTX 4090 24GB.
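The ~400GB figure can be sanity-checked with some back-of-the-envelope arithmetic. Here's a minimal sketch, assuming 671B total parameters, a weights-only size of params × bits / 8, and a flat 20% overhead for KV cache, activations and framework buffers. The 20% is an illustrative guess, not a measured number, and the function names are made up for this example:

```python
# Rough memory estimate for holding all of a model's weights.
# Assumptions: 1 GB = 1e9 bytes; overhead is a flat, illustrative fraction.

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB."""
    return n_params * bits_per_param / 8 / 1e9


def total_memory_gb(n_params: float, bits_per_param: float,
                    overhead_fraction: float = 0.2) -> float:
    """Weights plus an assumed overhead for KV cache, activations and buffers."""
    return weight_memory_gb(n_params, bits_per_param) * (1 + overhead_fraction)


N_PARAMS = 671e9  # total (not active) parameter count of DeepSeek-V3

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: weights {weight_memory_gb(N_PARAMS, bits):7.1f} GB, "
              f"with overhead {total_memory_gb(N_PARAMS, bits):7.1f} GB")
```

At 4 bits this works out to about 335.5 GB of weights and roughly 400 GB with the assumed overhead. Being an MoE doesn't shrink that: every expert still has to be stored somewhere, even though only a few are used per token.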

IsleOfWightSails · Answer

> I’m currently looking at an RTX 4090, but I’m worried 24GB won’t be enough even with heavy 4-bit quantization.

Yo, I feel u!! DeepSeek-V3 is a TOTAL beast and honestly, a single NVIDIA GeForce RTX 4090 24GB just won't cut it for the full model. In my experience, the move is definitely getting two used NVIDIA GeForce RTX 3090 24GB cards. You can find them for like $700ish each, giving you 48GB of VRAM for way less than one 4090. It's literally the most cost-effective way to avoid those OOM errors while keeping logic sharp. gl!
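To see how far 48GB actually gets you, here's a rough card-count calculator. It's a sketch that assumes ~90% of each card's VRAM is usable for the model (an illustrative guess, since the driver and CUDA context take some), reuses the ~$700 used-3090 price quoted above, and counts card cost only. The "~40 GB model" is a hypothetical example size, and the ~400 GB figure is the 4-bit estimate mentioned elsewhere in this thread:

```python
import math


def cards_needed(required_gb: float, vram_per_card_gb: float = 24.0,
                 usable_fraction: float = 0.9) -> int:
    """Minimum number of identical cards whose usable VRAM covers required_gb."""
    usable_per_card = vram_per_card_gb * usable_fraction
    return math.ceil(required_gb / usable_per_card)


def build_cost_usd(n_cards: int, price_per_card_usd: float = 700.0) -> float:
    """Card cost only; ignores motherboard, PSU, risers, cooling, etc."""
    return n_cards * price_per_card_usd


if __name__ == "__main__":
    for label, need_gb in [("~40 GB model (hypothetical)", 40),
                           ("full V3 at 4-bit (~400 GB)", 400)]:
        n = cards_needed(need_gb)
        print(f"{label}: {n} x 24GB cards, ~${build_cost_usd(n):,.0f}")
```

Under these assumptions two 3090s cover something in the ~40GB range, while the full 671B model at 4-bit would need on the order of 19 of them. That suggests the 48GB build fits smaller quantized or distilled models better than the full one.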

TubeTraveller · Answer

Ok so I've been tinkering with local LLMs since the early LLaMA days and honestly, +1 to what was said earlier: a single card basically just doesn't have the memory bandwidth or capacity for a 671B MoE beast. My first attempt at running a massive model was a total disaster (constant OOM errors and like 0.2 tokens per second), but yeah, I basically learned that VRAM is king for logic tasks.
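The bandwidth point can be made a bit more concrete. During generation, each token has to read the weights of the parameters active for that token, so a rough ceiling on decode speed is memory bandwidth divided by bytes read per token. Here's a sketch, assuming ~37B activated parameters per token (DeepSeek-V3's published activated-parameter count), 4-bit weights, and round bandwidth numbers per storage tier that are illustrative, not measurements. Real throughput will be lower:

```python
def decode_tokens_per_sec_upper_bound(bandwidth_gb_s: float,
                                      active_params: float,
                                      bits_per_param: float) -> float:
    """Bandwidth-bound ceiling on tokens/s: each token reads all active weights once."""
    gb_per_token = active_params * bits_per_param / 8 / 1e9
    return bandwidth_gb_s / gb_per_token


ACTIVE_PARAMS = 37e9  # activated parameters per token (MoE), assumed

# Illustrative round bandwidth figures in GB/s, not measurements.
TIERS_GB_S = {
    "GPU VRAM (3090-class)": 900.0,
    "dual-channel desktop RAM": 50.0,
    "NVMe SSD offload": 3.0,
}

if __name__ == "__main__":
    for tier, bw in TIERS_GB_S.items():
        tps = decode_tokens_per_sec_upper_bound(bw, ACTIVE_PARAMS, 4)
        print(f"{tier:>26}: <= {tps:5.1f} tokens/s")
```

The ceilings drop by orders of magnitude as weights spill from VRAM to system RAM to disk. That is one plausible explanation for speeds like the ~0.2 tokens/s mentioned above.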

In my experience, if you wanna actually use it for coding without losing your mind, you've gotta scale horizontally. I've tried many setups over the years, and for this specific challenge, I'd say:

* If you have the budget, get enterprise-grade NVIDIA cards (e.g. the A100 you mentioned).
* Stick to one vendor for a multi-GPU setup to keep the drivers simple.
* Look at workstation cards rather than just gaming gear, since they generally offer more VRAM per card.
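For the "scale horizontally" route, one common approach is to split the model's layers across cards pipeline-style, filling each GPU before moving to the next. Here's a minimal greedy sketch of that idea. The layer sizes and card capacities are hypothetical toy numbers, `split_layers` is a made-up helper, and real frameworks also reserve per-card room for KV cache and activations:

```python
from typing import List


def split_layers(layer_sizes_gb: List[float],
                 card_capacities_gb: List[float]) -> List[List[int]]:
    """Greedily place consecutive layers on cards, pipeline-style.

    Returns one list of layer indices per card. Raises ValueError if the
    layers don't fit on the available cards.
    """
    assignment: List[List[int]] = [[] for _ in card_capacities_gb]
    card, used = 0, 0.0
    for i, size in enumerate(layer_sizes_gb):
        # Advance to the next card while the current one can't take this layer.
        while card < len(card_capacities_gb) and used + size > card_capacities_gb[card]:
            card += 1
            used = 0.0
        if card == len(card_capacities_gb):
            raise ValueError(f"layer {i} ({size} GB) does not fit on the remaining cards")
        assignment[card].append(i)
        used += size
    return assignment


if __name__ == "__main__":
    # Toy example: six 10 GB layers on three 24 GB cards.
    print(split_layers([10.0] * 6, [24.0, 24.0, 24.0]))  # [[0, 1], [2, 3], [4, 5]]
```

Adding a seventh 10 GB layer to the toy example raises `ValueError`, which is the same wall you hit with OOM errors, just caught before loading anything.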

Basically, even at 4-bit you're looking at ~400GB to hold the full model without offloading, so maybe just look at any high-VRAM options from the green team. Good luck tho!!