What is the best GPU for running DeepSeek-V3 models locally?

Topic starter

So I've been obsessing over the benchmarks for DeepSeek-V3 lately, and I'm finally ready to pull the trigger on a new setup. I'm mostly using it for complex coding tasks and some local data analysis, and the lag on the web versions is getting to be too much for my patience. My budget is capped right around $4,000, and I'm hoping to have this thing built by next Tuesday, before my next big project starts.

Right now I'm basically torn between two very different setups. I could go the multi-GPU route with two used RTX 3090s, which would give me 48GB of VRAM total. It seems like the most cost-effective way to get enough memory to run a decent quantization of V3 without it crawling, but I'm worried about the heat. My home office is tiny, and honestly I don't know if my current PSU can handle the power spikes from two 3090s.

On the other hand, I've been looking at a single used RTX A6000 (the 48GB version). It would be way easier to fit in my current case, and it's much more power efficient, but the raw speed might be slower than dual consumer cards? I also looked at maybe just grabbing a Mac Studio with the M2 Ultra and 128GB of RAM, since the unified memory situation is so much easier there, but I'm a Windows guy through and through.

Given that V3 is such a massive MoE model, is it better to go for the raw speed of dual 3090s or the stability of a single pro card? Or am I totally off base, and do I need even more than 48GB of VRAM to make this thing actually usable for daily coding?


2 Answers

> My budget is capped right around $4,000, and I'm hoping to have this thing built by next Tuesday, before my next big project starts.

Before we dive too deep into the hardware specs, I gotta ask: what kind of power supply are you rocking right now? If you're currently on a 750W or 850W unit, dual 3090s are going to trip your breakers the second you start a heavy inference run. Those transient power spikes are no joke, and you'd likely need something like an EVGA SuperNOVA 1600 G+ 1600W PSU to be safe.

Comparing the two setups, you might want to take the heat factor more seriously since your office is small. Two NVIDIA GeForce RTX 3090 24GB GDDR6X cards will basically turn your room into a sauna. I tried a similar multi-GPU setup last year and had to leave the door open just to breathe lol. They are technically faster in raw tokens per second, though, thanks to the higher combined memory bandwidth.

The NVIDIA RTX A6000 48GB GDDR6 is definitely the safer, more stable route. It's far more power efficient and uses a blower-style cooler, which is great for smaller cases. Just be aware that it's actually a bit slower than dual 3090s in sheer throughput because of the memory architecture. If you can find a used deal, an NVIDIA RTX 6000 Ada Generation 48GB GDDR6 would be the absolute dream, but those usually still hover right at or over the $4k mark.

Honestly, for DeepSeek-V3, 48GB is really the bare minimum for a decent 4-bit quantization. You might find yourself wanting even more VRAM pretty quickly once you start hitting longer context windows during coding.
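If you want to sanity-check the PSU sizing yourself, here's the rough back-of-envelope math I use. The TDP figures, spike multiplier, and headroom factor below are my own assumptions for illustration, not measured values, so plug in the spec-sheet numbers for your actual parts:

```python
# Rough PSU headroom estimate for a multi-GPU inference build.
# All wattage figures are illustrative assumptions -- check your
# cards' spec sheets and your CPU/drives for real numbers.

def recommended_psu_watts(gpu_tdp_w, num_gpus, rest_of_system_w=250,
                          spike_factor=1.5, headroom=1.2):
    """Estimate a safe PSU rating in watts.

    spike_factor: transient spikes on these cards can briefly exceed
    rated TDP; 1.5x is a conservative guess, not a measurement.
    headroom: extra margin so the PSU isn't pinned at its limit.
    """
    peak_draw = gpu_tdp_w * spike_factor * num_gpus + rest_of_system_w
    return peak_draw * headroom

# Dual 3090s at an assumed 350W TDP each:
print(round(recommended_psu_watts(350, 2)))  # -> 1560, i.e. ~1600W class
# Single A6000 at an assumed 300W TDP:
print(round(recommended_psu_watts(300, 1)))  # -> 840, an 850W unit is borderline
```

With those (guessed) factors, the dual-3090 build lands right around the 1600W units people keep recommending, which is why your current PSU probably won't cut it.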



Unfortunately, 48GB of VRAM is basically the bare minimum for DeepSeek-V3 if you want to run a quantization that doesn't feel like a lobotomized version of the model. I've had issues running these massive MoE models on dual consumer setups before, and it's rarely as smooth as the benchmarks suggest. The overhead of splitting the model across two cards can kill your tokens per second more than you'd expect.

I've also spent way too much time troubleshooting heat issues with the NVIDIA GeForce RTX 3090 24GB GDDR6X. Those backplate VRAM chips get dangerously hot during long inference sessions, and it's honestly disappointing how much babying they need. If you're on a deadline for a project, you don't want to be debugging a thermal throttle at 2 AM. Tbh, the professional route is better for your sanity.

  • The NVIDIA RTX A6000 48GB GDDR6 is much more reliable, even if the raw clock speed looks lower on paper. Stability is worth the premium here.
  • Make sure your motherboard can actually handle the spacing. Something like the ASUS Pro WS WRX80E-SAGE SE WIFI II is built for this kind of work.
  • You absolutely need a tier-A power supply. I'd suggest the Corsair AX1600i 1600W Digital ATX to avoid any transient-spike shutdowns.

Ngl, it's frustrating that we still have to choose between consumer speed and professional stability at this price point. But for a Tuesday deadline? Get the A6000 and don't look back.
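To give a feel for why memory bandwidth dominates here: at batch size 1, decode speed is roughly bandwidth-bound, and for an MoE model only the active experts' weights get streamed per token. Here's a crude upper-bound sketch; the bandwidth figures and the ~37B active-parameter count are assumptions I'm plugging in for illustration, not benchmarks, and real throughput will land well below this ceiling:

```python
# Crude bandwidth-bound ceiling on single-stream decode speed.
# Each generated token has to stream (roughly) the active weights
# from VRAM once, so:
#   tokens/sec <= bandwidth / (active_params * bytes_per_param)

def max_tokens_per_sec(bandwidth_gb_s, active_params_billions, bits_per_param):
    """Upper bound on decode tokens/sec; real numbers will be lower."""
    bytes_per_token = active_params_billions * 1e9 * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Assumed inputs: ~37B active params for V3, 4-bit quantization,
# ~936 GB/s for a 3090-class card, ~768 GB/s for an A6000.
print(round(max_tokens_per_sec(936, 37, 4), 1))  # -> 50.6
print(round(max_tokens_per_sec(768, 37, 4), 1))  # -> 41.5
```

Note this ignores the inter-card transfer overhead I mentioned above, which is exactly what eats into the dual-3090 advantage in practice.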

