
What is the best GPU for running DeepSeek-V3 locally?

Topic starter

I’m really hyped to try out DeepSeek-V3, but I’m hitting a wall trying to figure out the hardware needed for a smooth local setup. Since it’s a massive MoE model, I know VRAM is going to be the biggest bottleneck. I’m currently looking at an RTX 4090, but I’m worried 24GB won’t be enough even with heavy 4-bit quantization. Should I be considering a multi-GPU build with a few 3090s, or is a pro-grade card like the A100 the only real way to avoid constant OOM errors? I mainly want to use it for heavy coding tasks and logic reasoning. What’s the most cost-effective GPU setup you’d recommend to actually get this beast running?


6 Answers

Ok so, before I dive into the hardware list, quick question: are you planning to run the full 671B model or one of the smaller distilled models? (The official distills are of DeepSeek-R1, not V3.) DeepSeek-V3 is a massive MoE, and even with 4-bit quants you're looking at needing 400GB+ of VRAM to run it well. Ngl, if you want it cost-effective, NVIDIA GeForce RTX 3090 24GB cards in a multi-GPU setup are the way to go, because they're way cheaper than an NVIDIA GeForce RTX 4090 24GB.
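A quick back-of-the-envelope check on that 400GB+ figure, as a Python sketch. It assumes weight memory is just parameter count × bits per weight ÷ 8 and ignores KV cache, activations, quantization scales, and runtime overhead, which is why real deployments need noticeably more than these numbers:

```python
# Rough weight-memory estimate for DeepSeek-V3 (671B total parameters).
# Assumption: memory ~= params * bits / 8; KV cache, activations,
# quantization scales and framework overhead are NOT included.
import math

TOTAL_PARAMS = 671e9  # total parameters across all experts


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def cards_needed(total_gb: float, vram_per_card_gb: float) -> int:
    """Minimum number of cards whose combined VRAM can hold total_gb."""
    return math.ceil(total_gb / vram_per_card_gb)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_gb(TOTAL_PARAMS, bits)
        print(f"{bits:>2}-bit: {gb:7.1f} GB of weights -> "
              f"at least {cards_needed(gb, 24)} x 24GB cards")
```

At 4 bits that works out to about 335.5GB of weights alone, or at least 14 cards with 24GB each before any overhead is counted.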



> I’m currently looking at an RTX 4090, but I’m worried 24GB won’t be enough even with heavy 4-bit quantization.

Yo, I feel you!! DeepSeek-V3 is a TOTAL beast and honestly, a single NVIDIA GeForce RTX 4090 24GB just won't cut it for the full model. In my experience, the move is definitely getting two used NVIDIA GeForce RTX 3090 24GB cards. You can find them for like $700ish each, giving you 48GB of VRAM for way less than one 4090. That's still nowhere near enough to hold the full model even at 4-bit, so you'd be offloading most of the weights to system RAM, but it's the most cost-effective way I know to add VRAM. gl!



Ok so I've been tinkering with local LLMs since the early LLaMA days and honestly, +1 to what was said earlier: a single card just doesn't have the memory bandwidth or capacity for a 671B MoE beast. My first attempt at running a massive model was a total disaster (constant OOM errors and like 0.2 tokens per second), but I learned that VRAM is king for logic tasks.

In my experience, if you wanna actually use it for coding without losing your mind, you've gotta scale horizontally. I've tried many setups over the years, and for this specific challenge, I'd say:

* Get enterprise-grade NVIDIA cards (A100/H100 80GB class) if you have the budget.
* Build multi-GPU setups from a single vendor to keep the driver stack simple.
* Look at workstation cards (e.g. the 48GB RTX A6000) rather than just gaming gear.

Basically, even at 4-bit you're looking at roughly 400GB to run the full model, and if you want zero quantization loss, the native FP8 weights alone are about 671GB (1 byte per parameter). So look at whatever high-VRAM options the green team offers. Good luck tho!!
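On the memory-bandwidth point: for single-stream generation, each new token has to stream the active weights from memory, so bandwidth puts a ceiling on tokens per second. A rough Python sketch, assuming batch size 1, 4-bit weights, ~37B active parameters per token for DeepSeek-V3, and ignoring KV-cache reads, compute, and interconnect costs. The bandwidth figures are placeholders, not measurements:

```python
# Upper-bound decode speed for memory-bandwidth-bound generation.
# Assumptions (sketch only): batch size 1, every generated token reads each
# active weight once; KV-cache traffic, compute and PCIe/interconnect costs
# are ignored, so real throughput will be lower.

ACTIVE_PARAMS = 37e9   # DeepSeek-V3 activates ~37B of its 671B params per token
TOTAL_PARAMS = 671e9


def bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Bytes of weights streamed from memory to produce one token."""
    return active_params * bits_per_weight / 8


def max_tokens_per_sec(bandwidth_gb_s: float, active_params: float,
                       bits_per_weight: float) -> float:
    """Ceiling on tokens/s for a memory bandwidth given in GB/s (1 GB = 1e9 B)."""
    return bandwidth_gb_s * 1e9 / bytes_per_token(active_params, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical bandwidth figures; substitute your own hardware's numbers.
    for label, bw in (("~90 GB/s system RAM", 90), ("~800 GB/s", 800)):
        moe = max_tokens_per_sec(bw, ACTIVE_PARAMS, 4)
        dense = max_tokens_per_sec(bw, TOTAL_PARAMS, 4)
        print(f"{label}: MoE <= {moe:.1f} tok/s, "
              f"if all 671B were active <= {dense:.2f} tok/s")
```

Under these assumptions, the MoE design is what makes offloading slow rather than unusable: only about 18.5GB of 4-bit weights are read per token instead of 335.5GB if every parameter were touched.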



> I mainly want to use it for heavy coding tasks and logic reasoning. What’s the most cost-effective GPU setup you’d recommend to actually get this beast running?

Honestly, if you're prioritizing logic and coding over raw inference speed, you should seriously look into the Apple Mac Studio M2 Ultra with the 192GB unified memory configuration. I know it sounds expensive at first glance, but if you do the market research and price out a 4-8 GPU NVIDIA rig (including the specialized motherboards, dual PSUs, and cooling), the Apple silicon actually becomes the budget-friendly "pro" option for massive MoEs like DeepSeek-V3. Keep in mind that 192GB still can't hold a 4-bit quant of the full 671B model (about 335GB of weights alone), so on a single machine you'd be limited to much more aggressive quants of roughly 2 bits per weight. Basically, the NVIDIA GeForce RTX 4090 24GB is the king of speed but the pauper of capacity for these 671B models.

If you’re dead set on staying with a PC build to save cash, you might want to consider the used enterprise market. You can sometimes snag an NVIDIA RTX A6000 48GB for a decent price compared to buying multiple new cards. Also, don't sleep on the AMD Radeon RX 7900 XTX 24GB. It's significantly cheaper than a 4090, and while ROCm can be a bit of a headache compared to CUDA, the price-per-GB of VRAM is pretty hard to beat for a budget-focused logic setup.
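To check a 192GB memory budget against the model size, here's a small Python sketch. It assumes weights take params × bits ÷ 8 bytes and sets aside a flat, hypothetical 20GB (`reserve_gb`) for the OS, KV cache, and runtime; tune that to your own setup:

```python
# Does a quantized DeepSeek-V3 fit in a fixed memory budget?
# Sketch under assumptions: weights = params * bits / 8, plus a flat
# hypothetical reserve_gb for the OS, KV cache and runtime overhead.

TOTAL_PARAMS = 671e9


def fits(budget_gb: float, bits_per_weight: float,
         reserve_gb: float = 20.0, params: float = TOTAL_PARAMS) -> bool:
    """True if the quantized weights plus the reserve fit in budget_gb."""
    weights_gb = params * bits_per_weight / 8 / 1e9
    return weights_gb + reserve_gb <= budget_gb


def max_bits_per_weight(budget_gb: float, reserve_gb: float = 20.0,
                        params: float = TOTAL_PARAMS) -> float:
    """Highest average bits/weight whose weights fit after the reserve."""
    return max(budget_gb - reserve_gb, 0.0) * 1e9 * 8 / params


if __name__ == "__main__":
    budget = 192  # e.g. the 192GB unified-memory Mac Studio M2 Ultra config
    for bits in (8, 4, 3, 2):
        print(f"{bits}-bit fits in {budget}GB: {fits(budget, bits)}")
    print(f"max average bits/weight: {max_bits_per_weight(budget):.2f}")
```

With those assumptions, 192GB tops out at roughly 2 bits per weight on average, so a 4-bit quant of the full model won't fit on one such machine.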



Been using this for years, no complaints



Late to the party but this whole thread is 💯. Glad I found it.

