What is the best GPU for running DeepSeek-V3 locally?

Question

Hey everyone! I’ve been following the release of DeepSeek-V3, and the benchmarks are honestly insane. I really want to get it running locally to experiment with it without worrying about API costs or privacy, but I’m hitting a wall when it comes to hardware specs. Since it's a massive Mixture-of-Experts (MoE) model with over 600B parameters, I know I'm going to need some serious VRAM, but I'm not sure what the most "efficient" setup is for a home enthusiast.

I was looking at the RTX 4090, but with only 24GB of VRAM, I’m worried I’ll be stuck with heavy quantization that ruins the performance. I’ve seen some talk about running the FP8 version or using multiple GPUs like a dual RTX 3090 setup to hit 48GB, but I'm curious if that's even enough for a decent tokens-per-second rate. Is it feasible to run DeepSeek-V3 on consumer hardware, or do I need to start looking into enterprise gear like the A6000?

I’ve got a budget of around $3,000 for the GPU side of things. Should I prioritize a single high-end card or go for a multi-GPU NVLink setup to handle those massive weights? What are you guys using to get this model running smoothly?

dtsuprygwu · Accepted Answer

Oh man, I totally feel you... I was looking at those benchmarks too and they're honestly insane! But running DeepSeek-V3 locally is a huge ask for consumer gear. I'm fairly new to this, but I've been experimenting with multi-card setups, and VRAM is basically everything. With $3,000, I'd be really careful about where you spend it. Here's what I recommend:

1. Grab two or even three used NVIDIA GeForce RTX 3090 24GB cards. You can find them for around $750 each on eBay. That gets you 48GB or 72GB of VRAM, which goes a lot further with those massive weights than a single high-end card (it's still well short of what the full model needs, so plan on offloading the rest to system RAM).
2. Be careful with the NVIDIA RTX A6000 48GB. It's cool, but it basically eats your whole budget and you still only get 48GB... maybe not worth it for a home setup?
3. Avoid the NVIDIA GeForce RTX 4090 24GB for this specific model... 24GB is just too small, and the quantization you'd need to squeeze it in will seriously hurt output quality.

Anyway, multi-GPU is the way to go for MoE models. Just make sure your power supply is beefy enough!! I almost tripped my breaker last week lol. gl! 👍
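For a rough sense of scale, here is a minimal sketch of the weight-size arithmetic behind the VRAM numbers in this answer. It assumes 671B total parameters (the figure on the public model card; the thread itself only says "over 600B") and counts weights only, ignoring KV cache, activations, and quantization metadata, so real files will run somewhat larger. `weight_gb` and `shortfall_gb` are illustrative helper names, not part of any library.

```python
# Back-of-the-envelope weight footprint at several precisions.
# ASSUMPTION: 671e9 total parameters, taken from the public model card.
TOTAL_PARAMS = 671e9

# Nominal bits per weight; real quantized formats add some overhead for scales.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "4-bit": 4, "2-bit": 2}


def weight_gb(params: float, bits: int) -> float:
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def shortfall_gb(params: float, bits: int, vram_gb: float) -> float:
    """GB of weights that do not fit in VRAM and must live elsewhere (0 if all fit)."""
    return max(0.0, weight_gb(params, bits) - vram_gb)


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        size = weight_gb(TOTAL_PARAMS, bits)
        # VRAM totals discussed in the thread: one, two, or three 24GB cards.
        spill = {vram: round(shortfall_gb(TOTAL_PARAMS, bits, vram), 1)
                 for vram in (24, 48, 72)}
        print(f"{name:>5}: {size:7.1f} GB of weights, GB left over by VRAM size: {spill}")
```

Under that assumption, even 4 bits per weight comes to roughly 335 GB for the weights alone, so 72GB of VRAM holds only a fraction of the model and the remainder has to sit in system RAM.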

CharlesHowly · Answer

> I’ve got a budget of around $3,000 for the GPU side of things. Should I prioritize a single high-end card or go for a multi-GPU NVLink setup?

Honestly, for DeepSeek-V3 on that budget, skip the 4090. I'm SO satisfied with my setup of three used NVIDIA GeForce RTX 3090 24GB cards. It's way more cost-effective than an A6000 and gives you 72GB of VRAM for around $2,100... You'll still need GGUF offloading to system RAM for the full model, but it's the best value tbh. gl!
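The value comparison here can be sketched as dollars per GB of VRAM. The 3090 figures come from this thread; the A6000 price is an assumption (about $3,000, based only on the accepted answer saying it "eats your whole budget"), so swap in whatever listing you actually find. `dollars_per_gb` is a hypothetical helper name.

```python
# Dollars per GB of VRAM for the options discussed in this thread.
# Each entry is (total price in USD, total VRAM in GB).
OPTIONS = {
    "3x used RTX 3090 (~$2,100 total, per this answer)": (2100, 72),
    "2x used RTX 3090 (~$750 each, per the accepted answer)": (1500, 48),
    "1x RTX A6000 (ASSUMED ~$3,000, not a quoted price)": (3000, 48),
}


def dollars_per_gb(price_usd: float, vram_gb: float) -> float:
    """Cost of each GB of VRAM; lower is better value for capacity."""
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    return price_usd / vram_gb


if __name__ == "__main__":
    # Print cheapest-per-GB first.
    for name, (price, vram) in sorted(OPTIONS.items(),
                                      key=lambda kv: dollars_per_gb(*kv[1])):
        print(f"{name}: ${dollars_per_gb(price, vram):.2f} per GB")
```

This only measures capacity per dollar. It says nothing about power draw, heat, or the extra PCIe slots a multi-card build needs.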

wgzvgnuxmx · Answer

Late to the party but I wanted to chime in. In my experience, trying to run a 600B+ model like DeepSeek-V3 on a $3,000 budget means you really have to weigh capacity against convenience. Over the years, I've tried many different setups, and while everyone loves the 3090s, they are power hogs and generating that much heat in a home office is a chore. Comparing the options, I've found these paths most viable:

A used NVIDIA RTX A6000 48GB is probably your best bet for a single-card solution that won't melt your motherboard. It gives you professional-grade stability and 48GB of VRAM to pair with a heavily quantized build.

The AMD Radeon Pro W7900 48GB offers massive VRAM for the price, though the software stack (ROCm) is more finicky than CUDA when you're setting up the environment.

If you are okay moving away from a traditional tower, the Apple Mac Studio M2 Ultra 128GB is actually the most elegant way to handle those weights because of the unified memory architecture. It isn't a traditional GPU setup, but for MoE models, it works surprisingly well. I've seen people try to offload to system RAM to save money, but the speed hit is usually brutal. If you do go that route, maybe look into G.Skill Trident Z5 Neo 128GB DDR5 instead of the usual brands... the timings really matter when you're waiting on the CPU to catch up.
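To put a number on the "speed hit" from system RAM, here is a rough bandwidth-bound estimate of decode speed: each generated token has to read the active weights once, so tokens per second cannot exceed memory bandwidth divided by the bytes read per token. This is a sketch under stated assumptions, not a benchmark: 37B active parameters per token (from the public model card, not from this thread), theoretical peak bandwidths, and a DDR5-6000 dual-channel kit chosen only as an example speed. `max_tokens_per_sec` is an illustrative name.

```python
# Upper bound on decode speed when generation is limited by memory bandwidth.
# ASSUMPTION: 37e9 parameters are activated per token (public model card figure).
ACTIVE_PARAMS = 37e9

# Theoretical peak bandwidths in GB/s; sustained real-world numbers are lower.
MEMORY_BANDWIDTH_GB_S = {
    # 2 channels x 8 bytes per transfer x 6000 MT/s (ASSUMED kit speed)
    "dual-channel DDR5-6000": 2 * 8 * 6000e6 / 1e9,
    "Apple M2 Ultra unified memory": 800.0,
    "RTX 3090 GDDR6X": 936.0,
}


def max_tokens_per_sec(bandwidth_gb_s: float, active_params: float,
                       bits_per_weight: int) -> float:
    """Ceiling on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    for name, bw in MEMORY_BANDWIDTH_GB_S.items():
        ceiling = max_tokens_per_sec(bw, ACTIVE_PARAMS, bits_per_weight=4)
        print(f"{name}: at most ~{ceiling:.1f} tokens/s at 4-bit")
```

The ceiling only applies when the active weights actually live in that memory. In a mixed GPU-plus-RAM setup, the slower pool tends to dominate, which is why offloading feels so much slower than the GPU's own bandwidth would suggest.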