
What is the best GPU for running DeepSeek-V3 locally?

0
Topic starter

Hey everyone! I’ve been following the release of DeepSeek-V3, and the benchmarks are honestly insane. I really want to get it running locally to experiment with it without worrying about API costs or privacy, but I’m hitting a wall when it comes to hardware specs. Since it's a massive Mixture-of-Experts (MoE) model with over 600B parameters, I know I'm going to need some serious VRAM, but I'm not sure what the most "efficient" setup is for a home enthusiast.

I was looking at the RTX 4090, but with only 24GB of VRAM, I’m worried I’ll be stuck with heavy quantization that ruins the output quality. I’ve seen some talk about running the FP8 version or using multiple GPUs like a dual RTX 3090 setup to hit 48GB, but I'm curious if that's even enough for a decent tokens-per-second rate. Is it feasible to run DeepSeek-V3 on consumer hardware, or do I need to start looking into professional gear like the RTX A6000?

I’ve got a budget of around $3,000 for the GPU side of things. Should I prioritize a single high-end card or go for a multi-GPU NVLink setup to handle those massive weights? What are you guys using to get this model running smoothly?
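To put the 24GB vs 48GB question in numbers: weight memory is roughly parameter count × bits per weight ÷ 8. Here's a minimal Python sketch, assuming DeepSeek-V3's 671B total parameters (all experts have to be stored somewhere, even though only some are active per token) and ignoring KV cache and runtime overhead, so real requirements are higher:

```python
# Approximate weight storage: parameters x bits per weight / 8 bits per byte.
# Assumption: 671B total parameters. KV cache, activations and runtime
# overhead are ignored, so actual memory use will be higher.

TOTAL_PARAMS = 671e9

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

def vram_share(vram_gb: float, bits_per_weight: float) -> float:
    """Fraction of the full weights that a given amount of VRAM can hold."""
    return vram_gb / weight_gb(TOTAL_PARAMS, bits_per_weight)

for label, bits in [("FP8", 8), ("4-bit", 4)]:
    need = weight_gb(TOTAL_PARAMS, bits)
    for vram in (24, 48):
        print(f"{label}: ~{need:.0f} GB of weights; "
              f"{vram} GB of VRAM holds {vram_share(vram, bits):.1%} of it")
```

Even at 4 bits, 48GB covers well under a fifth of the weights, which is why the replies below all end up talking about offloading to system RAM.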


6 Answers
10

Yo, oh man, I totally feel you... I was looking at those benchmarks too and they're literally insane!! But honestly, running DeepSeek-V3 locally is a huge task for consumer gear. I'm kinda new to this, but I've been experimenting with multiple cards and VRAM is basically everything. For $3,000, I'd be really careful about where you spend it. Here's what I recommend:

1. Grab two or even three used NVIDIA GeForce RTX 3090 24GB cards. You can find them for like $750 each on eBay. That gets you 48GB or 72GB of VRAM, which is way better for those massive weights than a single high-end card.
2. Be careful with the NVIDIA RTX A6000 48GB. It's cool, but it basically eats your whole budget and you still only get 48GB... maybe not worth it for a home setup?
3. Avoid the NVIDIA GeForce RTX 4090 24GB for this specific model... 24GB is just too small, and the quantization will lowkey ruin the logic.

Anyway, multi-GPU is the way to go for MoE models, just make sure your power supply is beefy enough!! I almost tripped my breaker last week lol. gl! 👍
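On the power supply point, here's a back-of-the-envelope sizing sketch. Assumptions (not from this thread): 350 W board power per RTX 3090 (the reference spec; short transient spikes can exceed it), a hypothetical 250 W for the rest of the system, 90% PSU efficiency, a 120 V circuit, and 20% headroom on the PSU rating:

```python
# Rough PSU and wall-circuit sizing for a multi-RTX-3090 rig.
# All constants below are assumptions; adjust for your actual hardware.

GPU_W = 350              # RTX 3090 reference board power
REST_OF_SYSTEM_W = 250   # hypothetical: CPU, motherboard, drives, fans
PSU_EFFICIENCY = 0.90    # assumed PSU efficiency at this load
HEADROOM = 1.2           # 20% margin on the PSU rating
MAINS_V = 120            # assumed North American outlet voltage

def dc_load_w(num_gpus: int) -> int:
    """Estimated sustained DC load in watts."""
    return num_gpus * GPU_W + REST_OF_SYSTEM_W

def suggested_psu_w(num_gpus: int) -> float:
    """PSU rating with headroom above the estimated load."""
    return dc_load_w(num_gpus) * HEADROOM

def wall_amps(num_gpus: int) -> float:
    """Current drawn from the outlet, accounting for PSU conversion losses."""
    return dc_load_w(num_gpus) / PSU_EFFICIENCY / MAINS_V

for n in (2, 3):
    print(f"{n}x RTX 3090: load ~{dc_load_w(n)} W, PSU >= ~{suggested_psu_w(n):.0f} W, "
          f"~{wall_amps(n):.1f} A at {MAINS_V} V")
```

Under these assumptions three cards land at about 12 A from the wall, which is right at the commonly cited 80% continuous-load guideline for a 15 A circuit, so a tripped breaker isn't surprising.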


10

> I’ve got a budget of around $3,000 for the GPU side of things. Should I prioritize a single high-end card or go for a multi-GPU NVLink setup?

Honestly, for DeepSeek-V3 on that budget, skip the 4090. I'm SO satisfied with my setup of three used NVIDIA GeForce RTX 3090 24GB cards. It's way more cost-effective than an A6000 and gives you 72GB of VRAM for like $2,100... You'll still need a GGUF quant with layers offloaded to system RAM for the full model, but it's the best value tbh. gl!
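The cost-effectiveness claim is easy to check as dollars per GB of VRAM. A minimal sketch using prices quoted in this thread (the ~$2,800 A6000 figure comes from another reply, and used prices swing a lot):

```python
# Dollars per GB of VRAM, using used-market prices mentioned in this thread.

BUDGET_USD = 3000

options = {
    "3x RTX 3090": (2100, 72),   # (price in USD, total VRAM in GB)
    "1x RTX A6000": (2800, 48),
}

def usd_per_gb(price_usd: float, vram_gb: float) -> float:
    """Price divided by total VRAM."""
    return price_usd / vram_gb

for name, (price, vram) in options.items():
    print(f"{name}: {vram} GB for ${price} -> ${usd_per_gb(price, vram):.2f}/GB, "
          f"${BUDGET_USD - price} left of the budget")
```

By this metric the three 3090s come out at roughly half the cost per GB, though the comparison ignores power, cooling, and the slots and PCIe lanes three cards need.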


3

Late to the party, but I wanted to chime in. In my experience, trying to run a 600B+ model like DeepSeek-V3 on a $3,000 budget means you really have to weigh capacity against convenience. Over the years I've tried many different setups, and while everyone loves the 3090s, they're power hogs, and dealing with that much heat in a home office is a chore. Comparing the options, I've found these paths most viable:

  • A used NVIDIA RTX A6000 48GB is probably your best bet for a single-card solution that won't melt your motherboard. It gives you professional-grade stability and a solid 48GB of VRAM to pair with heavy quantization.
  • The AMD Radeon Pro W7900 48GB offers a lot of VRAM for the price, though the ROCm software stack is more finicky than CUDA when you're setting up the environment.
  • If you're okay moving away from a traditional tower, the Apple Mac Studio M2 Ultra 128GB is actually the most elegant way to handle those weights because of its unified memory architecture. It isn't a traditional GPU setup, but for MoE models it works surprisingly well.

I've seen people try to offload to system RAM to save money, but the speed hit is usually brutal. If you do go that route, maybe look into G.Skill Trident Z5 Neo 128GB DDR5 instead of the usual brands... memory speed really matters when you're waiting on the CPU to catch up.
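Why the speed hit is so brutal: token generation is largely memory-bandwidth bound, because every generated token has to read all of the active weights once. A rough upper-bound sketch, assuming DeepSeek-V3 activates about 37B of its 671B parameters per token, 4-bit weights, and theoretical peak DRAM bandwidth (real throughput will be noticeably lower):

```python
# Upper bound on decode speed when weights stream from system RAM:
#   tokens/s <= memory bandwidth / bytes of active weights read per token.
# Assumptions: ~37B active parameters per token, ideal peak bandwidth.

ACTIVE_PARAMS = 37e9

def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical DRAM bandwidth in GB/s: channels x transfers/s x 64-bit bus."""
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9

def max_tokens_per_s(bandwidth_gbs: float, bits_per_weight: float) -> float:
    """Ceiling on tokens/s if every active weight is read from RAM once per token."""
    bytes_per_token = ACTIVE_PARAMS * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

for name, bw in [("dual-channel DDR4-3600", peak_bandwidth_gbs(2, 3600)),
                 ("dual-channel DDR5-6000", peak_bandwidth_gbs(2, 6000))]:
    print(f"{name}: {bw:.1f} GB/s -> at most {max_tokens_per_s(bw, 4):.1f} tok/s")
```

In this simple model the ceiling is set by channel count and transfer rate rather than latency timings, and it only gets worse for any part of the model that can't stay on the GPU.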


2

Following this thread


1

Sooo, in my experience, trying to run something like DeepSeek-V3 on a consumer budget is basically like trying to fit a V12 engine into a lawnmower lol. It's a massive 671B MoE model, so even at a heavy 4-bit quantization the weights alone come to roughly 335GB (671B × 0.5 bytes), and you're looking at 350GB+ once you add overhead. With a $3k budget, you aren't gonna fit the whole thing in VRAM unless you find a pile of used cards and a server rack. But here's what I suggest for a solid enthusiast setup:

1. **Go for multi-GPU, not one powerhouse.** A single NVIDIA GeForce RTX 4090 24GB is a beast, but 24GB is a drop in the bucket here. I'd actually recommend hunting for three used NVIDIA GeForce RTX 3090 24GB cards. You can usually find them for $700-800 each. That gives you 72GB of VRAM for around $2,100-2,400, which is way more useful for big models than one fast 4090.
2. **Consider the used enterprise route.** If you want a cleaner build, look for a used NVIDIA RTX A6000 48GB. I've seen them go for around $2,800 lately. It's basically a 3090 with double the VRAM. You lose some speed, but for these massive MoE models, VRAM capacity is literally everything.
3. **System RAM is your safety net.** Since you won't fit 671B into 72GB of VRAM, you'll be offloading layers to system memory. Make sure you've got at least 256GB, e.g. two kits of something like Corsair Vengeance LPX 128GB (4 x 32GB) DDR4 3600MHz. That's eight DIMMs, so you need a board with eight slots, which usually means a workstation or server platform. Even then, 72GB + 256GB is less than a 4-bit quant needs, so plan on a lower-bit quant or more RAM. It'll be slow (maybe 1 or 2 tokens per second), but it'll actually run without crashing.

Honestly, I've tried many setups over the years and the multi-3090 route is still the goat for value. It's a bit of a headache to cool, but it's the only way to get that much VRAM without spending $10k+ on H100s. gl!
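To sanity-check the "safety net" math: with 72GB of VRAM and 256GB of RAM, the largest quant you can hold is bounded by the combined 328GB. A sketch comparing weight sizes at a few average bit widths, assuming 671B total parameters and a hypothetical 10% reserve for KV cache, the OS and runtime buffers:

```python
# Does a quantized DeepSeek-V3 fit in 72 GB VRAM + 256 GB system RAM?
# Weights only, plus a hypothetical reserve fraction for everything else.

TOTAL_PARAMS = 671e9
VRAM_GB = 72           # three RTX 3090s
SYSTEM_RAM_GB = 256    # two 4 x 32 GB kits
RESERVE = 0.10         # hypothetical share kept free for KV cache, OS, runtime

def weight_gb(bits_per_weight: float) -> float:
    """Approximate weight size in GB at a given average bits per weight."""
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

usable_gb = (VRAM_GB + SYSTEM_RAM_GB) * (1 - RESERVE)
max_bits = usable_gb * 8 * 1e9 / TOTAL_PARAMS

for bits in (4.0, 3.0, 2.0):
    need = weight_gb(bits)
    spill = max(0.0, need - VRAM_GB)  # portion that must live in system RAM
    print(f"{bits:.0f}-bit: ~{need:.0f} GB of weights, fits={need <= usable_gb}, "
          f"~{spill:.0f} GB offloaded to RAM")
print(f"Largest average bits/weight that fits: {max_bits:.2f}")
```

Under these assumptions a plain 4-bit quant (~335GB) doesn't fit, while something around 3 bits per weight does, with most of it sitting in system RAM, which is consistent with the 1-2 tokens per second estimate.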


1

Ok adding this to my list of things to try. Thanks for the tip!

