What is the best GPU for running DeepSeek-V3 locally?

8 Posts
9 Users
0 Reactions
350 Views
0
Topic starter

Hey everyone! I have been following the news about DeepSeek-V3 and I really want to try running it locally instead of relying on their API. I am currently looking to upgrade my workstation because my old 3060 definitely will not cut it for something this big.

I know DeepSeek-V3 is a massive model with 671 billion parameters total, though it uses a Mixture-of-Experts architecture so only about 37 billion are active during a forward pass. Still, the memory requirements seem pretty insane. I have seen some benchmarks saying it needs hundreds of gigabytes of VRAM even at 4-bit quantization. I am wondering what kind of multi-GPU setup I would actually need to get decent tokens per second without the system crawling.
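For a rough sense of scale, here is a back-of-the-envelope sketch of the weight footprint at a few precisions. It counts weights only and assumes exactly N bits per parameter; real quantized files carry extra scale data, and the KV cache and activations come on top. All 671B parameters count because every expert has to be resident in memory, even though only about 37B are used per token. The name `weight_gb` is just illustrative.

```python
# Weight-only memory estimate; ignores KV cache, activations,
# and per-block quantization metadata.
TOTAL_PARAMS = 671e9  # total parameter count quoted in the post


def weight_gb(params: float, bits_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Even at 4-bit this comes out to about 335 GB before any overhead, which is why a single card is not on the table.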

My situation is that I am working on some private data projects where privacy is a huge deal, so cloud hosting just is not an option for me. I was looking at things like:

  • Multiple RTX 3090s or 4090s linked together
  • A Mac Studio with 192GB Unified Memory
  • Maybe some refurbished A100s if I can find a good deal

I am really confused about whether I should go for consumer cards or if I absolutely need pro-grade hardware to fit the weights. Does anyone here have experience with these large MoE models? What is the most cost-effective GPU setup you would recommend to actually get DeepSeek-V3 running smoothly at home?


8 Answers
11

Honestly, running DeepSeek-V3 locally is a total beast. I have been experimenting with large MoE setups lately and the memory wall is the biggest hurdle you are gonna face.

For speed, a multi-GPU rig built from NVIDIA GeForce RTX 3090 24GB GDDR6X cards is the classic enthusiast move. You get fast inference thanks to the memory bandwidth, but fitting the full model at 4-bit would take roughly 16 of these cards (the weights alone are about 335 GB), which is wild for a home setup. The power draw is basically like running a space heater.

The Apple Mac Studio M2 Ultra with 192GB RAM is the easy path. It handles huge models better because of the unified memory, though it is way slower for actual generation compared to dedicated GPUs.

Personally, I would go with used 3090s. The price-to-performance is just hard to beat, even if the setup is a bit of a headache. You might have to use a heavy quant or offload some layers, but it feels much snappier than the Mac imo.
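A quick sanity check on the card count, as a sketch. The 10% headroom for KV cache and runtime buffers is an assumption, not a measured figure, and `cards_needed` is a hypothetical helper rather than anything from a real tool.

```python
import math

# Weights at 4 bits per parameter: 671e9 params * 0.5 bytes = 335.5 GB.
WEIGHTS_GB_4BIT = 671e9 * 4 / 8 / 1e9


def cards_needed(weights_gb: float, vram_gb: float, headroom: float = 0.10) -> int:
    """Smallest card count whose combined VRAM holds the weights plus headroom.

    headroom is an assumed fraction reserved for KV cache and buffers.
    """
    return math.ceil(weights_gb * (1 + headroom) / vram_gb)


print(cards_needed(WEIGHTS_GB_4BIT, 24))  # 24 GB cards such as the RTX 3090
print(cards_needed(WEIGHTS_GB_4BIT, 48))  # 48 GB workstation cards
```

Under these assumptions it prints 16 for 24 GB cards and 8 for 48 GB cards. With zero headroom the 24 GB figure drops to 14, so the exact number depends a lot on context length and runtime overhead.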


10

Jumping in here... I tried stacking 3090s but the heat was honestly unbearable. I eventually swapped to a few NVIDIA RTX A6000 48GB GDDR6 units I found refurbished. Having 48GB per slot is a total game changer for high-parameter models. It's way more cost-effective than A100s and actually fits in a standard workstation chassis. You'll probably still need to offload some layers for the full V3 weights tho, but it's way smoother.


3

My buddy told me the exact same thing last week. Guess he was right lol.


2

Spot on about the memory wall, it is basically the only thing that matters for V3. One technical point tho: don't sleep on interconnect bandwidth. With MoE, you are constantly shuffling data between chips, so if you are running narrow PCIe lanes, your tokens per second will tank regardless of how much VRAM you have. Latency across the bus is honestly the silent killer for these huge models.


2

Ok adding this to my list of things to try. Thanks for the tip!


2

TIL! Thanks for sharing


2

To add to the point above: I totally see why everyone keeps suggesting a pile of 3090s, but honestly, I'm a bit skeptical about that route for long-term use! Running consumer cards at 100% capacity for these massive DeepSeek-V3 inferences makes me super nervous about heat and hardware fatigue. If you want something that's gonna last through years of private data projects without needing a fire extinguisher nearby, I'd really suggest looking at a pair or more of NVIDIA RTX 6000 Ada Generation 48GB GDDR6 cards.

The stability on these is just amazing! Unlike the 4090s, these are built for workstation airflow and they have ECC memory, which is a total lifesaver for data integrity. You're also less likely to hit weird driver quirks, and at 300 W per card they put a lot less load on the 16-pin power connector than a 450 W 4090, so those terrifying melting-connector stories are less of a worry. It's a bit more of an investment upfront, but the peace of mind knowing your rig won't crash in the middle of a massive generation is just fantastic. Plus, the power efficiency is way better, so your electricity bill won't look like a phone number every month! lol.


1

Coming back to this thread after doing some more digging... honestly I am in the exact same boat right now trying to figure out a setup for some private data projects where the cloud is a total non-starter. Like someone mentioned, that memory wall is just brutal for DeepSeek-V3. In my experience over the years, trying to daisy-chain a dozen consumer cards usually ends in a driver nightmare or a tripped circuit breaker. If you want real performance, you might want to look at these instead of the standard 3090 route:

  • NVIDIA L40S 48GB GDDR6 is probably the most efficient way to get high VRAM density right now for inference.
  • NVIDIA RTX 6000 Ada Generation 48GB is the top-tier choice if you can swing the cost, but the L40S is usually cheaper.
  • ASUS ESC8000 G4 is a beast of a chassis if you decide to go the rackmount route to keep things cool.

Ngl, it is a struggle to balance the cost. I have been looking at the L40S units myself because the 48GB per card makes the model splitting way less of a headache across the PCIe bus. Let me know if you find a good vendor for those refurbished cards tho, I am still hunting for a deal myself...

