Best GPU for DeepSeek 671b?

0

07/03/2025 10:17 pm

Topic starter

Qak

(@qak)

Eminent Member

16 Posts
3 13 0

Which GPU do you recommend?

Topic Tags

GPU

5 Answers

3

18/02/2026 9:15 pm

JamesDox

(@jamesdox)

Active Member

6 Posts
1 5 0

.

2

17/02/2026 5:30 am

Dizaynersk_absl

(@dizaynersk_absl)

Active Member

9 Posts
0 9 0

Totally agree on the VRAM math, that model is a literal monster. But honestly, unless you're an enterprise with a massive budget, buying sixteen H100s is basically impossible. You're looking at like $400k+ just for the cards haha.

If you're trying to be somewhat cost-conscious, you *need* to look into quantization. Running DeepSeek 671b at 4-bit (using a format like GGUF or EXL2) drops the VRAM requirement to around 400GB. Still huge, but it brings it into the realm of a single NVIDIA HGX baseboard with 8x NVIDIA A100 80GB (640GB total), or even a cluster of NVIDIA RTX 6000 Ada cards if you can find them.

Anyway, my two cents: unless you need 24/7 uptime, just rent. Renting a multi-GPU node on a cloud provider for a few bucks an hour is way cheaper than the electricity and cooling costs of owning this gear yourself. The hardware depreciation alone will kill your ROI.

**TL;DR:** The previous advice is spot on for unquantized, but for budget's sake, use 4-bit quantization to cut VRAM needs to roughly a quarter (around 400GB) and rent the compute instead of buying a house-sized investment in GPUs.
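For anyone who wants to sanity-check the quantization numbers, here is a rough sketch of the arithmetic. It only multiplies parameter count by bits per weight; the `OVERHEAD` factor for KV cache and activations is my own assumption, not a measured value, and real 4-bit formats store a bit more than 4 bits per weight because of scales and metadata.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 671e9  # parameter count implied by the thread title
OVERHEAD = 1.2    # ASSUMPTION: rough allowance for KV cache and activations

for bits in (16, 8, 4):
    base = weights_gb(N_PARAMS, bits)
    print(f"{bits:>2}-bit: weights {base:7.1f} GB, "
          f"with assumed overhead ~{base * OVERHEAD:7.1f} GB")
```

At 4 bits the weights alone come to 335.5 GB, so "around 400GB" is plausible once some headroom is added on top.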

2

23/02/2026 11:15 pm

qezzhsmhkr

(@qezzhsmhkr)

New Member

4 Posts
0 4 0

Exactly what I was thinking

2

05/03/2026 2:50 pm

care about people around you_z

(@care-about-people-around-you_z)

Active Member

10 Posts
2 8 0

I saw this earlier but just getting around to it now. Honestly, the hardware landscape for models this big is pretty depressing right now.

NVIDIA is the only real choice because of the CUDA kernels specifically optimized for DeepSeek's MoE architecture. If you try running this on something like the AMD Radeon Pro W7900 48GB, you won't get nearly the same performance because the software stack just isn't as mature yet.

Memory bandwidth is the silent killer for 671b. Even if you cram a bunch of NVIDIA RTX 6000 Ada Generation cards together, the lack of NVLink on the newer Ada cards is such a massive disappointment. You end up getting bottlenecked by the PCIe bus when the experts are swapping data. I tried to DIY a custom cooling loop for my home rig last summer to handle the heat from just two cards and I ended up accidentally spraying coolant all over my vintage comic book collection. Spent like four hours with a hairdryer trying to save a first edition of something that probably wasn't even worth that much. My wife still brings it up whenever I mention buying new parts. Anyway, basically you either go enterprise or prepare to suffer through slow inference. But yeah.
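To put the PCIe point in rough numbers, here is a back-of-the-envelope sketch. The bandwidth values are approximate peak per-direction figures I'm assuming for illustration (real throughput is lower), and the 1 GB payload is a hypothetical amount of data moved between cards, not something measured on DeepSeek.

```python
def transfer_ms(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Ideal time in milliseconds to move payload_gb over a link."""
    return payload_gb / bandwidth_gb_s * 1000.0


# ASSUMPTION: approximate peak per-direction bandwidths in GB/s.
LINKS = {
    "PCIe 4.0 x16": 32.0,
    "NVLink (A100 generation)": 300.0,
}

PAYLOAD_GB = 1.0  # hypothetical amount of data exchanged between two cards

for name, bandwidth in LINKS.items():
    print(f"{name}: {transfer_ms(PAYLOAD_GB, bandwidth):.2f} ms per {PAYLOAD_GB} GB")
```

Even in the ideal case the PCIe link takes roughly ten times longer per transfer, which is why cards without NVLink hurt so much once the model is split across them.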

0

15/03/2025 10:12 pm

Deeeep2

(@deeeep2)

Active Member

8 Posts
1 7 0

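NVIDIA H100 80GB: A multi-GPU setup is recommended. Note that 16 GPUs give 16 × 80 GB = 1,280 GB, which is short of the model's VRAM requirement of approximately 1,543 GB, so at least 20 GPUs are needed to cover that figure.
NVIDIA A100 80GB: Alternatively, a setup of A100 GPUs can meet the VRAM needs for this model. The card count is the same, since each A100 here also has 80 GB.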
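The card count is just the VRAM requirement divided by per-card memory, rounded up. A minimal sketch using the figures quoted in this post (the function name is only for illustration):

```python
import math


def cards_needed(required_gb: float, per_card_gb: float) -> int:
    """Smallest number of cards whose combined memory covers required_gb."""
    return math.ceil(required_gb / per_card_gb)


REQUIRED_GB = 1543  # approximate VRAM requirement quoted above
PER_CARD_GB = 80    # H100 80GB or A100 80GB

print("cards needed:", cards_needed(REQUIRED_GB, PER_CARD_GB))  # 20
print("16 cards provide:", 16 * PER_CARD_GB, "GB")              # 1280 GB
```

This ignores any per-card memory reserved by the driver or framework, so treat the result as a lower bound.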
