Which GPU do you recommend?
Totally agree on the VRAM math, that model is a literal monster. But honestly, unless you're an enterprise with a massive budget, buying sixteen H100s is basically impossible. You're looking at like $400k+ just for the cards haha.

If you're trying to be somewhat cost-conscious, you *need* to look into quantization. Running DeepSeek 671B at 4-bit (using a format like GGUF or EXL2) puts the weights at roughly 335GB (671B params × 0.5 bytes each), so around 400GB of VRAM once you add KV cache and runtime overhead. Still huge, but it brings it into the realm of a single NVIDIA HGX baseboard with 8x NVIDIA A100 80GB (640GB total), or even a cluster of NVIDIA RTX 6000 Ada cards (48GB each, so nine or more) if you can find them.

Anyway, my two cents: unless you need 24/7 uptime, just rent. Renting a multi-GPU node from a cloud provider at a few bucks per GPU-hour is usually way cheaper than owning this gear once you count the purchase price, electricity, and cooling. The hardware depreciation alone will kill your ROI.

**TL;DR:** The previous advice is spot on for unquantized, but for budget's sake, use 4-bit quantization to get VRAM needs down to around 400GB, and rent the compute instead of making a house-sized investment in GPUs.
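If anyone wants to sanity-check the numbers, here's a rough back-of-the-envelope sketch. The 20% overhead factor for KV cache and runtime buffers is my own assumption, not a measured figure, and the helper names are just made up for illustration. Real overhead depends on context length, batch size, and the inference engine.

```python
import math

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

def gpus_needed(total_gb: float, gpu_gb: float) -> int:
    """Smallest number of cards whose combined VRAM covers total_gb."""
    return math.ceil(total_gb / gpu_gb)

PARAMS_B = 671   # parameter count in billions
OVERHEAD = 1.2   # ASSUMPTION: ~20% extra for KV cache and buffers

for bits in (16, 8, 4):
    weights = weight_gb(PARAMS_B, bits)
    total = weights * OVERHEAD
    print(
        f"{bits:>2}-bit: weights ~{weights:.0f} GB, "
        f"with overhead ~{total:.0f} GB, "
        f"80GB cards: {gpus_needed(total, 80)}, "
        f"48GB cards: {gpus_needed(total, 48)}"
    )
```

Keep in mind this treats "4-bit" as exactly 4 bits per weight. Actual quant formats usually store some extra scale data per block, so the real footprint tends to land a bit higher. It also ignores the fact that you normally want a GPU count that splits evenly for tensor parallelism, which is one reason an 8-card node is the practical choice even when fewer cards would fit on paper.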
Exactly what I was thinking
I saw this earlier but I'm just getting around to it now. Honestly, the hardware landscape for models this big is pretty depressing right now.