Look, I've been scouring threads for days and I'm just getting more annoyed by the minute. I need to get DeepSeek V4 Pro running locally for a coding assistant project I'm starting next month here in Seattle, but nobody seems to agree on what actually works.
I've got a $4,500 budget. I've read some people saying dual 3090s are enough with heavy quantization, but other guys claim you absolutely need 80GB+ of VRAM or it'll just crawl. I don't want to drop four grand on a build that ends up being a paperweight. Is a Mac Studio with 128GB of unified memory actually better for this, or should I stick with a multi-GPU PC build? What hardware are you guys actually using to get decent tokens per second on V4 Pro?
Honestly, I had issues with the multi-GPU route because the heat and power draw were just a mess. Unfortunately, even two NVIDIA GeForce RTX 3090 24GB cards (48GB total) struggle with memory overhead on the larger quants. If you want to avoid the headache, just buy an Apple Mac Studio M2 Ultra 128GB. It's not as snappy as I expected, but at least it actually runs without crashing.
If you wanna save cash, look at used hardware. You can grab three NVIDIA GeForce RTX 3090 24GB GDDR6X cards for about $800 each ($2,400 for the set). Really solid value. Put them in a refurbished Dell Precision 7920 Tower with a 1600W PSU. That gives you 72GB of VRAM for under $3,500 total. It runs hot, but it handles bigger DeepSeek quants better for way less money. Decent option if you don't mind the power draw.
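If you want to sanity-check the numbers before buying, here's a rough back-of-the-envelope sketch in Python. The card count, VRAM per card, and prices come from this thread; everything else is an assumption for illustration. In particular, the 100B parameter count is a placeholder and not V4 Pro's actual size, and the 20% overhead for KV cache and runtime buffers is a guess that depends heavily on context length and the inference backend.

```python
def total_vram_gb(num_cards: int, vram_per_card_gb: int) -> int:
    """Combined VRAM across identical cards, in GB."""
    return num_cards * vram_per_card_gb


def gpu_cost(num_cards: int, price_per_card: int) -> int:
    """Total spend on GPUs alone, in dollars."""
    return num_cards * price_per_card


def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB.

    Billions of parameters times bits per weight, divided by 8 bits per byte.
    """
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM.

    overhead_fraction is a hypothetical allowance for KV cache and buffers.
    """
    needed = weights_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb


# Figures from the thread: three used 3090s at about $800 each.
triple_3090_vram = total_vram_gb(3, 24)   # 72 GB
dual_3090_vram = total_vram_gb(2, 24)     # 48 GB
cards_total = gpu_cost(3, 800)            # $2,400
left_for_tower = 3500 - cards_total       # what remains under the $3,500 figure

# Placeholder model: 100B parameters at 4 bits per weight (NOT V4 Pro's real size).
print(fits(100, 4, triple_3090_vram))
print(fits(100, 4, dual_3090_vram))
```

Swap in the real parameter count and the bits per weight of whatever quant you plan to run. If the estimate lands close to your total VRAM, treat that as a fail, since longer contexts eat into the headroom fast.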