What is the best GPU setup for running DeepSeek V4 Pro locally?

Question

Look, I've been running local LLMs since the early Llama 1 days and usually I can figure out the hardware requirements pretty easily, but DeepSeek V4 Pro is absolutely killing my brain right now. I've got a medical research project starting next month where we need to process sensitive patient data locally (can't use the API for obvious privacy reasons). The benchmarks for V4 Pro look insane, but the VRAM requirement is a total nightmare.

I thought my dual 3090 setup would handle it with some heavy quantization, but even at 4-bit it seems like it's gonna choke or just run at like 0.5 tokens per second, which is useless for my timeline. I have about $5,000 to spend on a dedicated workstation build, but I'm torn between trying to find a used A6000 and just stuffing a bunch of 3090s into a server rack. With the 3090s I'm worried about PCIe bandwidth and thermal throttling in my small home office. Is anyone actually running this thing smoothly without a full enterprise cluster? What is the realistic minimum GPU memory and interconnect speed I need to get decent inference speeds for the Pro version? I really need to get this ordered by next Friday or the whole project is gonna stall...

zjpepndszg · Accepted Answer

Honestly, I tried quad NVIDIA RTX 3090 24GB cards but it was a mess. Thermals were tragic in my office, and unfortunately PCIe bandwidth limits killed my speeds. Maybe just get a used NVIDIA RTX A6000 48GB.

xrrdpfsxkd · Answer

Honestly, I moved to a used NVIDIA RTX A6000 48GB and it's been way more stable. My old dual NVIDIA RTX 3090 24GB setup had too many thermal issues for long research runs, though.

MichaelSwags · Answer

Honestly, I moved to a quad 3090 setup last year and I've been super happy with how it handles the larger models like DeepSeek. The VRAM requirement for the Pro is a total beast and I totally get the frustration. Your dual 3090s are just hitting a wall because of the split and the sheer size of the weights.

If you want to keep it under five grand, your best bet is actually sticking with the consumer cards but getting a board that can handle four of them. I ended up using the ASUS Pro WS WRX80E-SAGE SE WIFI motherboard and it's been rock solid for me. No complaints at all about the PCIe lane distribution, which is usually the biggest headache.

For a medical project where you need total privacy, you definitely want that local VRAM headroom. Four NVIDIA GeForce RTX 3090 24GB GDDR6X cards will give you 96 GB total. That should be enough to run V4 Pro at 4-bit with room for a decent context window without it crawling.

To be fair, the heat is a real issue in a small room... I had to get a dedicated portable AC unit because the rig basically turns into a space heater during long inference runs. Using an NVIDIA RTX A6000 48GB GDDR6 is way cleaner and uses less power, but trying to find two of those used for under five grand right now is tough.

If you can manage the cooling and the power draw, the four 3090s give you the most bang for your buck. I haven't noticed much bottlenecking from the interconnect speeds for pure inference, so you should be good to go.
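If you want to sanity-check the VRAM math before ordering, here's a rough back-of-envelope sketch. Big caveat: the parameter count and the overhead figure below are placeholders I made up for illustration, not real numbers for V4 Pro, so plug in whatever the model card actually says. It only counts quantized weights plus a flat allowance for KV cache and runtime buffers, and it ignores how unevenly layers can split across cards.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits(params_billion: float, bits_per_weight: float,
         num_gpus: int, gb_per_gpu: float, overhead_gb: float = 8.0):
    """Return (fits?, GB needed, GB available) for a given multi-GPU setup.

    overhead_gb is an assumed flat allowance for KV cache, activations and
    runtime buffers; the real figure depends on context length and backend.
    """
    total = num_gpus * gb_per_gpu
    needed = weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= total, needed, total


# PLACEHOLDER parameter count, NOT the real size of DeepSeek V4 Pro.
EXAMPLE_PARAMS_B = 70.0

if __name__ == "__main__":
    setups = [("2x RTX 3090", 2, 24), ("4x RTX 3090", 4, 24), ("1x RTX A6000", 1, 48)]
    for name, n, gb in setups:
        ok, needed, total = fits(EXAMPLE_PARAMS_B, 4, n, gb)
        print(f"{name}: need ~{needed:.0f} GB of {total} GB -> {'fits' if ok else 'too small'}")
```

The takeaway is just that weight size scales linearly with parameter count and bits per weight, so once you know the real parameter count you can tell pretty quickly whether 96 GB is comfortable or tight.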