I'm in a serious bind right now and need to figure this out like yesterday. I've got a client project that requires me to run DeepSeek V4 Pro locally because of strict data privacy requirements. Basically, I can't use the API at all or they'll kill the contract. The deadline is next Friday, and my current rig just laughs at me when I try to load the model. I'm sitting here in San Jose looking at Micro Center and online retailers, trying to pull the trigger on a build, but the more I read, the more confused I get.
I have about $6500 to drop on this right now. I was looking at those 4x RTX 4090 builds because people say they're the best bang for your buck, but then I saw a thread saying the P2P bandwidth on consumer cards is basically garbage for models this big and that I might see massive slowdowns during inference. Then someone else said to just get a used A100 80GB, but I don't know if one 80GB card is even enough for the Pro version without heavy quantization, and I really need to keep the precision high for the specific coding tasks I'm running. It's for a proprietary codebase, so accuracy is everything.
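For anyone wanting to sanity-check the "is 80GB enough" question, here is a rough back-of-the-envelope sketch: weight memory is roughly parameter count times bytes per parameter, plus some headroom. The parameter count below is a placeholder I made up for illustration, not the real size of DeepSeek V4 Pro, and the 20% overhead for KV cache and activations is an assumption too. Plug in the actual numbers from the model card.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# PARAMS_BILLIONS is a HYPOTHETICAL placeholder, not the real model size.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billions: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # 1e9 params * N bytes/param = N GB per billion parameters.
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, memory_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an ASSUMED overhead (KV cache, activations) fit."""
    needed = weight_gb(params_billions, precision) * (1 + overhead_fraction)
    return needed <= memory_gb


PARAMS_BILLIONS = 70.0  # hypothetical; replace with the model's real size

# Total memory per rig from the thread. On the Mac, not all unified
# memory is available to the GPU, so treat 192 as an upper bound.
RIGS = {"1x A100 80GB": 80, "4x RTX 4090 (4x24GB)": 96, "Mac Studio 192GB": 192}

if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        for name, mem in RIGS.items():
            verdict = "fits" if fits(PARAMS_BILLIONS, precision, mem) else "too big"
            print(f"{precision:>5} on {name}: {verdict}")
```

If the real model is much bigger than the placeholder, the same math tells you quickly which rigs are out of the running before you spend anything.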
Then there's the whole power and cooling thing. If I go with four 4090s, I probably need a dedicated circuit in my home office or my breakers are gonna pop every time I run a query, right? I've seen people suggest the Mac Studio M2 Ultra with 192GB of RAM too, but I've always been a PC guy and I'm worried about the token speed compared to CUDA, especially for a model this heavy. I heard the unified memory is great for size but sucks for speed.
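The breaker question can be roughed out with simple arithmetic. This is only a sketch: it uses the 4090's rated 450 W board power, a standard US 120 V / 15 A circuit derated to 80% for continuous load, and two numbers I'm assuming for illustration (300 W for the rest of the system, 92% PSU efficiency). Real draw depends on power limits and workload.

```python
def circuit_continuous_watts(volts: float = 120.0, breaker_amps: float = 15.0,
                             derate: float = 0.8) -> float:
    """Continuous load a circuit can carry, using the common 80% rule."""
    return volts * breaker_amps * derate


def rig_wall_watts(gpu_count: int, gpu_watts: float,
                   rest_of_system_watts: float = 300.0,  # ASSUMED CPU/board/fans
                   psu_efficiency: float = 0.92) -> float:  # ASSUMED efficiency
    """Estimated draw at the wall with every GPU at its rated board power."""
    dc_load = gpu_count * gpu_watts + rest_of_system_watts
    return dc_load / psu_efficiency


if __name__ == "__main__":
    wall = rig_wall_watts(gpu_count=4, gpu_watts=450.0)  # four RTX 4090s
    for amps in (15.0, 20.0):
        budget = circuit_continuous_watts(breaker_amps=amps)
        status = "OK" if wall <= budget else "over budget"
        print(f"{amps:.0f} A circuit: {budget:.0f} W budget, "
              f"rig ~{wall:.0f} W -> {status}")
```

Under these assumptions, four cards at full rated power land well above what a single 15 A or 20 A household circuit carries continuously, so power-limiting the GPUs or adding a dedicated circuit would be on the table.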
I need to buy this today so it arrives by Monday at the latest. If I want to run DeepSeek V4 Pro at decent speeds and not have it crawl at 1 token per second, what is the actual, realistic GPU setup I should get for that budget? Is it the multi-4090 path, or should I be looking at used enterprise gear?...
Oh man, I totally feel your pain!! I was in this exact same spot last year trying to run some massive LLMs for a local dev project, and it was a total nightmare until I figured out the secret sauce. For $6500, you are right on the edge of greatness! Honestly, I absolutely love the Apple Mac Studio M2 Ultra (76-core GPU, 192GB RAM) for this specific use case, even though you're a PC guy. I was a die-hard Windows user for twenty years, but when I realized I could fit the entire model into unified memory without messing with PCIe bandwidth bottlenecks, it changed everything! The inference speed is actually fantastic once you're up and running.
If you're dead set on sticking with PC though, don't sleep on the multi-GPU route. I once built a crazy rig with three used NVIDIA GeForce RTX 3090 24GB cards, and it was incredible for the price! You get 72GB of VRAM, which is amazing for keeping precision high. I slapped them onto an ASUS Pro WS WRX80E-SAGE SE WIFI motherboard and it worked like a charm. Just make sure you grab a beefy power supply like the EVGA SuperNova 1600 P2 1600W or the rig will shut down when the model starts crunching, and check that your wall circuit can handle the draw so you don't trip those breakers. Seriously, that Mac is the easiest plug-and-play win if you need it by Monday, but the triple 3090 setup is a fun project if you have the patience to tune the cooling!!