So I've been really geeking out on local LLMs lately, and I'm finally ready to pull the trigger on a dedicated rig. I live in a pretty small apartment in Chicago and the power bill is already kinda high, so I'm trying to be smart about this, but I really need to get DeepSeek V4 Pro running locally for some coding projects I'm working on. I've got about 3500 bucks saved up for this build and I want to get it right the first time.
I've been looking around, and some people on Reddit say dual RTX 3090s are the way to go because each card has 24GB of VRAM. But then I read a blog post saying V4 Pro is so massive that even 48GB of total VRAM is gonna struggle with bigger context windows unless you use really heavy quantization, which sounds like it might hurt accuracy. Then there's the whole debate about Mac Studio M2 or M3 Ultra setups having way more unified memory for the price, but the token generation speeds in the benchmarks I found online look slower. It's honestly super confusing trying to figure out whether I should go the PC route with multiple used cards or just buy a pre-built workstation.
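To put rough numbers on the 48GB-vs-unified-memory question, here is a minimal sketch of the usual back-of-the-envelope estimate: weight memory is roughly parameter count times bits per weight divided by 8. The 100B parameter count below is a hypothetical placeholder, not DeepSeek V4 Pro's actual size, and the 0.8 "usable fraction" (leaving room for KV cache and overhead) is an arbitrary assumption.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough memory needed for the model weights alone, in GB (1 GB = 1e9 bytes).

    KV cache, activations and runtime overhead are NOT included; the KV cache in
    particular grows with the context window and comes on top of this figure.
    """
    bytes_per_weight = bits_per_weight / 8
    # (params_billion * 1e9 weights) * bytes_per_weight / (1e9 bytes per GB)
    return params_billion * bytes_per_weight


def weights_fit(params_billion: float, bits_per_weight: float,
                memory_gb: float, usable_fraction: float = 0.8) -> bool:
    """True if the weights fit in an assumed usable fraction of total memory,
    leaving the remainder for KV cache and overhead."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= memory_gb * usable_fraction


PARAMS_B = 100  # hypothetical parameter count, NOT DeepSeek V4 Pro's real size
for bits in (16, 8, 4):
    print(
        f"{bits:>2}-bit: {weight_footprint_gb(PARAMS_B, bits):6.1f} GB of weights | "
        f"fits 48 GB (2x 3090): {weights_fit(PARAMS_B, bits, 48)} | "
        f"fits 128 GB unified: {weights_fit(PARAMS_B, bits, 128)}"
    )
```

Swap in the real parameter count from the model card and the bits-per-weight of the quant you'd actually download; the shape of the result (halving bits halves the weight memory, and context length eats into whatever is left) is the point, not the specific numbers.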
If I go with a multi-GPU setup, is it even possible to run it smoothly on a standard 15 amp circuit, or am I gonna trip the breaker every time I start a prompt? What's the absolute best way to spend that 3-4k to get decent speeds on DeepSeek V4 Pro right now without it being a total headache to maintain?
I totally get the excitement! Building a local rig is a ton of fun. Since you're in a small Chicago place and worried about that 15 amp breaker, I've gotta say: avoid the multi-GPU space heater. I went through this exact same panic and ended up with a Mac setup, because it's just so much more reliable for a home setup. You won't be worrying about fire hazards or your electricity bill exploding. That's why I'd go with the Apple Mac Studio M2 Ultra with 128GB of unified memory.
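Whether the bill actually "explodes" depends on how many hours a day the machine is working, not just its peak wattage. A minimal sketch of the arithmetic follows; every wattage, hour count and $/kWh figure in it is a hypothetical placeholder, not a measured number for either machine or an actual Chicago utility rate.

```python
def monthly_energy_cost(avg_watts: float, hours_per_day: float,
                        usd_per_kwh: float, days: int = 30) -> float:
    """Cost of running a machine at a given average wall draw for a month."""
    kwh = avg_watts / 1000 * hours_per_day * days
    return kwh * usd_per_kwh


# Every number below is a hypothetical placeholder: measure your own machine's
# draw with a plug-in power meter and use the rate from your utility bill.
RATE_USD_PER_KWH = 0.15
HOURS_PER_DAY = 4

for label, watts in [("multi-GPU PC under load", 800), ("Mac under load", 250)]:
    cost = monthly_energy_cost(watts, HOURS_PER_DAY, RATE_USD_PER_KWH)
    print(f"{label} ({watts} W): ${cost:.2f}/month")
```

Idle draw matters too if the box stays on 24/7, so it's worth running the same calculation with your measured idle wattage and `hours_per_day=24`.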
> What's the absolute best way to spend...

Regarding #2, I agree, but if you go with two NVIDIA GeForce RTX 3090 24GB GDDR6X cards, make sure to power-limit them (for example with `nvidia-smi -pl`) so you don't trip the breaker during heavy inference.
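Here's a minimal sketch of the breaker math for a power-limited dual-3090 box, assuming a 120 V / 15 A circuit and the common 80% rule of thumb for continuous loads. The 350 W figure is the 3090's stock board power; the 250 W cap, the 250 W for the rest of the system and the 90% PSU efficiency are hypothetical example values. It only models the PC itself, so anything else wired to the same breaker has to fit in the leftover headroom, and it ignores short transient spikes.

```python
def circuit_budget_watts(volts: float = 120, amps: float = 15,
                         continuous_factor: float = 0.8) -> float:
    """Usable continuous wattage on a branch circuit (80% rule of thumb)."""
    return volts * amps * continuous_factor


def wall_draw_watts(gpu_power_limit_w: float, n_gpus: int,
                    rest_of_system_w: float, psu_efficiency: float) -> float:
    """Estimated draw at the outlet: DC load divided by PSU efficiency."""
    dc_load = gpu_power_limit_w * n_gpus + rest_of_system_w
    return dc_load / psu_efficiency


budget = circuit_budget_watts()  # 120 V * 15 A * 0.8
# Hypothetical: 250 W for CPU, board, drives and fans; 90%-efficient PSU.
for limit in (350, 250):  # 350 W = stock 3090 board power; 250 W = example cap
    draw = wall_draw_watts(limit, 2, rest_of_system_w=250, psu_efficiency=0.9)
    print(f"GPUs at {limit} W each: ~{draw:.0f} W at the wall, "
          f"{budget - draw:.0f} W left for everything else on the circuit")
```

The takeaway is that the PC alone usually isn't the problem; it's the PC plus whatever else shares that breaker (a space heater, window AC, microwave), which is where a power cap buys real headroom.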
> What's the absolute best way to spend that 3-4k to get decent speeds

Just stick with NVIDIA and you're golden! CUDA is the best-supported backend in local inference tools like llama.cpp and vLLM, so things tend to just work and run fast. It's basically the gold standard. Reach out if you need more help!