I'm really hyped about the DeepSeek-V3 release, but looking at the specs, it seems like an absolute beast to run locally. Since it’s a 671B MoE model, I know my current mid-range setup won’t even come close. I’m trying to figure out what hardware is actually necessary to get decent tokens per second without needing an enterprise-grade server. I’m considering a multi-GPU setup, maybe pooling a few used RTX 3090s or 4090s for their 24GB of VRAM each, but I'm worried about the total VRAM needed for 4-bit or 8-bit quantization. Does anyone have experience benchmarking this yet? What’s the most cost-effective GPU configuration to handle those massive VRAM requirements while keeping the generation speed usable?
Oh man, I've been obsessing over this too!! Honestly, your best bet is definitely pooling used cards:
* Option A: NVIDIA GeForce RTX 3090 24GB (~$700)
* Option B: NVIDIA GeForce RTX 4090 24GB (~$1,800)

The 3090 wins on value, right? Pros: same VRAM, way cheaper. Cons: lower bandwidth. Ngl, just go with used 3090s... they're amazing for this! gl
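Quick sanity check on the value argument, as a rough sketch. The prices are just the ballpark used-market figures quoted above (they move around a lot), and `dollars_per_gb` is a made-up helper name for illustration:

```python
# Rough cost-per-GB-of-VRAM comparison.
# Prices are the approximate figures from the post above, not live market data.

def dollars_per_gb(price_usd: float, vram_gb: float) -> float:
    """Price divided by VRAM capacity."""
    return price_usd / vram_gb

cards = {
    "RTX 3090 24GB": (700, 24),
    "RTX 4090 24GB": (1800, 24),
}

for name, (price, vram) in cards.items():
    print(f"{name}: ~${dollars_per_gb(price, vram):.2f} per GB of VRAM")
```

Since both cards have the same 24GB, the gap is entirely the price difference. This ignores speed, power draw, and the risk of buying used.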
Sooo, I've been building LLM rigs since the early GPT-2 days, and honestly, DeepSeek-V3 is a total monster compared to what we're used to. To even get this running at a usable speed without a $40k server, you're gonna have to get creative with quantization. At 4-bit, you're looking at nearly 400GB of VRAM. Unless you've got a massive rack, that's like 16-17 NVIDIA GeForce RTX 3090 24GB cards!!

Here's what I recommend for a more "budget" pro build:
* Grab used NVIDIA GeForce RTX 3090 24GB cards. The NVIDIA GeForce RTX 4090 24GB is faster, but for MoE, VRAM capacity is king and the 3090 is basically half the price.
* Look into 1.5-bit or 2-bit EXL2 quants. They'll fit in about 200-240GB, which brings the GPU count down to a "reasonable" 10 cards.
* Make sure your power supply can handle the transient spikes... seriously, you'll trip breakers with that many cards lol.

I would suggest looking at a used Supermicro AS-4124GS-TNR server chassis. It handles the PCIe lanes way better than any consumer motherboard ever could. Just be careful with the heat, it's basically a space heater. gl!
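If you want to play with the numbers yourself, here's a minimal back-of-the-envelope sketch. It only counts the weights (parameters × bits ÷ 8), so treat the results as lower bounds: KV cache, activations, and per-GPU overhead come on top, and real quant formats don't land on exact bit widths. The function names are hypothetical, and 1 GB is taken as 1e9 bytes.

```python
import math

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

def cards_needed(total_gb: float, vram_per_card_gb: float = 24.0) -> int:
    """Minimum number of cards whose combined VRAM covers total_gb."""
    return math.ceil(total_gb / vram_per_card_gb)

PARAMS_B = 671  # DeepSeek-V3 total parameter count, in billions

for bits in (8, 4, 2):
    gb = weight_gb(PARAMS_B, bits)
    print(f"{bits}-bit: ~{gb:.1f} GB of weights, "
          f"at least {cards_needed(gb)} x 24GB cards")

# Plugging in a total budget instead of a weights-only figure:
print(cards_needed(400))  # cards for a ~400GB total footprint
```

The weights-only figures come out lower than the totals quoted in this thread, which is expected because the estimate leaves out everything except the weights.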
Similar situation here - I've tried many rigs over the years, including a MoE setup last summer. MoE models are basically massive libraries where the whole thing has to fit in VRAM to work. I think this 671B beast needs like 400GB+? Honestly, I ran into huge PCIe bottlenecks. IIRC, it'll be slow without high-bandwidth interconnects. Basically, VRAM is only half the battle. Tbh I'm still learning too!
Would love to know this too
I have been tracking the hardware market for a while, and honestly, you have to be pretty cautious when looking at brand ecosystems for a beast like DeepSeek-V3. It is not just about the card specs, it is about the long-term reliability of the whole setup.
I totally agree about the used market being a total gamble right now. Plus, finding that many matching cards from one seller is nearly impossible anyway. Honestly though, I have to disagree with the idea of building a 16-GPU rack at home. It sounds cool in theory, but the infrastructure needed is basically enterprise-level. For a DIY enthusiast, you are better off looking at high-RAM unified memory workstations instead. Even if the tokens per second are lower, you avoid the nightmare of:
Came here to say the same thing lol. Great minds think alike I guess.