Hey everyone! I’m really hyped about the DeepSeek-V3 release, but I’m struggling to figure out the best hardware for a smooth local setup. Since it’s a massive Mixture-of-Experts model, my current RTX 3080 definitely won't cut it for that parameter count. I’m debating whether to save up for a single RTX 4090 or if I should look into a multi-GPU setup with used 3090s to stack enough VRAM for a decent quantization. I mainly want to use it for heavy coding tasks and local RAG. Does anyone have experience running V3 yet? What's the most cost-effective GPU configuration to get usable tokens per second without going into enterprise-grade hardware?
Quick question: what's your actual budget? I really want to help, since I've struggled a lot with VRAM lately!
• NVIDIA GeForce RTX 4090 24GB: amazing speed but limited VRAM for DeepSeek-V3.
• 2x used NVIDIA GeForce RTX 3090 24GB: twice the VRAM (48GB total) for way less cash.
Tbh, VRAM is literally EVERYTHING for huge MoE models like this. Peace! ✌️
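To make the "VRAM is everything for MoE" point a bit more concrete, here's a toy top-k router. It's a minimal sketch with hypothetical sizes (NUM_EXPERTS = 16, TOP_K = 2) and random scores standing in for a learned gate; it is NOT DeepSeek-V3's actual routing. Each token only runs through a couple of experts, but across many tokens practically every expert gets picked, so all the expert weights have to be resident somewhere, not just the "active" ones.

```python
import random

# Toy top-k Mixture-of-Experts router.
# HYPOTHETICAL sizes and random gate scores; not DeepSeek-V3's real gating.
NUM_EXPERTS = 16  # hypothetical number of experts
TOP_K = 2         # hypothetical number of experts used per token


def route(scores: list[float], k: int = TOP_K) -> list[int]:
    """Return the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def experts_touched(num_tokens: int, seed: int = 0) -> set[int]:
    """Route random tokens and collect every expert used at least once."""
    rng = random.Random(seed)
    used: set[int] = set()
    for _ in range(num_tokens):
        # Random floats stand in for the gate's per-expert scores.
        scores = [rng.random() for _ in range(NUM_EXPERTS)]
        used.update(route(scores))
    return used


for n in (1, 10, 100):
    print(n, "tokens ->", len(experts_touched(n)), "of", NUM_EXPERTS, "experts used")
```

Per token the compute is small (only TOP_K experts run), but the set of experts touched grows quickly with the number of tokens, which is why memory capacity rather than per-token compute is the bottleneck.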
Just saw this! Curious about one thing: what's your actual budget? Unfortunately, my NVIDIA GeForce RTX 3080 10GB had issues with quants...
• Maybe look for used NVIDIA GeForce RTX 3090 24GB cards for around $700? idk
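Yo, so DeepSeek-V3 is a huge Mixture-of-Experts model with over 600B parameters (671B total, of which roughly 37B are active per token). Basically, that means VRAM is your biggest hurdle. I think a single NVIDIA GeForce RTX 4090 24GB just won't cut it for a decent quant... I would suggest grabbing two or three used NVIDIA GeForce RTX 3090 24GB cards instead. Stacking them is way more cost-effective for local RAG tasks, and for the price it's a much better deal than enterprise hardware. Just keep in mind that even three cards (72GB) can't hold the full weights at a typical 4-bit quant (roughly 335GB or more), so most of the model would still be offloaded to system RAM. gl!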
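Here's the back-of-the-envelope math behind the VRAM problem, as a rough sketch. It assumes DeepSeek-V3's published total of 671B parameters and counts weights only (no KV cache, activations, or runtime overhead), so real requirements are higher. The bit widths are nominal; actual quant formats usually store somewhat more than that per weight.

```python
import math

# Back-of-the-envelope weight-memory estimate for DeepSeek-V3.
# ASSUMPTIONS: 671B total parameters (~37B active per token, but every expert
# still has to be stored), weights only: KV cache, activations and runtime
# overhead are ignored, so real requirements are higher.
TOTAL_PARAMS = 671e9
GPU_VRAM_GB = 24  # RTX 3090 / RTX 4090


def weight_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def cards_needed(bits_per_weight: float, vram_gb: float = GPU_VRAM_GB) -> int:
    """Minimum number of 24GB cards to hold the weights alone, ignoring overhead."""
    return math.ceil(weight_gb(bits_per_weight) / vram_gb)


for bits in (8, 4, 2):
    print(f"{bits}-bit: ~{weight_gb(bits):.0f} GB of weights, "
          f">= {cards_needed(bits)} x 24GB cards")
```

GB here is decimal (1e9 bytes); switching to GiB shifts the numbers by about 7%, which doesn't change the conclusion.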
Re: "just saw this! Curious about one thing: whats..." Yeah, budget is basically the biggest hurdle for everyone right now. I looked into some other brands to see if we could get more VRAM for less money, but unfortunately it was a bit of a letdown.
Jumping in here with a slightly different angle. Before you go down the multi-GPU rabbit hole, what's your case and PSU situation like? Stacking cards sounds easy until you realize you need a 1600W unit and blower-style cards to keep things from melting in a standard mid-tower.
I've been doing some market research on how these MoE models scale, and there's basically a brand split you should consider. If you stick with Team Green, stacking used NVIDIA GeForce RTX 3090 24GB cards is the only practical route, because the NVIDIA GeForce RTX 4090 24GB is just too VRAM-limited for the price when dealing with a 600B+ model like V3. The pros are obviously top-tier CUDA support and speed; the cons are the massive power draw and heat.
Alternatively, if you're open to other brands, an Apple Mac Studio M2 Ultra with 128GB+ of unified memory is actually a legit option for local RAG. Pros: it handles the massive memory requirements of DeepSeek-V3 way more easily than a consumer PC, and it's silent. Cons: the tokens per second won't touch a multi-GPU NVIDIA rig. Also note that the M2 Ultra tops out at 192GB, which only fits very aggressive quants of V3 (around 2 bits per weight or lower). Ngl, for heavy coding where you need the full context window, that unified memory is a game changer. Are you married to the PC ecosystem, or would you consider a Mac for the extra memory headroom?
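The PSU question in that post is easy to sanity-check with some arithmetic. This is a rough sketch: the 350 W (RTX 3090) and 450 W (RTX 4090) figures are the reference board-power specs, while the CPU and rest-of-system wattages and the 1.25x headroom factor are assumptions, not measurements. Transient spikes can also go above board power.

```python
import math

# Rough PSU sizing for a multi-GPU rig.
# ASSUMPTIONS: reference board power per card; cpu_w and rest_of_system_w are
# placeholder values; the 1.25x headroom is a rule of thumb, not a spec.
GPU_BOARD_POWER_W = {"RTX 3090": 350, "RTX 4090": 450}


def recommended_psu_w(gpu: str, num_gpus: int, cpu_w: int = 250,
                      rest_of_system_w: int = 100, headroom: float = 1.25) -> int:
    """Sustained load times a safety margin, rounded up to the next 50 W step."""
    load = GPU_BOARD_POWER_W[gpu] * num_gpus + cpu_w + rest_of_system_w
    target = load * headroom
    return math.ceil(target / 50) * 50


for n in (1, 2, 3):
    print(f"{n} x RTX 3090: ~{recommended_psu_w('RTX 3090', n)} W PSU")
```

Swap in your own CPU and component figures. Power-limiting the cards (e.g. with `nvidia-smi -pl`) lowers the sustained GPU draw this estimate is built on.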