
Best GPU for running DeepSeek-V3 locally?


Hey everyone! I’m really hyped about the DeepSeek-V3 release, but I’m struggling to figure out the best hardware for a smooth local setup. Since it’s a massive Mixture-of-Experts model, my current RTX 3080 definitely won't cut it for that parameter count. I’m debating whether to save up for a single RTX 4090 or if I should look into a multi-GPU setup with used 3090s to stack enough VRAM for a decent quantization. I mainly want to use it for heavy coding tasks and local RAG. Does anyone have experience running V3 yet? What's the most cost-effective GPU configuration to get usable tokens per second without going into enterprise-grade hardware?


5 Answers
12

Quick question - what is ur actual budget?? I reallyyy wanna help cuz I struggled so much with VRAM lately!!

• NVIDIA GeForce RTX 4090 24GB: amazing speed but limited VRAM for DeepSeek-V3.
• 2x used NVIDIA GeForce RTX 3090 24GB: way more VRAM for way less cash.

tbh VRAM is literally EVERYTHING for huge MoE models like this. peace! ✌️


10

Just saw this! Curious about one thing: what's ur actual budget? Unfortunately my NVIDIA GeForce RTX 3080 10GB had issues with quants...

• maybe look for used NVIDIA GeForce RTX 3090 24GB cards for like $700? idk


5

Yo, so DeepSeek-V3 is a huge Mixture-of-Experts model with over 600B parameters. Basically, that means VRAM is ur biggest hurdle. I think a single NVIDIA GeForce RTX 4090 24GB just won't cut it for a decent quant... I would suggest grabbing two or three used NVIDIA GeForce RTX 3090 24GB cards instead. Stacking them is way more cost-effective for local RAG tasks. Plus, you get way more VRAM per dollar than with enterprise cards. gl!
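For anyone who wants to sanity-check the VRAM math, here is a rough sketch. It assumes DeepSeek-V3's published total of 671B parameters and counts weights only (no KV cache, activations, or runtime overhead), so real usage would be higher. The helper names `weight_gb` and `fits` and the `CONFIGS` table are made up for illustration.

```python
# Rough, weight-only memory estimate. Assumption: KV cache, activations,
# and framework overhead are ignored, so real requirements are higher.
TOTAL_PARAMS = 671e9  # DeepSeek-V3 total parameters (MoE, ~37B active per token)


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Gigabytes (1e9 bytes) needed to store the weights at a given precision."""
    return params * bits_per_weight / 8 / 1e9


def fits(vram_gb: float, bits_per_weight: float, params: float = TOTAL_PARAMS) -> bool:
    """True if the weights alone fit in the given amount of VRAM."""
    return weight_gb(params, bits_per_weight) <= vram_gb


# Hypothetical configs taken from the cards discussed in this thread.
CONFIGS = {"1x RTX 4090": 24, "2x RTX 3090": 48, "3x RTX 3090": 72}

if __name__ == "__main__":
    for bits in (8, 4, 2):
        need = weight_gb(TOTAL_PARAMS, bits)
        print(f"{bits}-bit weights: ~{need:.0f} GB")
        for name, vram in CONFIGS.items():
            verdict = "fits" if fits(vram, bits) else "does not fit"
            print(f"  {name} ({vram} GB): {verdict}")
```

By this estimate even a 4-bit quant needs roughly 335 GB for weights alone, so none of these consumer configs holds the full model in VRAM. Whatever does not fit would have to be offloaded to system RAM, which usually costs a lot of tokens per second.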


2

Re: "just saw this! Curious about one thing: whats..." - yeah budget is basically the biggest hurdle for everyone right now. I was looking into some other brands to see if we could get more VRAM for cheaper but unfortunately it was a bit of a letdown.

  • Intel is still way behind on the driver side for these massive models which is frustrating
  • AMD has the raw memory but the ROCm setup is still a headache compared to CUDA
  • Stacking older cards is the only way to avoid the enterprise tax, but the power bill is gonna be brutal tbh. The power draw is so bad it reminds me of that summer my AC died and I tried to build a DIY swamp cooler using a box fan and a bucket of ice. I ended up spilling water all over my hardwood floors and it warped the boards so bad I couldn't close the bedroom door for months. My cat was so confused and kept tripping over the bumps. Anyway lol sorry, kinda went off topic there.


1

Jumping in here with a slightly different angle. Before you go down the multi-GPU rabbit hole, what's your case and PSU situation like? Stacking cards sounds easy until you realize you need a 1600W unit and a blower-style setup to keep things from melting in a standard mid-tower.

I've been doing some market research on how these MoE models scale, and there's basically a brand split you should consider. If you stick with Team Green, stacking used NVIDIA GeForce RTX 3090 24GB cards is the only practical route, because the NVIDIA GeForce RTX 4090 24GB is just too VRAM-limited for the price when dealing with a 600B+ model like V3. The pros are obviously top-tier CUDA support and speed, but the cons are the massive power draw and heat.

Alternatively, if you're open to other brands, looking at an Apple Mac Studio M2 Ultra with 128GB+ of unified memory is actually a legit move for local RAG. Pros: it handles the massive memory requirements of DeepSeek-V3 way easier than a consumer PC, and it's silent. Cons: the tokens per second won't touch a multi-GPU NVIDIA rig. Ngl, for heavy coding where you need the context window, that unified memory is a game changer.

Are you married to the PC ecosystem or would you consider a Mac for the VRAM overhead?
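To put a rough number on the PSU point, here is a back-of-the-envelope sketch. The GPU figures are NVIDIA's reference board power ratings. The 250 W allowance for the rest of the system and the 25% headroom are assumptions, not measurements, and `psu_estimate_w` is just an illustrative helper.

```python
# Back-of-the-envelope PSU sizing for a multi-GPU box.
# GPU values are NVIDIA reference board power; partner cards may differ.
GPU_BOARD_POWER_W = {"RTX 3090": 350, "RTX 4090": 450}


def psu_estimate_w(
    gpu: str,
    count: int,
    rest_of_system_w: float = 250,  # assumption: CPU, board, drives, fans
    headroom: float = 0.25,         # assumption: margin for spikes and efficiency
) -> float:
    """Suggested PSU wattage: sustained load plus a safety margin."""
    load_w = GPU_BOARD_POWER_W[gpu] * count + rest_of_system_w
    return load_w * (1 + headroom)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n}x RTX 3090: ~{psu_estimate_w('RTX 3090', n):.0f} W PSU")
```

Under these assumptions three 3090s land at about 1625 W, which lines up with the 1600W ballpark, while two cards come in closer to 1200 W.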

