What is the best GPU for running DeepSeek-V3 locally?

Question

What GPU do I actually need to get DeepSeek-V3 running at home without it crawling at like 1 token per second? Honestly, I'm so fed up with my current 3060 setup. It's just not cutting it anymore, and the constant OOM errors are driving me insane. I spent all weekend trying to optimize the weights and quantizing everything, but it's still a mess. I'm working on a coding assistant project for my startup and I need something that actually feels fast. I've got about $2500 to spend, but I'm torn between getting a used 3090 or just biting the bullet on a 4090, even though both have 24GB of VRAM. Is 24GB even enough, or am I gonna have to go dual GPUs?

NottinghamSheriff · Accepted Answer

Honestly, I've been super satisfied with my dual NVIDIA GeForce RTX 3090 24GB setup lately. It just works well and stays really stable for long coding sessions. If you value reliability, picking up two used 3090s is probably your safest bet for that budget, because 48GB of VRAM gives you way more breathing room than a single 24GB card ever will. Quick tip though: look into using the EXL2 quantization format (the one ExLlamaV2 uses) instead of standard GGUF. It handles VRAM way better for large models like DeepSeek. Also, you should definitely bookmark the r/LocalLLM subreddit and the Hugging Face model cards from LoneStriker or Bartowski. They're basically the gold standard for finding quants that won't crash your system every five minutes. Happy with my rig so far, no complaints!
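If you want to sanity-check whether a particular quant will fit before downloading it, here is a rough back-of-the-envelope sketch. It only counts the quantized weights plus a flat overhead guess for KV cache and activations. The 15% overhead is an assumption rather than a measured number, the function names are made up for illustration, and the 671B figure is the total parameter count published for DeepSeek-V3.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_fraction: float = 0.15) -> bool:
    """Rough fit check: weights plus an assumed flat overhead vs. available VRAM."""
    needed_gb = weight_footprint_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed_gb <= vram_gb


if __name__ == "__main__":
    # DeepSeek-V3 (671B total parameters) at 4 bits per weight
    print(weight_footprint_gb(671, 4.0))   # 335.5
    print(fits_in_vram(671, 4.0, 48))      # False
    # A hypothetical 32B model at 4 bits per weight on a single 24GB card
    print(fits_in_vram(32, 4.0, 24))       # True
```

Swap in the parameter count and bits per weight of whichever quant you are looking at. Real usage also depends on context length and the loader, so treat the result as a lower bound.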

viqzmtjsjy · Answer

Yo, I totally feel your pain with those OOM errors, they are the absolute worst!! Honestly, if you want DeepSeek-V3 to actually fly, 24GB just isn't gonna cut it for the bigger quants. Since you have $2500, you should definitely skip the single 4090 and go for a dual setup. I did this recently and it's amazing how much faster things get when you stop swapping to system RAM. Here's what I would do:

Grab two used NVIDIA GeForce RTX 3090 24GB cards. You can find them for like 800 bucks each online.

Get a massive PSU like the EVGA SuperNOVA 1600 P2 1600W, because two of those cards pull serious power.

Dual 3090s give you 48GB of VRAM, which is a total game changer for coding assistants! You'll actually be able to run decent quants without it crawling. Good luck with the startup, sounds like a blast!!
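For the power and budget math, here is a quick sketch you can tweak. The 350W per card is NVIDIA's rated board power for the RTX 3090 and the $800 per card is the price mentioned above. The 250W for the rest of the system and the 30% headroom for transient spikes are assumptions, and the function names are just for illustration.

```python
def recommended_psu_watts(gpu_board_power_w: float, n_gpus: int,
                          rest_of_system_w: float = 250.0,
                          headroom: float = 0.30) -> float:
    """Sustained draw of GPUs plus the rest of the system, padded by a headroom fraction."""
    sustained_w = gpu_board_power_w * n_gpus + rest_of_system_w
    return sustained_w * (1 + headroom)


def remaining_budget(total_budget: float, gpu_price: float, n_gpus: int) -> float:
    """Money left for the PSU, cooling, and anything else after buying the GPUs."""
    return total_budget - gpu_price * n_gpus


if __name__ == "__main__":
    # Two RTX 3090s at 350W each, assumed 250W for CPU, drives, and fans
    print(round(recommended_psu_watts(350, 2)))   # 1235
    # $2500 budget, two used cards at roughly $800 each
    print(remaining_budget(2500, 800, 2))         # 900
```

With these assumptions a 1600W unit leaves plenty of margin, and there is still about $900 in the budget for the PSU and whatever else the build needs.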