What GPU do I actually need to get DeepSeek-V3 running at home without it crawling at like 1 token per second? Honestly so fed up with my current 3060 setup, it's just not cutting it anymore and the constant OOM errors are driving me insane. I spent all weekend trying to optimize the weights and quantize everything but it's still a mess. I'm working on this coding assistant project for my startup and I need something that actually feels fast. I've got about $2500 to spend but I'm lost between getting a used 3090 or just biting the bullet on a 4090 even though the VRAM is basically the same. Is 24GB even enough or am I gonna have to go dual GPUs?
Honestly I've been super satisfied with my dual NVIDIA GeForce RTX 3090 24GB setup lately. It just works well and stays really stable for long coding sessions. If you value reliability, picking up two used 3090s is probably your safest bet for that budget because 48GB of VRAM gives you way more breathing room than a single card ever will. Quick tip though: look into using the EXL2 quantization format instead of standard GGUF. It handles the VRAM way better for large models like DeepSeek. Also, you should definitely bookmark the r/LocalLLM subreddit and the Hugging Face model cards from LoneStriker or Bartowski. They're basically the gold standard for finding quants that won't crash your system every five minutes. Happy with my rig so far, no complaints!
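If you want to sanity-check whether a quant will fit before you download it, here's a rough back-of-envelope sketch in Python. The function names and the 4 GiB overhead figure for KV cache and activations are my own assumptions, not measured numbers, and the 70B / 4-bit example is purely illustrative. Plug in the real parameter count and bits per weight from the model card you're looking at.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_vram_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def fits(n_params: float, bits_per_weight: float,
         total_vram_gib: float, overhead_gib: float = 4.0) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM.

    overhead_gib is a guess covering KV cache and activations; it grows
    with context length, so treat it as a placeholder.
    """
    needed = weight_vram_gib(n_params, bits_per_weight) + overhead_gib
    return needed <= total_vram_gib


if __name__ == "__main__":
    # Illustrative values only: a 70B-parameter model at 4 bits per weight.
    n_params, bpw = 70e9, 4.0
    print(f"weights: {weight_vram_gib(n_params, bpw):.1f} GiB")
    print("fits in 24 GiB:", fits(n_params, bpw, 24))
    print("fits in 48 GiB:", fits(n_params, bpw, 48))
```

This only counts the weights plus a flat fudge factor, so it's a lower bound more than a guarantee. If the number it spits out is already over your total VRAM, no amount of tweaking will stop the OOMs.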
Yo, I totally feel your pain with those OOM errors, they are the absolute worst!! Honestly if you want DeepSeek-V3 to actually fly, 24GB just isn't gonna cut it for the bigger quants. Since you have $2500, you should definitely skip the single 4090 and go for a dual setup. I did this recently and it's amazing how much faster things get when you stop swapping to system RAM. Here's what I would do: