What is the best GPU for running DeepSeek models locally?

Question

I've been doing local LLM stuff for a couple of years now, mostly on my trusty old 3060, but man, DeepSeek is a different beast entirely. I tried running the 67B model with 4-bit quantization and it basically just bricked my system lol. I have a dev project due next Friday and I really need to get the Coder version running with decent tokens per second.

My budget is around 1500 max and I'm looking at maybe a used 3090 or just biting the bullet on a 4080 Super, but I'm worried about the VRAM limit on the 40 series. What is the actual best GPU setup for DeepSeek right now if I need it running like yesterday?

NARTYTRYUT3565021NERTYTRY · Accepted Answer

Like someone mentioned, VRAM is basically the only thing that matters for DeepSeek. I'm very satisfied with how my ASUS TUF Gaming GeForce RTX 3090 24GB GDDR6X handles 4-bit quantized models.

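3090: 24GB VRAM is ideal for 4-bit models up to around the 30B range. 67B at 4-bit is roughly 34GB of weights alone, so on a single card it only runs with part of the model offloaded to system RAM.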

NVIDIA GeForce RTX 4080 Super 16GB GDDR6X: 16GB just isn't enough for local inference here. What power supply are you working with, though? That 3090 is thirsty.

Kevinextix · Answer

Man, I feel your pain. I spent years trying to squeeze performance out of mid-range cards before I finally accepted that for local LLMs, VRAM is the only metric that actually matters. In my experience, the 40 series is a total trap for DeepSeek unless you go 4090. I remember when I first tried running big models on a single 16GB card... it was just painful.

I've tried many setups over the years and honestly, if you have a 1500 budget, you should hunt down two used NVIDIA GeForce RTX 3090 24GB GDDR6X cards. With one 3090 you get that 24GB buffer, which is the bare minimum for DeepSeek. But for the 67B version? You really need dual cards. You basically need 48GB total to run 67B comfortably.

Don't touch the NVIDIA GeForce RTX 4080 Super 16GB GDDR6X. That 16GB limit will haunt you the second you try to load a decent context window.
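If you want to sanity check the numbers yourself, here is a rough back-of-the-envelope sketch in Python. It only counts the weights (parameters times bits per weight) and then reserves a flat per-card headroom for KV cache, activations and driver overhead. The function names and the 2 GiB headroom are my own assumptions for illustration, not measured values, and real 4-bit formats store some extra data for scales, so treat the result as a lower bound.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         cards_gib: list[float], headroom_gib_per_card: float = 2.0) -> bool:
    """True if the weights fit in the combined VRAM of the given cards.

    headroom_gib_per_card is an assumed reserve for KV cache, activations
    and driver overhead. It is a guess, so tune it for your context length.
    """
    usable = sum(max(c - headroom_gib_per_card, 0.0) for c in cards_gib)
    return weight_gib(params_billion, bits_per_weight) <= usable


# Setups discussed in this thread, all for a 67B model at 4 bits per weight.
setups = {
    "1x RTX 4080 Super (16GB)": [16],
    "1x RTX 3090 (24GB)": [24],
    "2x RTX 3090 (48GB)": [24, 24],
}

need = weight_gib(67, 4)
print(f"67B at 4-bit: about {need:.1f} GiB of weights")
for name, cards in setups.items():
    verdict = "fits" if fits(67, 4, cards) else "does not fit"
    print(f"{name}: {verdict}")
```

By this estimate the weights alone come to about 31 GiB, which is why a single 16GB or 24GB card falls short and two 3090s leave room for context.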

SecondCityDerby · Answer

Same here!