What is the best GPU for running DeepSeek locally?

0
Topic starter

I really gotta get DeepSeek-V2 running for this dev project I'm finishing by next weekend. I've been looking at the 3090 because of the 24GB VRAM, and people say it's the budget king for local LLMs, but then I see folks on Reddit swearing by the 4090 for speed. My logic was that more memory is better for the big models, but if it takes ten seconds per word then what's the point? I only have about $1600 to drop on this right now, so I can't just buy both. If I get a used 3090, will it actually handle the 67B version well enough, or am I gonna regret not getting something newer? Need to decide like today so I can get it shipped...


4 Answers
11

I went through this exact mess last year. Picking up a used RTX 3090 (24GB GDDR6X) was the safer bet for my budget back then.

  • 3090: Solid 24GB, handles the 67B model okay with some CPU offloading; see the sketch below.
  • 4090: Much faster, but way more expensive for the same 24GB. Honestly, just get the 3090. It works fine for dev work without breaking the bank.
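For what it's worth, here's roughly what the offloading looks like in practice. This is a minimal sketch using llama-cpp-python, assuming a 4-bit GGUF build of the 67B model; the filename and layer count are placeholders you'd tune until you stop hitting out-of-memory errors:

```python
# Partial GPU offload with llama-cpp-python: layers that don't fit in the
# 3090's 24GB stay on the CPU. Model path and layer count are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./deepseek-llm-67b-chat.Q4_K_M.gguf",  # assumed local GGUF file
    n_gpu_layers=40,  # how many transformer layers to push onto the GPU
    n_ctx=4096,       # context window; bigger values cost more VRAM
)

out = llm("Write a haiku about VRAM.", max_tokens=64)
print(out["choices"][0]["text"])
```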


11

Just seeing this now... honestly, if you want to run the 67B version properly, a single 24GB card is gonna be a struggle. Even with the raw speed of a 4090, you simply won't have enough VRAM to fit the model weights without heavy quantization or slow system RAM offloading. Since your budget is around $1600, the most methodical route is looking for two used 3090s. That gives you 48GB of total VRAM, which is way more reliable for these larger weights. It's a bit more of a hassle regarding power requirements and cooling, but it ensures you actually have the memory headroom for the task. The 4090 is an amazing piece of tech, but memory capacity is usually the hard limit for these deep learning dev projects.
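For reference, here's a rough sketch of what the dual-card setup looks like in code with Hugging Face transformers. Treat it as a starting point rather than a tested recipe: the model ID and per-card memory caps are assumptions you'd adjust for your own rig.

```python
# Sharding a 4-bit quantized 67B model across two 24GB GPUs.
# device_map="auto" lets accelerate split the layers between cards.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/deepseek-llm-67b-chat"  # assumed Hugging Face repo ID

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # bitsandbytes 4-bit weights
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                    # spread layers across both 3090s
    max_memory={0: "22GiB", 1: "22GiB"},  # leave headroom on each 24GB card
)
```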


3

In my experience, VRAM capacity is your absolute bottleneck here. The 4090 is faster, but speed doesn't matter once the model spills into slow system RAM. At $1600, consider these points:

  • Memory bandwidth on the 3090 (~936 GB/s) is still excellent for LLM inference.
  • DeepSeek 67B needs roughly 40GB for usable context at 4-bit quantization; see the back-of-envelope math below. Honestly, grab a used 3090 now and save for a second card later.
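The 40GB figure is just weight math plus overhead. A quick back-of-envelope check (the KV-cache allowance here is my own rough assumption):

```python
# Rough VRAM estimate for a 67B-parameter model at 4-bit quantization.
params = 67e9
bytes_per_param = 0.5                              # 4 bits = half a byte per weight
weights_gib = params * bytes_per_param / 1024**3   # ~31 GiB for weights alone
kv_cache_and_buffers_gib = 8                       # assumed allowance for KV cache + runtime buffers
print(f"~{weights_gib + kv_cache_and_buffers_gib:.0f} GiB total")  # prints ~39 GiB
```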


2

Man, hardware hunting is just a cycle of disappointment these days. Honestly, this whole situation reminds me of when my buddy tried to build a local rig for some heavy dev work last winter. He found what he thought was a steal on some high-end cards from a local liquidator, but unfortunately, it turned into a total disaster. Every time he tried to load up a big model, the whole system would hard crash because the thermal pads were basically disintegrating. We spent three straight nights repasting everything and it still wasn't as good as expected... actually, it was worse. He ended up spending so much on extra cooling and a new PSU that he could have just bought a server-grade unit from the jump. It was such an ordeal that he eventually sold the whole lot for parts and went back to using cloud APIs for his project. Just a complete headache from start to finish.

