What is the best hardware for running DeepSeek locally?

Question

I'm looking to set up a dedicated rig to run DeepSeek models locally for my coding projects. I'm specifically curious about the VRAM requirements for the 67B version versus the newer R1 models. Should I prioritize multiple 3090s or go for a high-end Mac? What's the most cost-effective hardware setup for smooth inference right now?

kloxmekfel · Accepted Answer

For your situation, I honestly think the hardware landscape for these big models is kinda frustrating right now. I've spent thousands trying to get smooth inference on the full R1, and unfortunately, it runs worse than I expected on consumer gear because of the sheer number of parameters.

If you want the most bang for your buck, dual NVIDIA GeForce RTX 3090 24GB cards are basically the only way to go. You can find them used for around $750 each. That gives you 48GB of VRAM, which handles the 67B version at 4-bit quantization pretty easily, but even that setup struggles with R1 unless you stick to the distilled versions like the DeepSeek-R1-Distill-Llama-70B.
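As a rough sanity check on those numbers, here is a back-of-the-envelope sketch of how much memory the weights alone take at a given quantization. The function names and the 20% overhead margin (for KV cache and runtime buffers) are assumptions for illustration, not measured values, and real 4-bit formats usually store slightly more than 4 bits per weight once scaling data is included.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    # (params_billions * 1e9 weights) * (bits / 8 bytes each) / (1e9 bytes per GB)
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if the weights plus an assumed overhead margin fit in vram_gb."""
    needed = weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    dual_3090_gb = 2 * 24  # two 24GB cards, as in the answer above
    models = [("67B", 67), ("R1-Distill-Llama-70B", 70), ("full R1 671B", 671)]
    for name, size_b in models:
        gb = weight_memory_gb(size_b, 4)
        ok = fits(size_b, 4, dual_3090_gb)
        print(f"{name}: ~{gb:.1f} GB of weights at 4-bit, fits in 48 GB: {ok}")
```

Under these assumptions the 67B and 70B models fit in 48GB at 4-bit and the full 671B model does not come close.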

I had issues with the Apple Mac Studio M2 Ultra with 192GB of RAM, despite the massive unified memory pool. The memory bandwidth is great on paper, but the actual tokens-per-second speed for heavy coding tasks was way lower than my multi-GPU rig, which was super disappointing given the $5,000 price tag.
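To put a number on the "on paper" part: a common rule of thumb for single-user generation is that every new token requires reading all the weights once, so memory bandwidth divided by model size gives a ceiling on tokens per second. The sketch below assumes spec-sheet bandwidths of roughly 800 GB/s for the M2 Ultra and 936 GB/s for one RTX 3090 (neither figure comes from this thread). It ignores prompt processing, compute limits, and software overhead, so real speeds will be lower and the ranking between machines can differ.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on generated tokens/second if decoding is purely bandwidth-bound."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    # Each token needs one full pass over the weights.
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights_gb = 67 * 4 / 8  # 67B model at 4 bits per weight, weights only
    # Assumed spec-sheet bandwidths in GB/s; check your own hardware's numbers.
    machines = {"Mac Studio M2 Ultra": 800.0, "RTX 3090 (per card)": 936.0}
    for name, bandwidth in machines.items():
        ceiling = decode_ceiling_tokens_per_s(bandwidth, weights_gb)
        print(f"{name}: at most ~{ceiling:.0f} tokens/s on a {weights_gb:.1f} GB model")
```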

Right now, the most cost-effective sweet spot is a used workstation with an AMD Ryzen Threadripper 3960X and two or three 3090s. It's loud and eats power like crazy, but it's the only way I found to avoid the massive bottlenecking I saw on single-card setups. Just don't expect the full 671B R1 to run locally without an enterprise budget... it's too heavy for us mortals right now. Anyway, stick to the used 3090s for the 67B and you'll be fine. gl!
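For a sense of scale on the "enterprise budget" point, this sketch counts how many 24GB cards it would take just to hold the weights at 4-bit, using the roughly $750 used price quoted above. The function name is hypothetical, and the count ignores KV cache, runtime overhead, and the practical problem of hosting that many cards in one machine.

```python
import math


def cards_needed(params_billions: float, bits_per_weight: float,
                 card_vram_gb: float = 24.0) -> int:
    """Minimum number of cards whose combined VRAM holds the weights alone."""
    weights_gb = params_billions * bits_per_weight / 8
    return math.ceil(weights_gb / card_vram_gb)


if __name__ == "__main__":
    used_3090_price = 750  # USD per card, the figure quoted in this answer
    for name, size_b in [("67B", 67), ("full R1 671B", 671)]:
        n = cards_needed(size_b, 4)
        print(f"{name}: {n} x 24GB cards, about ${n * used_3090_price:,} in GPUs alone")
```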

JesusGeodo · Answer

Just saw this and honestly, don't waste your cash. I spent way too much on gear before realizing 4-bit quants are the way to go for R1.

Option A: NVIDIA GeForce RTX 3090 24GB (x2)

Faster for coding.

Option B: Apple Mac Mini M4 Pro 64GB

Better memory value (unified memory instead of dedicated VRAM).

Pro tip: Use LM Studio to check quants first. Learned that the hard way SO many times lol. Peace

HazelBrowl · Answer

I have been obsessing over this for the last few weeks while trying to figure out my own setup. Tbh the market is wild right now and it feels like there is a huge divide between the two main ecosystems.

NVIDIA is basically the gold standard because of software support, meaning most of the libraries used to run DeepSeek models are optimized for their chips first.

Apple is really disrupting things with their unified memory architecture, which lets you access way more memory for cheaper than top-tier enterprise cards.

Before you pull the trigger, I have a couple of questions to help narrow it down:

How much are you looking to spend in total?

Is your priority super fast response times for coding, or just being able to run the biggest possible model version regardless of speed?

The market for used parts is also shifting really fast, so it also depends on whether you want brand new gear or are okay with hunting for deals...