
What is the best hardware for running DeepSeek locally?

6 Posts
7 Users
0 Reactions
603 Views
0
Topic starter

I'm looking to set up a dedicated rig to run DeepSeek models locally for my coding projects. I'm specifically curious about the VRAM requirements for the 67B version versus the newer R1 models. Should I prioritize multiple 3090s or go for a high-end Mac? What's the most cost-effective hardware setup for smooth inference right now?


6 Answers
12

For your situation, I honestly think the hardware landscape for these big models is kinda frustrating right now. I've spent thousands trying to get smooth inference on the full R1, and unfortunately, it's just not as smooth as I expected on consumer gear because of the sheer size of the model.

  • If you want the most bang for your buck, dual NVIDIA GeForce RTX 3090 24GB cards are basically the only way to go. You can find them used for around $750 each. That gives you 48GB of VRAM, which handles the 67B version at 4-bit quantization pretty easily, but even that setup struggles with the larger R1 models unless you use the distilled ones like DeepSeek-R1-Distill-Llama-70B.
  • I had issues with the Apple Mac Studio M2 Ultra (192GB RAM) despite the massive unified memory pool. The memory bandwidth is great on paper, but the actual tokens-per-second speed for heavy coding tasks was way lower than on my multi-GPU rig, which was super disappointing given the $5,000 price tag.
  • Right now, the most cost-effective sweet spot is a used workstation with an AMD Ryzen Threadripper 3960X and two or three 3090s. It's loud and eats power like crazy, but it's basically the only way to avoid the massive bottlenecking I saw on single-card setups.

Don't expect the full 671B R1 to run locally without an enterprise budget... it's just too heavy for us mortals right now. Anyway, stick to the used 3090s for the 67B and you'll be fine. gl!
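For anyone wondering where those numbers come from, here's a rough back-of-envelope in Python. It assumes the weights take about (billions of params x bits per weight / 8) GB and counts the weights only: KV cache, activations, and runtime buffers come on top, and real "4-bit" formats usually land a little above 4 bits per weight once scales are included. So read "fits" as optimistic, not a guarantee. The parameter counts are the ones named in this thread.

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in GB: billions of params * bits / 8.

    Assumption: ignores KV cache, activations, and runtime overhead.
    """
    return params_billions * bits_per_weight / 8


# Parameter counts (in billions) for the models mentioned in this thread.
MODELS = {
    "DeepSeek 67B": 67,
    "DeepSeek-R1-Distill-Llama-70B": 70,
    "DeepSeek-R1 (full)": 671,
}

VRAM_GB = 2 * 24  # two RTX 3090 24GB cards

for name, params in MODELS.items():
    size = weight_gb(params, bits_per_weight=4)
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"{name}: ~{size:.1f} GB of weights at 4-bit, {verdict} in {VRAM_GB} GB")
```

That lines up with the answer: the 67B and the 70B distill leave roughly 13-15 GB of headroom on two 3090s, while the full 671B R1 needs around 335 GB just for 4-bit weights.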


10

Just saw this, and honestly, don't waste your cash. I spent way too much on gear before realizing 4-bit quants are the way to go for R1.

  • Option A: NVIDIA GeForce RTX 3090 24GB (x2). Faster for coding.
  • Option B: Apple Mac Mini M4 Pro 64GB. Better VRAM value.

Pro tip: Use LM Studio to check quants first. Learned that the hard way SO many times lol. Peace
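Building on the "check quants first" tip, here's a minimal Python sketch for picking the biggest quant that should fit a given amount of memory. The candidate bit widths and the 80% headroom (leaving room for the OS, KV cache, and runtime buffers) are my own rough assumptions, not anything LM Studio does; compare against the actual file sizes your tool shows before downloading.

```python
from typing import Optional

# Assumed effective bits per weight for common quant levels (rough, illustrative values).
CANDIDATE_BITS = [8, 6, 5, 4, 3]


def best_quant(params_billions: float, memory_gb: float, headroom: float = 0.8) -> Optional[int]:
    """Return the highest bit width whose weights fit in headroom * memory_gb, or None if none fit."""
    budget_gb = memory_gb * headroom  # keep ~20% free for the OS, KV cache, and buffers (assumption)
    for bits in sorted(CANDIDATE_BITS, reverse=True):
        weights_gb = params_billions * bits / 8
        if weights_gb <= budget_gb:
            return bits
    return None


# A 70B distill on a 64GB Mac Mini vs. the full 671B R1 on the same box.
print(best_quant(70, 64))   # 5
print(best_quant(671, 64))  # None
```

On those assumptions a 70B model tops out around 5-bit on 64GB, while the full R1 doesn't fit at any of these levels.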


3

I have been obsessing over this for the last few weeks while trying to figure out my own setup. Tbh the market is wild right now and it feels like there is a huge divide between the two main ecosystems.

  • NVIDIA is basically the gold standard because of software support, meaning most of the tools people use to run DeepSeek are optimized for their chips first.
  • Apple is really disrupting things with its unified memory architecture, which lets you access way more memory for cheaper than top-tier enterprise cards.

Before you pull the trigger, I have a couple of questions to help narrow it down:

  • How much are you looking to spend in total?
  • Is your priority super-fast response times for coding, or just being able to run the biggest possible model version regardless of speed?

The market for used parts is also shifting really fast, so it depends on whether you want brand-new gear or are okay with hunting for deals...
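On the speed-vs-size question: generating tokens is usually limited by memory bandwidth, since each new token has to read roughly all of a dense model's weights once. That gives a quick upper bound of bandwidth / weight size. A minimal Python sketch; the 800 GB/s figure is only a placeholder, so plug in the published bandwidth of whatever hardware you're considering, and expect real speeds to come in below the bound. Mixture-of-experts models like the full R1 only read their active experts per token, so this dense-model math doesn't apply to them directly.

```python
def max_decode_tokens_per_sec(bandwidth_gb_s: float, params_billions: float, bits_per_weight: float) -> float:
    """Upper bound on decode speed for a dense model: every token streams all weights from memory once."""
    weights_gb = params_billions * bits_per_weight / 8
    return bandwidth_gb_s / weights_gb


BANDWIDTH_GB_S = 800.0  # placeholder; replace with your hardware's published memory bandwidth

# Same hardware, bigger model -> lower ceiling on tokens per second.
for params in (7, 32, 70):
    ceiling = max_decode_tokens_per_sec(BANDWIDTH_GB_S, params, bits_per_weight=4)
    print(f"{params}B at 4-bit: at most ~{ceiling:.0f} tok/s")
```

On the same box, going from a 7B to a 70B model cuts the ceiling by 10x, so "fastest responses" and "biggest model" really do pull in opposite directions.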


2

Works great for me


1

Yo, ngl I'm still learning the ropes with DeepSeek but I've built rigs for years. I think those R1 models are way heavier on VRAM than the 67B ones. IIRC you're gonna need massive memory for smooth inference.

  • dual powerful cards for speed
  • a Mac for the huge memory pool
  • loads of system RAM

Honestly, I'd go with the Mac cuz it's usually less of a headache for beginners. gl!
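To put rough numbers on the "dual cards plus loads of system RAM" idea, here's a small Python sketch: how many 24GB cards a model's 4-bit weights would need, and how much would spill into system RAM if you stop at two. The 90% per-card headroom is an assumption, and the weight sizes use a simple params x bits / 8 approximation (weights only). In runtimes that support partial offload (llama.cpp is one), whatever doesn't fit on the GPUs runs on the CPU from system RAM, which is much slower.

```python
import math


def cards_needed(weights_gb: float, vram_per_card_gb: float, headroom: float = 0.9) -> int:
    """GPUs needed to hold the weights if only `headroom` of each card's VRAM is used for them (assumption)."""
    usable_per_card = vram_per_card_gb * headroom
    return math.ceil(weights_gb / usable_per_card)


def spill_to_ram_gb(weights_gb: float, num_cards: int, vram_per_card_gb: float, headroom: float = 0.9) -> float:
    """GB of weights left over for system RAM when you only have `num_cards` GPUs."""
    usable_total = num_cards * vram_per_card_gb * headroom
    return max(0.0, weights_gb - usable_total)


# 4-bit weight sizes: params (billions) * 4 / 8.
for name, weights_gb in {"67B": 67 * 4 / 8, "671B R1": 671 * 4 / 8}.items():
    n = cards_needed(weights_gb, vram_per_card_gb=24)
    spill = spill_to_ram_gb(weights_gb, num_cards=2, vram_per_card_gb=24)
    print(f"{name}: {n} x 24GB cards, or ~{spill:.0f} GB in system RAM with two cards")
```

Two cards cover the 67B on these assumptions, while the full R1 would need either a stack of cards or hundreds of GB of system RAM.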


1

Late to the party, but I've been running these big quants for years. Ngl, if you want to run the heavy R1 stuff without the headache of heat and power spikes from multiple consumer cards, check out the NVIDIA RTX A6000 48GB. In my experience, single-card stability wins every time when you're deep in a coding session. That 48GB VRAM pool handles the 67B version perfectly and can squeeze in decent quants of the distilled R1 models without breaking a sweat.

If you want a quieter life and need to run the larger quants, look for a refurbished Apple Mac Studio M1 Ultra 128GB RAM. I've tried many setups over the years, and for memory capacity per dollar, Apple's unified memory is basically the easiest route for local inference right now. Just keep in mind NVIDIA is still king if you ever want to fine-tune your own models. But for just running DeepSeek while you work? The Mac is dead silent and just works.
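To sanity-check the capacity-per-dollar argument, here's a tiny Python sketch using the only prices quoted in this thread (the $750 used 3090s and the $5,000 M2 Ultra from the first answer). No prices were given for the A6000 or a refurbished M1 Ultra, so add your own quotes for those. It ignores speed, power draw, and resale value, and not all of a Mac's unified memory can go to the model since the OS needs some too.

```python
# Prices quoted earlier in this thread (first answer); add your own quotes for the
# RTX A6000 or a refurbished M1 Ultra, which weren't priced here.
SETUPS = {
    "2x used RTX 3090 24GB": (2 * 750, 2 * 24),  # (price in USD, memory in GB)
    "Mac Studio M2 Ultra 192GB": (5000, 192),
}


def dollars_per_gb(price_usd: float, memory_gb: float) -> float:
    """Cost per GB of memory the model can live in (ignores speed, power, and resale value)."""
    return price_usd / memory_gb


for name, (price, memory) in SETUPS.items():
    print(f"{name}: ${dollars_per_gb(price, memory):.2f} per GB")
```

With those numbers the Mac comes out a bit cheaper per GB ($26.04 vs. $31.25), which fits the "capacity per dollar" point, while the first answer's experience suggests the 3090s win on speed.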

