
What is the best hardware for running DeepSeek models locally?

Topic starter

I've been blown away by the performance of DeepSeek-V3 and R1 lately, but I really want to move away from the API and run them locally for privacy. My current RTX 3060 is definitely hitting a wall with the larger parameter versions. I'm trying to figure out if it's better to invest in a multi-GPU setup like dual 3090s for that 48GB VRAM, or if going the Mac Studio route with unified memory is the smarter play for these specific models. Given the high token usage, I'm also worried about inference speed. What hardware configuration are you guys using to get smooth performance on DeepSeek without breaking the bank?


8 Answers
12

For your situation it comes down to speed vs capacity: dual NVIDIA GeForce RTX 3090 24GB cards for speed, or the Apple Mac Studio M2 Ultra with 128GB unified memory for capacity. GPUs get way better tokens/sec but VRAM is limited. The Mac makes huge models easier to fit but runs them slower. Honestly, I'd go with the dual 3090s for the R1 distilled models... just make sure to check your PSU first cuz two of those cards draw a lot of power and run hot!! gl


10

Seconding the 3090s! To save cash, I'm satisfied with:
- Fractal Design Meshify 2 XL
- EVGA SuperNOVA 1600 G+ PSU

Works well and handles dual cards easily!!
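For anyone wanting to sanity-check a PSU before buying, here is a rough sketch of the arithmetic. The 350 W figure is the RTX 3090 reference board power; the 300 W for the rest of the system and the 1.25 safety factor are assumptions for illustration, so plug in your own numbers. The function name is hypothetical.

```python
def psu_headroom_watts(psu_watts, gpu_watts, gpu_count,
                       rest_of_system_watts, safety_factor=1.25):
    """Watts left over after padding the sustained load by a safety factor.

    safety_factor is an assumed margin for transient spikes, not a spec value.
    A negative result means the PSU is undersized under these assumptions.
    """
    sustained = gpu_watts * gpu_count + rest_of_system_watts
    return psu_watts - sustained * safety_factor


# Dual RTX 3090 (350 W reference board power each) on a 1600 W unit,
# with an assumed 300 W for CPU, drives, and fans.
print(psu_headroom_watts(1600, 350, 2, 300))

# Same build on a hypothetical 850 W unit.
print(psu_headroom_watts(850, 350, 2, 300))
```

Partner cards often ship with higher power limits than the reference design, so check the actual board power of the cards you buy.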


3

Quick question - are you planning to run the full-fat 671B parameter DeepSeek-V3 or are you sticking to the "distilled" versions like the 32B or 70B? Honestly, that makes a huge difference because the big models are absolute VRAM monsters. Before I give advice, here's how I see the technical side:
1. **VRAM capacity:** Even with the dual 3090s mentioned earlier, you're only at 48GB. That's enough for a 70B model at 4-bit quantization, but the full 671B R1 needs way more unless you offload to system RAM (which is slow af).
2. **Budget alternative:** If you're okay with slower speeds but want tons of VRAM, look at the NVIDIA Tesla P40 24GB. You can find them used for cheap and chain 3 or 4 together for massive capacity.
3. **The Mac Floor:** For the larger models, you'd really need the Mac Studio M2 Ultra 192GB specifically to avoid running out of memory.

What's your target model size?? That'll help me give a better recommendation!!
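To put rough numbers on the capacity points above, here is a back-of-the-envelope sketch. The weights-only figure is just parameters times bits per weight; the 1.2 overhead multiplier for KV cache and runtime buffers is an assumption, and real usage depends on context length and backend. Function names are hypothetical.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions, bits_per_weight, capacity_gb, overhead=1.2):
    """True if weights times an assumed overhead factor fit in capacity_gb.

    overhead=1.2 is a guess covering KV cache and buffers, not a measured value.
    """
    return weight_memory_gb(params_billions, bits_per_weight) * overhead <= capacity_gb


models = [("32B distill", 32), ("70B distill", 70), ("671B full", 671)]
for name, size in models:
    for bits in (16, 8, 4):
        need = weight_memory_gb(size, bits)
        print(f"{name:12s} {bits:2d}-bit: {need:6.1f} GB weights | "
              f"48 GB: {fits(size, bits, 48)} | 192 GB: {fits(size, bits, 192)}")
```

Under these assumptions a 4-bit 70B model squeezes into 48GB, while the full 671B model does not fit in 192GB even at 4-bit without more aggressive quantization or offloading.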


3

Works great for me


2

Basically, if you're looking at the market right now, there's a HUGE divide between 'consumer gaming' and 'prosumer AI' builds. Since the 3090s were already mentioned, I'd look into these alternatives:
- Check out the **DeepSeek VRAM usage spreadsheets** on GitHub. They're basically ESSENTIAL for planning a build so you don't overspend on memory you won't actually saturate.
- Consider the AMD Radeon RX 7900 XTX. You get 24GB VRAM and while ROCm isn't as plug-and-play as CUDA, the price-to-performance is actually insane for local LLMs lately, though I'm not 100% sure how it handles the specific R1 kernels just yet.
- If you have the budget, hunt for a used NVIDIA RTX A6000. Having 48GB on one card is way cleaner than dealing with the heat and massive PSU requirements of dual cards.

Anyway, NVIDIA is still the king for raw speed but AMD is catching up fast if you want pure VRAM capacity on a budget. Just make sure to check if Ollama or your preferred backend has full support for your choice first!


2

late to the party but honestly merlin is spot on about that divide. it drives me crazy how we have to jump through so many hoops just to get decent vram these days. ngl nvidia is clearly just gatekeeping memory to push people toward their enterprise stuff that costs as much as a car. it's such a scam tbh. and don't even get me started on apple. yeah the unified memory is nice but the markup they charge for extra ram is basically highway robbery at this point. i feel your pain tho... trying to run these deepseek models without spending ten grand is a total nightmare. it feels like the hardware companies are actively trying to stop us from running local llms unless we have a massive corporate budget. it's just frustrating seeing the software move so fast while the hardware market stays greedy and stagnant.


2

TIL! Thanks for sharing


1

Unfortunately, I've found that even high-end consumer setups struggle with the memory bandwidth needed for DeepSeek-V3. I tried a multi-GPU rig last month and the latency was just depressing compared to enterprise gear. Before you drop thousands, I need to know:

  • Are you aiming for high tokens per second for real-time chat, or just batch processing where speed matters less?
  • What's your hard ceiling on power draw and heat, since these rigs basically turn into space heaters?

Reminds me of when I tried to build a dedicated render farm back in 2018. I spent weeks cable managing everything perfectly, only to realize my apartment's old wiring couldn't handle the load. Every time I hit render, the kitchen lights would flicker. It was a total disaster and I ended up having to run extension cords from the bathroom just to keep the system from tripping the breaker. My roommate was furious because he couldn't use the hairdryer while I was working. Tbh it was such a mess.

Anyway, the hardware landscape right now is pretty frustrating for these massive models.
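On the memory bandwidth point, a crude ceiling on decode speed is bandwidth divided by the bytes of weights read per token. The sketch below assumes decoding is purely bandwidth-bound and that all weights sit in the memory being measured, which is not true once you split across cards or offload to system RAM. The bandwidth numbers are spec-sheet values (936 GB/s for an RTX 3090, 800 GB/s for an M2 Ultra), and 37B is the activated-parameter count DeepSeek publishes for V3. The function name is hypothetical.

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s, active_params_billions, bits_per_weight):
    """Theoretical upper bound on tokens/sec for bandwidth-bound decoding.

    Assumes every active weight is read once per token and nothing else
    (KV cache reads, interconnect, compute) costs any time.
    """
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


configs = [
    ("RTX 3090, dense 70B, 4-bit", 936, 70, 4),
    ("M2 Ultra, dense 70B, 4-bit", 800, 70, 4),
    ("M2 Ultra, MoE 37B active, 4-bit", 800, 37, 4),
]
for label, bw, active, bits in configs:
    ceiling = decode_tokens_per_sec_ceiling(bw, active, bits)
    print(f"{label}: at most ~{ceiling:.1f} tok/s")
```

Real-world numbers land well below these ceilings, so treat them as a way to compare options, not as predictions.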

