What is the best GPU for running DeepSeek locally?

Topic starter

Hey everyone! I’ve been blown away by the performance of DeepSeek-V3 and R1 lately, but I’m tired of dealing with API rate limits and privacy concerns. I’m looking to build a dedicated local machine to run these models smoothly. I’m mainly eyeing the 70B distilled versions, so I know VRAM is going to be my biggest bottleneck. I'm currently torn between picking up two used RTX 3090s for that sweet 48GB VRAM pool or just going for a single RTX 4090 for the speed. Has anyone here experimented with multi-GPU setups for DeepSeek, or is the performance hit from NVLink/PCIe bandwidth a dealbreaker? What’s the most cost-effective GPU setup for running these larger models without constant quantization issues?
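For reference, here's my rough back-of-envelope math on weight sizes. The bits-per-weight numbers are approximations for common GGUF quant levels, and none of this counts the KV cache or runtime overhead:

```python
# Back-of-envelope VRAM math for a 70B model at common GGUF quant levels.
# Bits-per-weight values are rough averages; real files vary a bit, and
# the KV cache plus runtime overhead come on top of the weights.
PARAMS = 70e9

quants = {
    "FP16":   16.0,
    "Q8_0":    8.5,
    "Q4_K_M":  4.85,
    "IQ2_S":   2.5,
}

for name, bits in quants.items():
    gb = PARAMS * bits / 8 / 1024**3
    print(f"{name:>7}: ~{gb:.0f} GB for the weights alone")
```

So a Q4_K_M 70B is roughly 40 GB before the KV cache, which is exactly why I keep landing on the 48GB options.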


8 Answers

I went through this last year when I was trying to scale up for the bigger Llama models before DeepSeek dropped. Honestly, building a multi-GPU rig is such a headache compared to what the tech influencers make it look like. I spent weeks messing with a dual NVIDIA GeForce RTX 3090 24GB setup and the experience was... well, let's just say it was frustratingly loud and hot. Here is what I found after living with both setups for a while:

1. Dual NVIDIA GeForce RTX 3090 24GB:
- Pros: Having 48GB of VRAM is the only practical way to run the 70B R1 distill quants at decent precision like Q4_K_M. It handles long context windows much better too.
- Cons: The power draw is a beast. My old PSU kept tripping and I eventually had to swap to an EVGA SuperNOVA 1600 P2 80+ PLATINUM. Also, without NVLink, the PCIe bandwidth bottleneck is real if you're using a cheap motherboard.

2. Single NVIDIA GeForce RTX 4090 24GB:
- Pros: The speed is genuinely impressive. It's much faster for smaller models.
- Cons: 24GB is basically useless for the 70B models unless you want to use heavy quantization like IQ2_S, which makes the model noticeably dumber than it should be imo. Not as good as expected for the high price tag.

3. Professional NVIDIA RTX A6000 48GB:
- Pros: Single card, 48GB VRAM, low power draw.
- Cons: The price is just painful... I couldn't justify it even with years of experience lol.

Basically, I ended up sticking with the dual 3090s, but it's been a constant battle with fan curves and room temps... kinda sucks, but it's the price we pay for local AI I guess!
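If anyone's wondering how the 48GB pool actually gets used, here's a minimal sketch with llama-cpp-python splitting a 70B GGUF across two cards. The model filename is a placeholder and the 50/50 split is an assumption you'd tune for your own rig:

```python
# Minimal sketch: splitting a 70B GGUF across two 24GB GPUs with
# llama-cpp-python. The model path is a hypothetical placeholder;
# tensor_split ratios are an assumption (tune if one card also holds
# more of the KV cache).
from llama_cpp import Llama

llm = Llama(
    model_path="./deepseek-r1-distill-llama-70b.Q4_K_M.gguf",  # placeholder
    n_gpu_layers=-1,          # offload every layer; ~40GB of weights fits in 48GB
    tensor_split=[0.5, 0.5],  # proportion of layers per GPU (GPU0, GPU1)
    n_ctx=8192,               # context length; the KV cache eats remaining VRAM
)

out = llm("Explain NVLink in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```

For inference, whole layers sit on each card and only small activations cross the PCIe bus per token, which is why the no-NVLink penalty is milder than people expect on a decent board.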



Seconding the recommendation above. Honestly, I've been there with the multi-GPU headache, and at first it was NOT as good as expected because of the heat issues I ran into. But if you're on a budget, it's basically the only way to get the DeepSeek R1 70B distills running well without spending $5k+ on professional gear. A single NVIDIA GeForce RTX 4090 24GB is fast, but 24GB is just not enough VRAM for the 70B models. You'd have to use heavy quantization, and the quality drop is honestly pretty disappointing... it just doesn't feel as smart as the full model. Here's how I'd do it to save money:
* Grab two used NVIDIA GeForce RTX 3090 24GB cards for about $750 each on eBay.
* Get a massive PSU, at least a Corsair RM1200x 1200 Watt 80 Plus Gold Fully Modular PSU.
* Use a case with a lot of space, like the Fractal Design Meshify 2 XL Black ATX Tower.

It takes a lot of tinkering to get the drivers and cooling right (quick sanity check below), but having that 48GB VRAM pool is a game changer for local LLMs. Good luck tho!
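For the tinkering step, here's a rough sanity-check sketch (assuming a CUDA-enabled PyTorch install) to confirm the driver actually sees both cards and the full pool before you start blaming the LLM software:

```python
# Quick post-build sanity check: confirm the driver exposes both 3090s
# and their full VRAM before debugging anything at the LLM layer.
# Assumes PyTorch was installed with CUDA support.
import torch

assert torch.cuda.is_available(), "No CUDA device visible; check drivers."

total_gb = 0.0
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    gb = props.total_memory / 1024**3
    total_gb += gb
    print(f"GPU {i}: {props.name}, {gb:.1f} GB")

print(f"Pooled VRAM: {total_gb:.1f} GB")  # expect ~48 GB for dual 3090s
```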



Honestly, as someone who's lived with a dual 3090 setup for over a year, the biggest long-term issue is keeping the VRAM cool. Those GDDR6X chips on the back of the 3090 get insanely hot during long DeepSeek-R1 sessions. If you're buying used, you basically HAVE to repad them with something like GELID Solutions GP-Ultimate pads or they'll throttle and kill your speeds after 20 minutes of use. A few expert tips for the long haul (monitoring sketch after the list):
- Don't just look at the GPUs. You need a mobo with proper PCIe spacing and lane distribution, like the ASUS Pro WS X570-ACE. Without a PLX chip or x8/x8 lanes, the bottleneck between cards will make the 48GB pool feel way slower than it should.
- If you have the budget, hunting for a used NVIDIA RTX A6000 (Ampere) is much better for reliability. It's 48GB on one card, so zero NVLink/bandwidth headaches, and it draws way less power than two 3090s.
- Idk if you've looked at power, but get a dedicated 20A circuit if you're running this in an old house. Dual 3090s plus a high-end CPU will trip a standard breaker sooo fast.
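If you want to actually see the throttling instead of guessing, here's a rough polling sketch. Heads up: temperature.memory usually reports "N/A" on GeForce cards through this interface, so watching the SM clock droop at steady load is the practical proxy:

```python
# Poll GPU temps and clocks during a long DeepSeek-R1 run to catch
# thermal throttling. temperature.memory often reports "N/A" on GeForce
# cards, so core temp plus a sagging SM clock is the practical signal.
import subprocess
import time

QUERY = "index,temperature.gpu,temperature.memory,clocks.sm,utilization.gpu"

while True:
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout.strip())  # one line per GPU
    time.sleep(5)  # SM clock dropping while utilization stays high = throttling
```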



For your situation, I would suggest going with two used RTX 3090 cards. I've been tinkering with a similar build lately and having that 48GB pool is a total game changer for the 70B models. WARNING: you've got to be really careful about your power supply and case airflow before you commit to this.
- You'll basically need a high-end 1200W+ PSU because those cards have crazy transient power spikes.
- If you don't have enough space between the PCIe slots, the top card is going to choke and run way too hot.
- Make sure your case is actually big enough for two triple-slot cards, because they're massive!!

Honestly, I'm super satisfied with how my dual setup handles DeepSeek. I was also worried about the PCIe bandwidth, but for inference it genuinely doesn't seem to matter as much as people say. Having 48GB of VRAM is basically mandatory if you're trying to avoid the heavy quantization that makes the models act weird and dumb. If you try to squeeze a 70B model onto a single RTX 4090, the quality drop is pretty noticeable. I'm still kinda new to all the technical fine-tuning, but I've found that VRAM capacity is king every time. The tokens per second are good enough for reading, and it's way more cost-effective than buying a single expensive new card that can't even fit the whole model. Are you planning on getting a specific motherboard to handle that dual-card spacing??



tbh I've been doing some research on the market side of this too because I'm also pretty new to all this! From what I can tell:
- Go with NVIDIA; you basically can't go wrong because they're the industry standard. All the DeepSeek guides seem to be written for their stack, so it feels like the safest way to avoid annoying software bugs and driver issues.
- Have you looked at any gear from Apple? I keep seeing people say their unified memory is really helpful for those huge 70B models because of how it handles memory, but I'm not sure if it's actually better for a dedicated home build.
- AMD is an option too, but as a beginner I'd be worried the setup might be way too complicated and I'd get stuck without help.

Honestly, I'm just leaning towards staying with the most popular brand to be safe. Do you think the software support is more important than the raw hardware specs? Idk, I'm just really trying not to buy something that won't work!



Honestly, I went through the same dilemma a few months ago when I was trying to get some of the larger DeepSeek models running for my dev work. I initially looked at professional pre-built workstations from places like Lambda or Puget, but the markups are just INSANE for a hobbyist, right? So I decided to go the full DIY route. What worked for me was hunting down blower-style cards. If you're doing the dual GPU thing, those open-fan cards just dump heat on each other. I ended up grabbing two used NVIDIA GeForce RTX 3090 Turbo blower cards and putting them in a Lian Li O11 Dynamic EVO XL. It's a bit of a jet engine when it's crunching through a long prompt, but the temps stay way more stable than the triple-fan gaming cards... usually. One thing I didn't realize until I was halfway through the build was the motherboard. You REALLY need to check your PCIe lane distribution. I had to swap to an ASUS Pro WS WRX80E-SAGE SE WIFI II just to make sure I wasn't bottlenecking the data transfer between the cards (you can check what your slots actually negotiated, sketch below). It was a steep learning curve and basically a total headache at times, but having that local setup running 24/7 without a subscription is SO worth it, you know?
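If you want to check whether your current board is already good enough before buying a new one, here's a small sketch (it just shells out to nvidia-smi, so it assumes an NVIDIA driver install) that prints the PCIe generation and lane width each card negotiated. Run it while the model is generating, since links often downshift at idle:

```python
# Print the PCIe generation and lane width each GPU actually negotiated.
# Run it under load: links often downshift at idle, so the idle reading
# can look worse than the real transfer path.
import subprocess

FIELDS = "index,name,pcie.link.gen.current,pcie.link.width.current,pcie.link.width.max"

out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in out.stdout.strip().splitlines():
    print(line)  # e.g. "0, NVIDIA GeForce RTX 3090, 4, 8, 16" = Gen4 x8
```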



Same setup here, love it



Following this thread

