
What is the best GPU for running Deepseek locally?

Topic starter

I've been trying to get DeepSeek running for my dev work, but my old 2060 just isn't cutting it anymore. It's way too slow. I'm looking to upgrade by this weekend since I have a big project starting Monday. I found a used 3090 for about $700 on Marketplace, which has the 24GB of VRAM everyone says you need, but I'm also looking at a new 4080 Super for the warranty and better speed. Or should I just bite the bullet and go for a 4090? My case is kind of a tight squeeze and my budget is strictly under $1,300, so the 4090 might be pushing it, honestly. What do you guys think is the sweet spot for DeepSeek specifically?


4 Answers
11

Adding my two cents: I had some issues with big cards getting way too hot in my small case, and it was a total letdown. If you want reliability without going broke, maybe try the ASUS Dual GeForce RTX 4070 Ti Super 16GB GDDR6X. It fits better and won't kill your budget. Check these for help:

  • Hugging Face for smaller quantized models
  • r/LocalLLM for setup guides

Honestly, don't sleep on the 16GB cards for dev work...


10

Honestly, I'd say grab that 3090, but be super careful. I bought a used card once and it literally started smoking after an hour of heavy inference... never again without testing first. For DeepSeek, that 24GB of VRAM is kind of non-negotiable if you want the bigger models to run smoothly without dragging.

  • NVIDIA GeForce RTX 3090 24GB GDDR6X: Best for VRAM but it draws a ton of power. Make sure your PSU can handle the spikes.
  • NVIDIA GeForce RTX 4080 Super 16GB GDDR6X: Way more efficient, but you'll hit a wall with model size pretty quick. 16GB feels cramped for large LLMs.

Since your budget is strictly under 1300, a new 4090 is definitely out. If you go with the used 3090, just make sure to run a stress test like FurMark first. Also measure your case twice because these cards are massive and might not fit if your setup is already a tight squeeze.
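For the PSU question, here is a rough back-of-the-envelope sketch in Python. The board power numbers are NVIDIA's reference figures for these two cards; the 1.5x spike factor, the 200 W for the rest of the system, and the 750 W PSU are all assumptions for illustration, so plug in your own numbers.

```python
# Rough PSU headroom check. Board power values are NVIDIA reference figures;
# spike_factor and rest_of_system_w are assumptions, not measured values.
GPU_BOARD_POWER_W = {
    "RTX 3090": 350,
    "RTX 4080 Super": 320,
}


def psu_headroom_w(psu_watts, gpu, rest_of_system_w=200, spike_factor=1.5):
    """Watts left after an assumed worst-case GPU transient (negative = PSU too small)."""
    peak_gpu_w = GPU_BOARD_POWER_W[gpu] * spike_factor
    return psu_watts - (peak_gpu_w + rest_of_system_w)


if __name__ == "__main__":
    # Hypothetical 750 W power supply.
    for gpu in GPU_BOARD_POWER_W:
        print(f"{gpu}: {psu_headroom_w(750, gpu):.0f} W headroom")
```

If the headroom comes out small or negative under these assumptions, that is a hint to look at a bigger PSU before buying the card, not a guarantee either way.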


3

^ This. Also, I went through a similar headache and ended up grabbing the ZOTAC Gaming GeForce RTX 4080 Super Trinity OC 16GB because I needed something reliable that wouldn't catch fire. Honestly, I'm super satisfied with how it handles DeepSeek. It runs cool and the drivers are rock solid, which is a huge relief when you've got deadlines.

  • Use 4-bit quantization to keep memory usage low without giving up much performance
  • Measure your case twice for the cable bend, those new connectors are stiff

It's basically the perfect middle ground for dev work if you don't want the risk of a used 3090 or the massive size of most 4090s.
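To put a number on the 4-bit tip, a quick sketch of how much memory the weights alone take at different precisions. This is only the parameters-times-bytes arithmetic; the 7B size is an illustrative parameter count, and real usage is higher once you add the KV cache and runtime overhead.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 7B is just an example size, not a recommendation.
    for bits in (16, 8, 4):
        print(f"7B at {bits}-bit: {weight_gib(7, bits):.1f} GiB")
```

Halving the bits halves the weight footprint, which is why 4-bit makes a 16GB card workable for small and mid-size models.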


2

Adding my two cents since I went through this nightmare last month. Honestly, I was pretty disappointed with how my initial build handled local inference. I tried squeezing a high-end card into a mid-tower and the thermal throttling was just brutal... performance dropped by about 30 percent after just ten minutes of generation. It's frustrating because everyone talks about the VRAM but nobody mentions how these things turn your room into a literal sauna. I had issues with the fan noise too. It was so loud I couldn't even think while coding.

Instead of a single massive card, I actually experimented with a multi-GPU setup using two smaller cards to spread the heat load. It wasn't as good as I expected for gaming, but for stuff like DeepSeek it actually worked out okay since I could split the model weights. Specifically, I messed around with two MSI GeForce RTX 4060 Ti Ventus 3X 16G OC units. The 32GB of total VRAM was a lifesaver, but the setup was a total pain to configure and the PCIe lanes on my board ended up being a bottleneck.

Before you commit to that Marketplace deal or a new card, which specific DeepSeek model size are you actually aiming for? If you want the full 67B parameters with a decent context window, even 24GB is going to feel cramped pretty fast. Also, what kind of power supply are you rocking? If you don't have enough overhead, you're going to have a bad time when the GPU spikes during heavy inference.
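A quick way to sanity-check the model-size question before buying: a minimal sketch that asks whether the weights plus some overhead fit in the combined VRAM. The 20 percent overhead for KV cache and runtime buffers is an assumption, and so is the idea that weights split evenly across cards; real splits depend on the inference runtime.

```python
def fits(params_billion, bits_per_weight, vram_gb_per_gpu,
         num_gpus=1, overhead_fraction=0.2):
    """True if weights plus an assumed overhead fit in the combined VRAM.

    Assumes an even split across GPUs and a flat overhead_fraction for
    KV cache and buffers; both are simplifications.
    """
    weights_gb = params_billion * bits_per_weight / 8  # billions of params * bytes each
    needed_gb = weights_gb * (1 + overhead_fraction)
    return needed_gb <= vram_gb_per_gpu * num_gpus


if __name__ == "__main__":
    # 67B at 4-bit against the setups mentioned in this thread.
    print("1 x 24GB:", fits(67, 4, 24))
    print("2 x 16GB:", fits(67, 4, 16, num_gpus=2))
    print("2 x 24GB:", fits(67, 4, 24, num_gpus=2))
```

Under these assumptions the 67B weights at 4-bit come to about 33.5GB before overhead, so it is worth pinning down the model size before picking between a single 24GB card and a multi-GPU setup.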

