
What is the best GPU to run DeepSeek 67B locally?

Topic starter

I've been spending way too much money on API calls for my coding side projects lately and finally decided it's time to just build a dedicated rig for local LLMs. I've been looking specifically at DeepSeek 67B because the coding performance looks insane compared to Llama 3 for my specific workflow.

I did some digging on Reddit. Some people say a single RTX 4090 is enough if you use 4-bit quantization, but others claim you really need dual 3090s to get any decent speed, or to run it at higher precision without it crawling at 1 token per second.

I'm really torn because my budget is around 2,300 euros and I'm based in Berlin, so electricity is pretty pricey here... running two older 3090s sounds like a space heater and a power hog. I also saw some folks mentioning the Mac Studio with M2 Ultra, but that's way out of my price range for the RAM I would need.

If I go the 4090 route, will I regret the 24GB VRAM limit almost immediately? Or should I look into those used Tesla cards? It's all a bit overwhelming trying to figure out the actual VRAM math for DeepSeek specifically when you factor in the context window. What are you guys actually using to get smooth performance out of this model?
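For anyone who wants to redo the VRAM math from the question, here is a rough sketch of the usual back-of-the-envelope estimate: quantized weights plus KV cache plus some fixed overhead. The model shape (95 layers, 8 KV heads, head dimension 128) is my assumption for DeepSeek LLM 67B and should be checked against the model's `config.json`. The 4.5 bits per weight (4-bit quants also store scales) and the 1.5 GiB runtime overhead are guesses too, so treat the output as a ballpark and not a measurement.

```python
from dataclasses import dataclass

GIB = 2**30  # a "24GB" card exposes roughly 24 GiB


@dataclass
class ModelShape:
    params: float   # total parameter count
    layers: int     # transformer layers
    kv_heads: int   # key/value heads (fewer than query heads under GQA)
    head_dim: int   # dimension per attention head


# ASSUMED shape for DeepSeek LLM 67B; verify against the model's config.json.
DEEPSEEK_67B = ModelShape(params=67e9, layers=95, kv_heads=8, head_dim=128)


def weight_gib(shape: ModelShape, bits_per_weight: float) -> float:
    """Memory for the weights alone at a given average bits per weight."""
    return shape.params * bits_per_weight / 8 / GIB


def kv_cache_gib(shape: ModelShape, context_tokens: int,
                 bytes_per_value: int = 2) -> float:
    """KV cache size: keys and values (the factor 2) for every layer and token."""
    per_token = 2 * shape.layers * shape.kv_heads * shape.head_dim * bytes_per_value
    return per_token * context_tokens / GIB


def total_gib(shape: ModelShape, bits_per_weight: float, context_tokens: int,
              overhead_gib: float = 1.5) -> float:
    """Weights + KV cache + an ASSUMED flat overhead for buffers and the runtime."""
    return (weight_gib(shape, bits_per_weight)
            + kv_cache_gib(shape, context_tokens)
            + overhead_gib)


if __name__ == "__main__":
    for ctx in (2048, 4096, 8192):
        need = total_gib(DEEPSEEK_67B, bits_per_weight=4.5, context_tokens=ctx)
        print(f"{ctx:>5} tokens: ~{need:.1f} GiB "
              f"(fits 24: {need <= 24}, fits 48: {need <= 48})")
```

Under these assumptions the 4-bit weights alone are already larger than 24 GiB, so a single 24GB card would have to offload part of the model to system RAM, while a 48GB pool leaves room for the cache to grow with context.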


5 Answers
11

Just saw this. Over the years I've realized VRAM math is a lie once context kicks in. Quick question tho: what's your target context length? I tried a single 4090 and it crawled once I fed it a large file.

  • Hunt for a used NVIDIA RTX A6000 48GB. It's more power-efficient than dual 3090s, which matters at Berlin electricity prices.
  • Get a Seasonic PRIME TX-1000 1000W 80+ Titanium so less power is wasted at the wall and your bill stays lower.


10

Definitely grab two used NVIDIA GeForce RTX 3090 24GB GDDR6X cards! That 48GB total VRAM is amazing for 4-bit quants and fits your budget perfectly, unlike a single 4090!


3

I went through this exact same headache last year when I started hosting my own coding assistants. I started with just one high-end card thinking 24GB would be plenty, but I hit a wall fast once my context window grew. To run this model comfortably, you're likely gonna need more than one card.

  • Low quantization fits, but I noticed it lost nuance for logic.
  • Speed was snappy at first, but dropped off when my context hit 8k.
  • One card is okay on power, but the room definitely got warmer.

tbh I eventually caved and grabbed a second used unit to pool the memory because that VRAM limit is very real. It definitely bumped up my electric bill tho, which is something to watch out for if you're in a place with high rates. It gets pretty loud and the heat is honestly no joke.
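To put a rough number on the electricity side, a small sketch like the one below can help compare setups. Every input here is a placeholder, not a measurement: the wattages, daily hours, and price per kWh are hypothetical values to be swapped for the figures on your own spec sheet and utility bill.

```python
def annual_cost_eur(avg_watts: float, hours_per_day: float,
                    eur_per_kwh: float, days: int = 365) -> float:
    """Yearly electricity cost for a rig drawing avg_watts while in use."""
    kwh = avg_watts / 1000 * hours_per_day * days
    return kwh * eur_per_kwh


if __name__ == "__main__":
    # HYPOTHETICAL inputs: replace with your own measured draw and tariff.
    price = 0.40   # EUR per kWh, placeholder
    hours = 4      # hours of inference per day, placeholder
    for label, watts in (("one card", 400), ("two cards", 750)):
        cost = annual_cost_eur(watts, hours, price)
        print(f"{label}: ~{cost:.0f} EUR per year")
```

Measuring actual draw at the wall under an inference load gives a much better `avg_watts` than a card's rated board power, since inference often runs below the rated limit.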


3

Same here!


2

I've spent several months testing hardware configurations for DeepSeek 67B and unfortunately, the reality of local hosting is quite frustrating. My experience with a single card was particularly underwhelming: it held up fine on short prompts, but fell short of what I expected once the context grew.

  • NVIDIA GeForce RTX 4090 24GB: The inference speed is high, but the 24GB memory buffer is a severe limitation for serious coding. I found it constantly hitting VRAM ceilings during long sessions.
  • NVIDIA RTX A6000 48GB: This card offers the necessary memory overhead, yet I was disappointed by the actual performance value for the cost. It underperformed in terms of raw tokens per second given the price.
  • ASUS TUF Gaming GeForce RTX 3090 24GB: A dual-card setup provided sufficient VRAM, but the thermal output and power consumption were not as manageable as I'd hoped for a residential office in Berlin.

It's disheartening that even with a decent budget, there isn't a perfect path forward. I felt like I was spending more time managing hardware than writing code...

