Hey everyone, I’ve been really impressed with the benchmarks coming out for DeepSeek 67B lately, especially its performance in coding and logic tasks. I’m finally looking to move away from cloud-based APIs and set up a local rig so I can experiment without worrying about privacy or monthly subscription limits.
However, I’m a bit stuck on the hardware side. Since it’s a 67B parameter model, I know VRAM is the ultimate bottleneck. I’ve been debating between picking up two used RTX 3090s to get that 48GB of total VRAM or just biting the bullet on a single RTX 4090. If I go with 4-bit or 5-bit quantization, will 24GB even be enough to run it at a usable speed, or is a multi-GPU setup mandatory for a model this size? I’m also a little concerned about the power draw and cooling requirements if I end up running dual cards in a standard mid-tower case.
I’ve seen some people mentioning Mac Studio (M2/M3 Ultra) as an alternative, but I’d prefer to stay in the NVIDIA ecosystem if possible for better library support. For those of you already running DeepSeek 67B, what kind of tokens per second are you seeing on your hardware?
What is currently the most cost-effective GPU configuration to get DeepSeek 67B running smoothly with decent inference speeds?
For your situation, you're gonna need way more than 24GB of VRAM!
- Background: a 67B model needs roughly 40GB of VRAM at 4-bit quantization once you leave room for context.
- Why it matters: a single NVIDIA GeForce RTX 4090 24GB can't hold the whole model, so part of it gets offloaded to much slower system RAM.
- Solution: I went with two used NVIDIA GeForce RTX 3090 24GB cards; ngl, 48GB total is amazing and the best value for fast inference!!
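If you want to sanity-check the sizing yourself, here's a rough sketch of the arithmetic. The bits-per-weight values and the 4GB overhead for context and buffers are assumptions for illustration, not measured numbers, and real quant files vary a bit.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if weights plus an assumed overhead (context, buffers) fit in vram_gb."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# 4.5 bits per weight is an assumed stand-in for a typical "4-bit" quant
# that stores some extra scaling data alongside the weights.
for bits in (4.0, 4.5, 5.0):
    size = weights_gb(67, bits)
    print(f"{bits} bpw: {size:.1f} GB weights | "
          f"fits 24GB: {fits(67, bits, 24)} | fits 48GB: {fits(67, bits, 48)}")
```

With these assumptions nothing in the 4-bit to 5-bit range fits on a single 24GB card, while all of them fit in 48GB, with 5-bit leaving the least room.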
oh man, I totally feel you on this! I remember being so confused when I first started looking at these massive 67B models. basically, a model this size at 4-bit quantization needs about 38-40GB of VRAM just to sit in memory. that matters because if you try to run it on a single NVIDIA GeForce RTX 4090 24GB, the software has to offload layers to your regular system RAM, and that makes things lowkey painful... like 1 token per second maybe?? honestly not worth the frustration.
here is the most cost-effective solution in my experience:
* two used NVIDIA GeForce RTX 3090 24GB cards. that gives you 48GB total, which is plenty of headroom for 4-bit quants and context.
* a beefy PSU like the Corsair RM1000x 1000W 80 PLUS Gold, because those cards are hungry for power.
* a high-airflow case like the Fractal Design Torrent Black RGB TG Light Tint to handle the heat.
I've tried many setups, and dual 3090s are the best value for stuff like this. I usually see around 8 t/s with DeepSeek. gl with the build!!
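For anyone wondering why offloading hurts so much, here's a back-of-the-envelope sketch. It assumes token generation is limited by how fast the weights can be read from memory once per token, which is a simplification. The 38GB model size, the way it is split, and the dual-channel DDR4-3200 figure are assumptions; the GPU bandwidths are the published spec numbers. Treat the results as ceilings, real speeds land lower.

```python
def max_tokens_per_sec(chunks):
    """Upper bound on tokens/s if every token reads each chunk of weights once.

    chunks: list of (gigabytes_read, bandwidth_gb_per_s), processed one after another.
    """
    seconds_per_token = sum(gb / bw for gb, bw in chunks)
    return 1.0 / seconds_per_token


RTX_3090_BW = 936.0   # GB/s, spec memory bandwidth
RTX_4090_BW = 1008.0  # GB/s, spec memory bandwidth
SYSTEM_RAM_BW = 51.2  # GB/s, assumed dual-channel DDR4-3200 peak

# Assumed 38GB model split evenly across two 3090s.
dual_3090 = max_tokens_per_sec([(19.0, RTX_3090_BW), (19.0, RTX_3090_BW)])

# Same model on one 4090: assume 22GB stays in VRAM and 16GB spills to system RAM.
single_4090_offload = max_tokens_per_sec([(22.0, RTX_4090_BW), (16.0, SYSTEM_RAM_BW)])

print(f"dual 3090 ceiling:          {dual_3090:.1f} t/s")
print(f"single 4090 + RAM ceiling:  {single_4090_offload:.1f} t/s")
```

Under these assumptions the dual-card ceiling is roughly 25 t/s and the offloaded setup tops out around 3 t/s, because the slice sitting in system RAM dominates the time per token. That lines up with the gap people in this thread are describing.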
Yo, I totally feel you on this! I actually started my local AI journey because I was tired of hitting those monthly subscription limits while working on some coding projects. It was lowkey a huge headache, and I honestly spent weeks researching builds before I finally took the plunge.
In my experience, running a model of that scale is all about the VRAM, and I'm not 100% sure a single card will cut it. Here is what I think based on my own trial and error:
1. For a 67B model at 4-bit quantization, you're looking at around 38GB to 40GB of VRAM just to load the weights and have some room for context. With a single 24GB card you'll have to offload layers to your system RAM, and honestly, the speed drops to like 1 or 2 tokens per second. It's super slow.
2. I think the dual GPU setup is your best bet for cost-effectiveness. Two 24GB cards give you 48GB in total (the model gets split across them), which fits 4-bit comfortably. 5-bit is a tighter squeeze, since the weights alone come to roughly 42GB at 5 bits per weight before any context.
3. Over the years I've tried many cooling solutions, and putting two high-power cards in a mid-tower is definitely a challenge. It gets *really* hot, so you might need a high-airflow case or even an open-air frame.
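To put a number on the offloading in point 1, here's a small sketch that estimates how many layers fit on the GPU. The 38GB model size, the 95-layer count, and the 3GB reserved for context and buffers are assumptions for illustration, so check your actual model file. The result is the kind of value you'd feed to a layer-offload setting such as llama.cpp's `--n-gpu-layers`.

```python
import math


def layers_on_gpu(model_gb: float, n_layers: int,
                  vram_gb: float, reserve_gb: float = 3.0) -> int:
    """Estimate how many equally sized layers fit in VRAM after a reserve."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


MODEL_GB = 38.0  # assumed 4-bit footprint
N_LAYERS = 95    # assumed layer count for the 67B model

for vram in (24, 48):
    on_gpu = layers_on_gpu(MODEL_GB, N_LAYERS, vram)
    print(f"{vram}GB VRAM: {on_gpu}/{N_LAYERS} layers on GPU, "
          f"{N_LAYERS - on_gpu} left in system RAM")
```

With these numbers a 24GB card holds only a bit more than half the layers, and everything left over runs from system RAM. With 48GB the whole model stays on the GPUs.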
Lesson learned: go for the dual cards if you want it to be smooth, but check your power supply first!! Can your current PSU handle something like 1000W? Not checking that was my biggest mistake lol. Anyway, hope this helps! gl
just saw this thread and wanted to chime in... it's a solid discussion so far. tbh before you pull the trigger on a dual card setup you might want to think harder about cooling. what kind of power supply and case are you actually working with? if you're planning on jamming two beefy cards into a standard mid-tower you might run into thermal throttling pretty quick. I'd suggest sticking with NVIDIA honestly, you really can't go wrong with their ecosystem for local LLMs. basically any of their high-end cards will save you a lot of headache compared to trying to hack together support for other brands. just be careful about the total power draw though... two cards can pull way more than you think when they're both pegged during inference, and you don't want to fry anything.
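A quick way to check the power side before buying anything is to add it up. This is only a sketch: 350W is the reference board power for an RTX 3090, but the 200W for the rest of the system, the 20% headroom, and the 280W power cap are assumed values for illustration, not numbers from this thread.

```python
def recommended_psu_watts(gpu_watts: int, n_gpus: int,
                          rest_of_system_watts: int = 200,
                          headroom_pct: int = 20) -> int:
    """Sustained load plus a percentage of headroom, in whole watts."""
    load = gpu_watts * n_gpus + rest_of_system_watts
    return load * (100 + headroom_pct) // 100


stock = recommended_psu_watts(350, 2)   # two 3090s at reference board power
capped = recommended_psu_watts(280, 2)  # same cards with a hypothetical 280W cap

print(f"stock:  ~{stock}W PSU")
print(f"capped: ~{capped}W PSU")
```

With these assumptions, two stock cards land slightly above a 1000W unit, while capping the cards (for example with `nvidia-smi -pl`) brings the estimate back under it. Short transient spikes are not modeled here.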
^ This. Also, it is honestly so frustrating how much we have to shell out just to get enough VRAM for these models. I'm finally happy with my current setup, but the stress of finding deals that don't break the bank was just too much. It really sucks that we are basically forced into the used market just to avoid spending five figures on enterprise gear. I remember looking at the price of a single NVIDIA GeForce RTX 4090 24GB and just feeling defeated because I knew it wouldn't even be enough memory on its own. Even trying to save cash by hunting for an older NVIDIA Tesla P40 24GB is such a pain because you end up worrying about weird cooling mods and power adapters. The VRAM tax is just brutal for us DIY folks. I'm satisfied with how my model runs now, but man, the barrier to entry for something like DeepSeek 67B is just plain exhausting. I had to spend a fortune on an EVGA SuperNOVA 1000 GT and a massive Phanteks Enthoo Pro just to fit everything, and my wallet is still crying. I hate how expensive this hobby has become lately.
100% agree
It sounds like everyone is in agreement that 48GB of VRAM is the magic number for DeepSeek 67B. If you try to squeeze it into a single consumer card, the performance just tanks because of the offloading. From what I've seen in market research, here are the main paths people are taking: