I'm really itching to try out DeepSeek 67B, but I'm a bit stuck on the hardware side. Since it's a massive model, I know my current setup won't cut it. I’ve been looking into VRAM requirements, and it seems like a single consumer card might not be enough unless I use heavy quantization. Would a dual RTX 3090/4090 setup be the way to go for 48GB of VRAM, or should I be looking at something like a Mac Studio with unified memory? I’m mainly worried about inference speed and making sure I don't run out of memory mid-task. For those of you running it smoothly, what’s the best GPU configuration to handle this model without breaking the bank on enterprise cards?
Honestly, I tried running it on a single card and it was a total disaster. For a model that size, I would suggest going with two NVIDIA GeForce RTX 3090 24GB cards. I have been doing this for years and, unfortunately, anything less than 48GB VRAM basically turns your PC into a space heater with zero output. A Mac Studio is okay, but the dual 3090 setup is way faster for raw inference speeds if you don't mind the power draw. gl!
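For anyone wondering where the 48GB figure comes from, here's a back-of-the-envelope sketch. It assumes roughly 67 billion parameters (taken from the model name) and a hypothetical flat 10% overhead for runtime buffers, and it counts the weights only, not the KV cache. Real 4-bit formats usually spend slightly more than 4 bits per weight, so treat these numbers as lower bounds.

```python
# Back-of-the-envelope VRAM needed just to hold the model weights.
# Assumptions: ~67e9 parameters (from the model name) and a hypothetical
# flat 10% overhead for runtime buffers. KV cache is NOT included.

PARAMS = 67e9
OVERHEAD = 1.10  # hypothetical fudge factor

BITS_PER_WEIGHT = {"fp16": 16, "8-bit": 8, "4-bit": 4}

def weight_gib(params: float, bits: float, overhead: float = OVERHEAD) -> float:
    """Approximate GiB required to store the weights at a given bit width."""
    return params * bits / 8 * overhead / 2**30

def fits(needed_gib: float, vram_gib: float) -> bool:
    """True if the weights alone fit in the given amount of VRAM."""
    return needed_gib <= vram_gib

if __name__ == "__main__":
    for vram in (24, 48):  # one vs two 24 GiB cards
        for name, bits in BITS_PER_WEIGHT.items():
            need = weight_gib(PARAMS, bits)
            verdict = "fits" if fits(need, vram) else "does not fit"
            print(f"{vram} GiB, {name}: ~{need:.1f} GiB -> {verdict}")
```

Under these assumptions FP16 needs well over 100 GiB and 8-bit (~69 GiB) still overflows 48 GiB, while 4-bit (~34 GiB) fits on two 24 GiB cards but not on one, which lines up with the dual-3090 advice.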
Honestly, just grab two used NVIDIA GeForce RTX 3090 24GB cards. I had issues with speeds on a Mac, and enterprise stuff is too pricey. Budget for $1400 total and you're golden. gl!
Following this thread
Ok so, I’m still pretty new to the hardware side of things, but have you considered just renting a setup online instead of building it? I’ve been comparing the DIY route with a cloud GPU service like RunPod or Vast. Basically, if you go DIY with a couple of NVIDIA GeForce RTX 3090 24GB cards, you own it forever, which is cool, but the setup seems like a huge headache... like, what about the cooling and the massive power bill? It feels really risky if you aren't a pro at building PCs. On the other hand, you could just rent an NVIDIA A100 Tensor Core GPU by the hour. It’s sooo much faster than consumer cards, and with the 80GB version you're far less likely to run out of VRAM mid-task. But yeah, the downside is you’re paying for every minute, and your data is on someone else's server. I’m honestly stuck between the two because I reallyyy want the speed, but I'm scared of breaking something in a local build. Is the privacy of a local setup worth the extra work for a beginner? Anyway, just something to think about if you don't want to commit to the hardware yet!
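If it helps with the rent-vs-buy question, here's a rough break-even sketch. The $1400 GPU budget is the figure quoted in this thread and ~350 W is the RTX 3090's rated board power; the rest-of-system draw, electricity price, and hourly rental rate are placeholder assumptions, so swap in your own quotes. It ignores resale value, the rest of the PC, and the speed difference between the two options.

```python
# Rent-vs-buy break-even sketch. Placeholder numbers are marked; use your own.

GPU_COST_USD = 1400.0            # two used RTX 3090s, the budget quoted in the thread
GPU_POWER_W = 2 * 350            # RTX 3090 rated board power is ~350 W each
OTHER_POWER_W = 150              # hypothetical: CPU, drives, fans, PSU losses
USD_PER_KWH = 0.15               # hypothetical electricity price
RENTAL_USD_PER_HOUR = 1.50       # hypothetical hourly price for a rented A100

def local_cost_per_hour(watts: float, usd_per_kwh: float) -> float:
    """Electricity cost of running the local rig for one hour at full load."""
    return watts / 1000 * usd_per_kwh

def break_even_hours(hardware_usd: float, rental_per_hour: float,
                     local_per_hour: float) -> float:
    """Hours of use after which buying becomes cheaper than renting."""
    saving = rental_per_hour - local_per_hour
    if saving <= 0:
        # Running locally costs as much as renting per hour: buying never pays back.
        return float("inf")
    return hardware_usd / saving

if __name__ == "__main__":
    local = local_cost_per_hour(GPU_POWER_W + OTHER_POWER_W, USD_PER_KWH)
    hours = break_even_hours(GPU_COST_USD, RENTAL_USD_PER_HOUR, local)
    print(f"local power: ${local:.2f}/h, break-even after ~{hours:.0f} h of use")
```

With these placeholder numbers it works out to roughly 1,000 hours of actual generation before owning beats renting, so how many hours a week you'd really use it matters more than either sticker price.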
Honestly, having a rig capable of this stuff is awesome, but the long-term ownership is a whole vibe. Everyone focuses on the VRAM numbers, but nobody mentions the literal noise of the fans spinning up at 2 AM while you're trying to debug. It gets really loud - like, vacuum cleaner loud - if you don't spend a ton on specialized cooling. I've found that the physical footprint and the heat can really change how you use your room. Before you commit to a specific path, I’d think about these:
- Do you have a spot for the PC where the noise won't drive you crazy?
- Are you looking for a "set it and forget it" experience, or are you okay with tinkering and cleaning dust out of the chassis every few weeks? It's basically like owning a high-maintenance pet that eats electricity and talks back lol. Totally worth it for the local privacy and speed tho.
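On the heat point: practically all the electricity a PC draws ends up as heat in the room, so you can estimate the heat load directly from power draw (1 W is about 3.412 BTU/h). The 850 W figure in this sketch is a hypothetical full-load draw for a dual-3090 box, not a measurement.

```python
# Estimate room heat load from sustained PC power draw.
# Assumes essentially all electrical input becomes heat in the room.

BTU_PER_HOUR_PER_WATT = 3.412  # 1 W is about 3.412 BTU/h

def heat_btu_per_hour(watts: float) -> float:
    """Heat output in BTU/h for a given sustained power draw in watts."""
    return watts * BTU_PER_HOUR_PER_WATT

def energy_kwh(watts: float, hours: float) -> float:
    """Energy consumed (and dumped into the room as heat) over a session, in kWh."""
    return watts * hours / 1000

if __name__ == "__main__":
    draw_w = 850  # hypothetical full-load draw of a dual-RTX 3090 machine
    print(f"~{heat_btu_per_hour(draw_w):.0f} BTU/h while generating")
    print(f"~{energy_kwh(draw_w, 8):.1f} kWh over an 8-hour session")
```

That's a meaningful continuous heat load for a small closed room, which is the same effect behind the summer throttling story further down the thread.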
Ok so, before I give advice, quick question—are you prioritizing raw generation speed or just making sure it runs without crashing?? In my experience, DeepSeek 67B is literally a beast to tame. Over the years, I've found that running out of VRAM is highkey the biggest risk... so yeah, I'm curious what your actual budget limit is before I suggest a dual NVIDIA GeForce RTX 3090 24GB setup or something even wilder. tbh, it's a bit of a gamble lol.
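On the running-out-of-memory-mid-task worry: the weights are a fixed cost, but the KV cache grows with every token of context, which is usually what tips a tight setup over the edge. Here's a minimal sketch. The layer count (95), KV-head count (8), and head dimension (128) are my assumptions about DeepSeek 67B's architecture, so check them against the model's config.json; the 2 GiB headroom is arbitrary.

```python
# KV-cache growth and a simple "will it fit" check.
# Architecture numbers are ASSUMPTIONS for DeepSeek 67B (verify in config.json):
# 95 layers, 8 key/value heads, head dimension 128, FP16 cache (2 bytes).

def kv_cache_gib(tokens: int, layers: int = 95, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2,
                 batch: int = 1) -> float:
    """GiB of KV cache for `tokens` of context (factor 2 = keys + values)."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * tokens * batch / 2**30

def fits_in_vram(weights_gib: float, context_tokens: int, vram_gib: float,
                 headroom_gib: float = 2.0) -> tuple[bool, float]:
    """Return (fits, total_gib) for weights + KV cache + arbitrary headroom."""
    total = weights_gib + kv_cache_gib(context_tokens) + headroom_gib
    return total <= vram_gib, total

if __name__ == "__main__":
    for weights in (36.0, 46.0):  # hypothetical quantized weight sizes
        ok, total = fits_in_vram(weights, context_tokens=4096, vram_gib=48)
        print(f"weights {weights} GiB -> total ~{total:.1f} GiB, fits: {ok}")
```

With those assumptions a 4096-token context costs about 1.5 GiB in FP16, so a hypothetical 36 GiB 4-bit weight file fits comfortably in 48 GiB, while a 46 GiB one would not once the cache fills up.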
+1
Unfortunately, reading through these comments brings back some rather frustrating memories from my early days trying to host large-scale models at home. I recall when I first decided to commit to a multi-GPU workstation for local inference. It seemed like such a logical progression at the time. I spent weeks meticulously researching power draw and clearance requirements, convinced that I had engineered the perfect environment for sustained workloads. However, the reality of long-term ownership fell well short of what I expected. One particular summer, the ambient temperature in my office rose so sharply that my primary system began to throttle after just thirty minutes of generation. It became a whole ordeal: I ended up relocating the entire rig to my basement, which then required running nearly fifty feet of cabling just to maintain a stable connection. It was an absolute mess of logistics and unexpected maintenance that I hadn't properly accounted for in my initial enthusiasm. I spent more time troubleshooting my airflow than actually interacting with the models... it's funny how the technical specs on paper never quite capture the sheer physical exhaustion of managing that much hardware in a residential space.