I've been hosting LLMs on my 3090 for a while, but DeepSeek-V3 is totally crushing my VRAM. I need to upgrade my rig for a big coding project due next month, and I've got about 4k to spend.
What hardware setup are you guys running to actually get decent tokens per second on these bigger DeepSeek weights?
I switched to an Apple Mac Studio (M2 Ultra, 128GB RAM), and honestly the 800GB/s memory bandwidth works well.
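For anyone wondering why bandwidth matters so much here: decoding is mostly memory-bound, so a rough ceiling on tokens per second is bandwidth divided by the bytes of weights read per token. Below is a back-of-envelope sketch, not a benchmark. The function name is made up for illustration, the 37B figure is DeepSeek-V3's published activated-parameters-per-token count, and the 4-bit quant is an assumption. Real numbers will land below this ceiling, and the full set of weights still has to fit in memory somewhere.

```python
def decode_ceiling_tok_s(bandwidth_gb_s, active_params_b, bits_per_weight):
    """Rough upper bound on decode speed for a memory-bound model.

    bandwidth_gb_s:  memory bandwidth in GB/s (800 for the M2 Ultra above)
    active_params_b: parameters read per token, in billions (assumption: 37
                     for DeepSeek-V3, which is a mixture-of-experts model)
    bits_per_weight: quantization width (assumption: 4-bit here)
    """
    # Billions of params * bits / 8 gives gigabytes read per generated token.
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    ceiling = decode_ceiling_tok_s(800, 37, 4)
    print(f"theoretical ceiling: {ceiling:.1f} tok/s")
```

This ignores KV-cache reads, compute limits, and any overhead from the runtime, so treat it as a sanity check on hardware choices rather than a prediction.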
Adding my two cents... I was in a similar spot a few months back. I already had one 3090, but V3 is just a beast. Buying used gear made me nervous, but I ended up grabbing two more NVIDIA GeForce RTX 3090 24GB cards off a local builder. To make sure everything stayed stable, I went with an ASUS Pro WS WRX80E-SAGE SE WIFI board and a beefy EVGA SuperNOVA 1600 P+ 80+ Platinum 1600W power supply. Honestly, I was scared it would just blow a fuse or something, but it's been super reliable for me. The cooling was the hardest part, though. I had to get creative with some extra fans, but now it just sits there crunching through tokens without any complaints. Most of that 4k budget stayed in my pocket by sticking with the 30 series instead of jumping to 4090s. It's been a very satisfying project, and it handles the larger weights way better than my single card ever could... especially with a decent quant.
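If you're sizing a similar multi-card build, it helps to check whether a given quant fits in pooled VRAM before buying anything. Here's a minimal sketch of that arithmetic. The function names and the 10% headroom for KV cache and activations are assumptions for illustration, not measured values, and 671B is DeepSeek-V3's published total parameter count. Whatever doesn't fit has to be offloaded to system RAM, which costs speed.

```python
def weight_footprint_gb(total_params_b, bits_per_weight):
    """Approximate size of the weights alone, in GB (params given in billions)."""
    return total_params_b * bits_per_weight / 8


def fits_in_vram(total_params_b, bits_per_weight, num_gpus,
                 vram_per_gpu_gb=24, overhead_fraction=0.10):
    """True if the weights fit in pooled VRAM after reserving some headroom.

    overhead_fraction is a guess at the space needed for KV cache and
    activations; tune it for your context length and runtime.
    """
    usable_gb = num_gpus * vram_per_gpu_gb * (1 - overhead_fraction)
    return weight_footprint_gb(total_params_b, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    # Three 24GB cards, as in the build above.
    for name, params_b in [("DeepSeek-V3 (671B total)", 671), ("a 70B model", 70)]:
        size = weight_footprint_gb(params_b, 4)
        verdict = "fits" if fits_in_vram(params_b, 4, num_gpus=3) else "needs offload"
        print(f"{name} at 4-bit: ~{size:.1f} GB of weights, {verdict} on 3x24GB")
```

Swap in your own card count and bit width; the point is just to see how far a quant gets you before system RAM has to pick up the rest.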