I'm in a bit of a panic because I have to get this support bot live for my client by tomorrow morning and the output from DeepSeek v4 Flash is just... weird. It keeps hallucinating policies that don't exist even though I gave it the context. I've been digging through some forums and I saw a couple of people on Reddit saying that you have to use a very specific persona but then I saw a GitHub thread saying that for the Flash version you actually shouldn't use personas because it wastes tokens and makes it more likely to drift off track.
I tried some of the generic system prompts from the official docs but it still feels too robotic or it gives these massive 500-word answers when the user just asked about shipping costs. I'm based in London and trying to keep the API spend under like 40 or 50 quid for the month since it's a small shop. I'm really struggling to find the sweet spot for the system instructions to keep it concise but accurate.
Should I be using XML tags for the rules or just a bulleted list? I read that XML helps the flash models stay focused but then someone else said that's only for the bigger v4 models and it might confuse the lightweight one. Does anyone have a reliable system prompt they use specifically for the v4 Flash model to keep it snappy and grounded? I really need to get this sorted tonight or I'm gonna be in trouble with the client...
Just saw this. Honestly, was pretty satisfied with the output once I stripped back the persona stuff. A similar crunch happened to me last week and sticking to simple hyphenated lists instead of XML works well. Quick tip: put a hard response limit upfront so it dont ramble. Benchmarking this on my Apple MacBook Pro M3 Max 14-inch 128GB RAM showed it keeps everything snappy. Let me know if you need help!