Best system prompts for optimizing DeepSeek v4 Flash output?

Question

I'm in a bit of a panic because I have to get this support bot live for my client by tomorrow morning and the output from DeepSeek v4 Flash is just... weird. It keeps hallucinating policies that don't exist even though I gave it the context. I've been digging through some forums and saw a couple of people on Reddit saying that you have to use a very specific persona, but then I saw a GitHub thread saying that for the Flash version you actually shouldn't use personas because it wastes tokens and makes the model more likely to drift off track.

I tried some of the generic system prompts from the official docs but it still feels too robotic or it gives these massive 500-word answers when the user just asked about shipping costs. I'm based in London and trying to keep the API spend under like 40 or 50 quid for the month since it's a small shop. I'm really struggling to find the sweet spot for the system instructions to keep it concise but accurate.

Should I be using XML tags for the rules or just a bulleted list? I read that XML helps the Flash models stay focused, but then someone else said that's only for the bigger v4 models and it might confuse the lightweight one. Does anyone have a reliable system prompt they use specifically for the v4 Flash model to keep it snappy and grounded? I really need to get this sorted tonight or I'm gonna be in trouble with the client...

jxfludltid · Accepted Answer

Regarding what #1 said ("Just saw this. Honestly, was pretty satisfied with..."): I definitely agree with stripping out the persona. I was super happy with how my latest deployment for a logistics firm turned out once I stopped trying to make the bot sound like a person and just made it a tool. I managed to keep my monthly spend around 35 quid using the DeepSeek-V3 API for the complex logic and Flash for the basic Q&A.

To fix the hallucinations, I stopped using fancy XML tags and went with a strict grounding block. Basically, I tell it: "You are a factual assistant. Use ONLY the provided context." It sounds boring but it works way better for the lightweight models.

A few things that worked well for me:

- Use simple headers like RULES to separate your instructions from the user data.
- Put a "Constraint: max 30 words" line at the very end of the prompt.
- Give it two Good examples and one Bad example of a response.

I've been using LangSmith tracing to catch those weird 500-word rants before they hit the customer. Honestly, keeping the prompt under 400 tokens total is the sweet spot for v4 Flash. It keeps it snappy and keeps the bill low. Don't overthink the formatting though... simple lists are usually enough.

Good luck with the client... hopefully you get some sleep.
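For anyone who wants the layout from this answer in one place, here's a rough Python sketch: a RULES header, the two grounding lines, two Good examples and one Bad one, and the constraint as the last line. The function names, the example replies, and the shop details are all made up for illustration, and the token count is a crude "about 4 characters per token" guess rather than the model's real tokenizer, so treat the 400-token check as approximate.

```python
import math

# Grounding lines quoted from the answer above.
RULES = [
    "You are a factual assistant.",
    "Use ONLY the provided context.",
]

# Hypothetical example replies for a small shop; replace with your own.
GOOD_EXAMPLES = [
    "Standard UK shipping is £3.99 and takes 2-3 working days.",
    "Returns are accepted within 30 days with proof of purchase.",
]
BAD_EXAMPLE = "Great question! Let me walk you through our entire shipping history..."


def build_system_prompt(rules, good_examples, bad_example, context, max_words=30):
    """Assemble a plain-text prompt: headers, hyphenated rules, constraint last."""
    lines = ["RULES"]
    lines += [f"- {rule}" for rule in rules]
    lines += ["", "EXAMPLES"]
    lines += [f"Good: {example}" for example in good_examples]
    lines.append(f"Bad: {bad_example}")
    lines += ["", "CONTEXT", context, ""]
    # The answer suggests putting the length constraint at the very end.
    lines.append(f"Constraint: max {max_words} words")
    return "\n".join(lines)


def estimate_tokens(text):
    """Very rough estimate (assumes ~4 characters per token)."""
    return math.ceil(len(text) / 4)


# Hypothetical context; in a real bot this would be the retrieved policy text.
SYSTEM_PROMPT = build_system_prompt(
    RULES,
    GOOD_EXAMPLES,
    BAD_EXAMPLE,
    context="Standard UK shipping: £3.99, 2-3 working days.",
)

if __name__ == "__main__":
    print(SYSTEM_PROMPT)
    print("Estimated tokens:", estimate_tokens(SYSTEM_PROMPT))
```

If the estimate creeps toward 400 once you paste in real policy text, trimming the context is probably a safer cut than dropping the examples, but that's a judgment call to test on your own data.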

rzhpywmqrg · Answer

Just saw this. Honestly, was pretty satisfied with the output once I stripped back the persona stuff. I had a similar crunch last week, and sticking to simple hyphenated lists instead of XML worked well. Quick tip: put a hard response limit upfront so it doesn't ramble. Benchmarking this on my MacBook Pro (M3 Max, 14-inch, 128GB RAM) showed it keeps everything snappy. Let me know if you need help!
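A limit in the prompt is only a request, so it can help to check the reply length in your own code as a backstop before anything reaches the customer. Here's a minimal Python sketch of that idea. The function name and the 30-word default are assumptions for illustration, and it doesn't call any particular API or tracing tool.

```python
from typing import Tuple

MAX_WORDS = 30  # assumed limit; match whatever your prompt asks for


def enforce_limit(reply: str, max_words: int = MAX_WORDS) -> Tuple[str, bool]:
    """Return (reply, flagged). Over-long replies are trimmed and flagged."""
    words = reply.split()
    if len(words) <= max_words:
        return reply, False
    # Trim to the limit and mark the cut so the reply doesn't look complete.
    return " ".join(words[:max_words]) + "...", True


if __name__ == "__main__":
    text, flagged = enforce_limit("word " * 500)
    print(flagged, len(text.split()))
```

Trimming mid-sentence is blunt. In practice you might prefer to log flagged replies and regenerate them rather than send the cut-off text.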