Best dataset format for fine-tuning DeepSeek-R1?

Question

Hey everyone! I’ve been diving deep into the new DeepSeek-R1 release, and I’m absolutely blown away by its reasoning capabilities. I’m currently planning to fine-tune a version of the model (specifically the Llama-8B distilled version) for a niche medical coding project I’m working on. However, I’m hitting a bit of a wall when it comes to the data preparation phase.

Since R1 relies so heavily on Chain of Thought (CoT) and specialized reinforcement learning patterns, I’m wondering what the community has found to be the most effective dataset format. Usually, for standard LLMs, I just stick to the classic `{"instruction": "...", "input": "...", "output": "..."}` or the OpenAI-style chat messages format. But with DeepSeek-R1, I’m curious if we need to explicitly include `` tags or specific reasoning steps within the training examples to maintain that 'thinking' behavior during the fine-tuning process.

I’ve read through the technical report, but it’s a bit dense on the specific formatting for supervised fine-tuning (SFT) versus the RL phase. My dataset consists of about 5,000 high-quality expert demonstrations, and I really don't want to mess up the formatting and 'break' the model's ability to reason logically before it gives an answer. I’m particularly concerned about whether I should be using the ShareGPT format or if a simple Alpaca-style template is enough.

Has anyone here experimented with different JSON structures for R1 yet? Specifically, did you find that including a 'reasoning' field in your JSON helps, or does the model perform better if the reasoning is just baked into the beginning of the 'content' string?

I’d love to hear what worked for you or if there's a standardized template that’s becoming the go-to for these 'reasoning' models. What’s the best dataset format you’ve found to keep the CoT performance high without degrading the final output quality?

NorthernLineDeep · Accepted Answer

> Specifically, did you find that including a 'reasoning' field in your JSON helps, or does the model perform better if the reasoning is just baked into the beginning of the 'content' string? ok so i'm pretty new to this too but i've been playing around with DeepSeek-R1-Distill-Llama-8B and honestly it's kind of a learning curve... like, the technical report says it's all about the reinforcement learning, but for SFT (supervised fine-tuning) i think you really gotta keep those tags. basically, from what i've seen, you should bake the reasoning directly into the 'content' string using `
...

Actual answer` format. I tried a simple Alpaca style first and it lowkey felt like it was losing its "brain" and just answering too fast?? So yeah, i would suggest using the ShareGPT format but make sure your expert demonstrations actually include the reasoning steps inside those tags. If you dont include the `` tags in your training data, the model might stop using them altogether which would be a bummer for medical coding where you need that logic! plus, make sure to check out Axolotl or Unsloth for the actual training - they have some templates that might help. anyway, gl with the project, sounds super cool! 👍

WandsworthParkJog · Answer

sooo i went through this last week with DeepSeek-R1-Distill-Llama-8B and honestly, i was *terrified* of messing up the reasoning logic! i tried two ways: one with a dedicated 'reasoning' field in JSON and another just putting `` tags inside the content string. the tag approach felt way more stable and cost-effective since i didn't have to overhaul my existing scripts. i mean, it just feels safer to let the model 'think' naturally inside the block rather than forcing a new structure. i'm highkey obsessed with keeping things simple cuz i dont wanna waste credits on broken training runs!! it's amazing how much better it flows when you just let it do its thing naturally inside the chat format. plus, the DeepSeek-R1 architecture is sooo sensitive to those tags... definitely something to watch out for! hope that helps lol

JimmyFum · Answer

sooo i've been messing with the DeepSeek-R1-Distill-Llama-8B for a bit now and honestly, I'd actually suggest a different approach than just sticking to the basic formats. i tried the whole 'baked-in' reasoning thing at first and unfortunately, it kinda felt like it was lobotomizing the model's actual logic flow. not as good as expected tbh.

From my expert-ish technical analysis (lol), here is what I found:
* Option A (Alpaca/ShareGPT): These are too simple and the model starts skipping steps.
* Option B (Baked content): Reasoning gets mixed with the answer and it gets *really* messy.
* Option C (Explicit tags): Using `` tags in a specific `reasoning` field is the way to go.

Basically, if you don't wrap the CoT in those tags, the model loses that 'thinking' trigger. I mean, you gotta keep the structure strict if you want those expert demonstrations to actually land. gl!!