How do I train AI agents to handle complex negotiations?

Question

So I've been messing around with LLM agents for a few years now (mostly standard RAG and support bots), but I just hit a massive wall trying to build something way more advanced. I'm working on a B2B supply chain tool for a pilot launch in October, and I need these agents to handle actual complex negotiations. It's not just about asking for a price; it's about trading off variables like lead times vs. bulk discounts vs. payment terms.

Right now the agent is either just too nice, or it totally loses the plot after three turns and forgets its walk-away price. I've tried some basic chain-of-thought and even did a bit of fine-tuning on Llama 3 with some old sales logs, but it still feels like it's just roleplaying a pushover. I really need it to be strategic and actually push back when the 'human' side gets aggressive or tries to lowball.

Here's what I'm looking at right now:

Budget: about $5k for compute and token burn during the testing phase

Tech stack: currently on LangGraph, but open to switching if something else handles long-term state better

Timeline: need a solid demo ready in about 6 weeks

Should I be looking into reinforcement learning, or maybe a search-based approach like Monte Carlo tree search (MCTS) over the decision tree? I'm worried MCTS might be overkill, but honestly regular prompting isn't cutting it for these multi-step trade-offs. Has anyone actually successfully built a 'hard' negotiator that doesn't just hallucinate concessions? I feel like I'm missing something obvious in how to model the state...

wpulipqwxm · Accepted Answer

I've been through this exact nightmare with procurement bots. The niceness is baked in by RLHF, so you really have to fight it. Skip MCTS for now... it's a massive time sink for a 6-week deadline. Instead, use a dual-agent architecture in LangGraph: one agent drafts the response, and a hard-nosed auditor agent reviews it strictly against your walk-away price and other variables before anything is sent. In my experience, you need better reasoning than base Llama 3 for these trade-offs. Here is what I would do:

Use OpenAI's GPT-4o (128k context) through the API for the auditor role; in my experience it catches subtle lowballs better than most models.

Set up LangSmith (Developer tier) to track every time the agent caves, so you can refine the system prompt.

Hard-code your constraints in the LangGraph state schema so they aren't just suggestions in a prompt. Keep that part of the state immutable. If the LLM tries to go below the floor, the auditor should automatically trigger a firm "No" response.
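To make that last point concrete, here is a minimal sketch of the constraint gate in plain Python. It deliberately avoids any LangGraph-specific API; the idea is that a function like `gate` would sit in whatever node runs after the drafting agent. All names (`Constraints`, `Offer`, `audit`, `gate`, `FIRM_NO`) and the example numbers are made up for illustration, and it assumes the drafted offer has already been parsed into structured fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Constraints:
    min_unit_price: float    # walk-away floor
    max_payment_days: int    # longest payment terms we accept
    min_lead_time_days: int  # shortest lead time we can promise


@dataclass(frozen=True)
class Offer:
    unit_price: float
    payment_days: int
    lead_time_days: int


FIRM_NO = "We can't agree to those terms."


def audit(offer: Offer, limits: Constraints) -> list[str]:
    """Return every hard constraint the drafted offer violates."""
    violations = []
    if offer.unit_price < limits.min_unit_price:
        violations.append("unit_price below floor")
    if offer.payment_days > limits.max_payment_days:
        violations.append("payment terms too long")
    if offer.lead_time_days < limits.min_lead_time_days:
        violations.append("lead time too short")
    return violations


def gate(draft_text: str, offer: Offer, limits: Constraints) -> str:
    """Send the draft only if it passes the audit; otherwise send a firm no."""
    return FIRM_NO if audit(offer, limits) else draft_text


# Hypothetical numbers, just to show the behavior.
limits = Constraints(min_unit_price=90.0, max_payment_days=60, min_lead_time_days=14)
ok = Offer(unit_price=95.0, payment_days=30, lead_time_days=21)
caved = Offer(unit_price=82.0, payment_days=90, lead_time_days=21)
```

Because the check is ordinary code, the floor holds no matter what the model writes; the LLM auditor then only has to judge tone and strategy, not arithmetic.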

BrewstersCuppa · Answer

Saw this late, but I've been down this road. In my experience, letting an LLM handle the final price is a huge liability. I once saw a pilot fail because a bot hallucinated a massive discount on an Intel Xeon Platinum 8480+ 56-Core 2.0GHz. Honestly, just keep your walk-away constraints in a hard-coded lookup table outside the prompt. It's way safer, and it saves your budget from token bloat.
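A rough sketch of what that lookup table could look like, assuming floors vary by SKU and quantity tier. The SKU name, tiers, and prices are placeholders, and `floor_price` / `clamp_quote` are hypothetical helpers, not part of any library. The model can propose a number, but the quoted price is always computed in code.

```python
from bisect import bisect_right

# Hypothetical data: per SKU, (min_quantity, floor_unit_price), sorted by quantity.
FLOOR_TABLE = {
    "EXAMPLE-SKU": [(1, 100.0), (50, 92.0), (200, 85.0)],
}


def floor_price(sku: str, quantity: int) -> float:
    """Look up the walk-away unit price for the tier this quantity falls into."""
    tiers = FLOOR_TABLE[sku]
    # Index of the last tier whose min_quantity is <= quantity.
    idx = bisect_right([min_qty for min_qty, _ in tiers], quantity) - 1
    if idx < 0:
        raise ValueError("quantity is below the smallest tier")
    return tiers[idx][1]


def clamp_quote(sku: str, quantity: int, proposed_unit_price: float) -> float:
    """Never quote below the floor, whatever price the LLM proposed."""
    return max(proposed_unit_price, floor_price(sku, quantity))
```

Since the table never goes into the prompt, it costs no tokens, and the other side can't talk the model into revealing or changing the floor.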