
Kimi K2 Thinking API Guide: Building Multi-Step Agents Without Losing Reasoning State

Moonshot's docs confirm that kimi-k2-thinking and kimi-k2.5 support deep reasoning and multi-step tool use, but the dedicated kimi-k2-thinking model keeps thinking forcibly enabled. The same docs also make one implementation rule unusually explicit: keep reasoning_content in the conversation context, or long-horizon tool workflows degrade.

TL;DR
- Use kimi-k2-thinking when you want a dedicated always-thinking model for multi-step agents.
- Use kimi-k2.5 when you want a more flexible default that can enable or disable thinking.
- Keep reasoning_content in context, set max_tokens to at least 16000, keep temperature at 1.0, and prefer streaming.
- Moonshot's reviewed docs clearly support multi-step tool calls, but they do not publish a stable public "300-step" quota on the pages used for this rewrite, so your app should enforce its own loop limits.
What Moonshot's current docs actually confirm
| Question | Current documented answer |
|---|---|
| Which Kimi models support thinking? | kimi-k2-thinking and kimi-k2.5 |
| Which one is the dedicated thinking model? | kimi-k2-thinking |
| Which one is the recommended flexible default? | kimi-k2.5, with thinking enabled by default |
| How is reasoning exposed? | Through the reasoning_content field |
| What matters for multi-step tool use? | Preserve reasoning_content, give the model enough token budget, and keep tool choice compatible with thinking mode |
| What endpoint should you use? | https://api.moonshot.ai/v1 for the international endpoint |
Which Kimi route should you start with?
| If you need... | Start with | Why |
|---|---|---|
| Always-on reasoning for agent workflows | kimi-k2-thinking | It is Moonshot's dedicated thinking model |
| A general-purpose default that can still think | kimi-k2.5 | It is the recommended flexible model in Moonshot's docs |
| Faster thinking-oriented responses via EvoLink | kimi-k2-thinking through api.evolink.ai | EvoLink routes to the fastest available Moonshot endpoint |
| OpenClaw-based deployment | moonshot/kimi-k2-thinking-turbo | OpenClaw's Moonshot provider catalog currently lists a turbo thinking variant |
The examples below standardize on kimi-k2-thinking so the reader does not have to reason about one more toggle.

The implementation detail most guides miss
Kimi's thinking models return their reasoning in reasoning_content, not just in the final content field. That matters because a multi-step agent is not one request. It is a loop:
- The model reasons.
- The model calls a tool.
- Your app executes the tool.
- The model reasons again using the prior tool result.
If you drop reasoning_content between turns, the model loses part of the chain it was using to decide what to do next. Moonshot's docs explicitly say to include the entire reasoning content in the context and let the model decide what it still needs.

Minimal multi-step agent loop
This example is intentionally small. The point is to show the control flow that matters for Kimi's thinking models.
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.environ["MOONSHOT_API_KEY"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search internal product documentation",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are a careful research agent."},
    {"role": "user", "content": "Find the API limits for our billing service and summarize the risks."},
]

for _ in range(8):
    completion = client.chat.completions.create(
        model="kimi-k2-thinking",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        temperature=1.0,
        max_tokens=16000,
    )
    message = completion.choices[0].message

    # Preserve the assistant turn exactly, including reasoning_content when present.
    messages.append(message.model_dump(exclude_none=True))

    if not message.tool_calls:
        print(message.content)
        break

    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        if tool_call.function.name == "search_docs":
            result = {"matches": ["rate_limit=500 rpm", "burst_limit=1000 rpm"]}
        else:
            result = {"error": "unknown tool"}
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
                "content": json.dumps(result),
            }
        )
```

Four rules that matter more than model marketing
| Rule | Why it matters |
|---|---|
| Preserve reasoning_content | This is the main continuity mechanism Moonshot documents for thinking models |
| Set max_tokens >= 16000 | Moonshot warns that reasoning tokens and answer tokens share the same budget |
| Keep temperature = 1.0 | This is Moonshot's stated best-performance setting for thinking models |
| Prefer streaming | Thinking responses are larger and streaming helps reduce timeout pain |
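On the streaming rule: a consumer has to fold reasoning deltas and answer deltas separately. The sketch below uses plain dicts in place of SDK chunk objects so the folding logic is visible; the assumption that streamed deltas carry a reasoning_content field alongside content should be verified against Moonshot's current streaming docs.

```python
def accumulate(deltas):
    """Fold streamed delta dicts into (reasoning, answer) strings."""
    reasoning, answer = [], []
    for delta in deltas:
        # Reasoning tokens arrive in reasoning_content, answer tokens in content.
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)

r, a = accumulate([
    {"reasoning_content": "Plan: "},
    {"reasoning_content": "check limits."},
    {"content": "Rate limit is 500 rpm."},
])
print(r)  # Plan: check limits.
```

Keeping the two streams separate matters because the reassembled reasoning string is exactly what you preserve in the next assistant turn.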
Two more production notes are worth adding:
- Treat loop length as your policy. The reviewed docs say Kimi supports deep reasoning across multiple tool calls, but they do not expose a stable universal public step quota that should be hard-coded into a blog post.
- Validate tool arguments before you execute side effects. That part is implementation guidance, not a Moonshot guarantee, but it is the difference between a useful agent and an expensive retry loop.
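Both production notes can be made concrete in a few lines. This is implementation guidance under our own naming (MAX_STEPS, validate_args are illustrative), not a Moonshot API: cap the loop yourself and reject malformed tool arguments before any side effect runs.

```python
import json

MAX_STEPS = 8  # your policy, not a published Moonshot quota

def validate_args(raw_arguments, required):
    """Parse tool-call arguments and confirm required keys exist.

    Returns (args, None) on success or (None, error_message) on failure,
    so the caller can feed the error back to the model as a tool result.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return None, "arguments were not valid JSON"
    missing = [key for key in required if key not in args]
    if missing:
        return None, f"missing required keys: {missing}"
    return args, None

# Usage inside the agent loop, before executing the tool:
args, err = validate_args('{"query": "billing limits"}', required=["query"])
```

Returning the error as data instead of raising lets the loop hand it back to the model, which usually retries with corrected arguments rather than crashing the run.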
Use Kimi K2 Thinking through EvoLink
Point your OpenAI-compatible client at api.evolink.ai and use the same model name:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.evolink.ai/v1",
    api_key="YOUR_EVOLINK_API_KEY",
)

completion = client.chat.completions.create(
    model="kimi-k2-thinking",
    messages=[{"role": "user", "content": "Analyze the tradeoffs of event sourcing vs CRUD."}],
    temperature=1.0,
    max_tokens=16000,
)
```

The reasoning_content preservation rules from the earlier section still apply: EvoLink passes the full response through, so your agent loop works exactly as shown above.

Alternative: OpenClaw integration
If your runtime is OpenClaw rather than a direct API or EvoLink gateway, OpenClaw's Moonshot provider docs currently list:
- moonshot/kimi-k2.5
- moonshot/kimi-k2-thinking
- moonshot/kimi-k2-thinking-turbo
The documented onboarding shortcut is:
```shell
openclaw onboard --auth-choice moonshot-api-key
openclaw models list
openclaw models set moonshot/kimi-k2-thinking
openclaw models status
```

OpenClaw also documents binary native thinking control for Moonshot:
- /think off disables Moonshot thinking
- any non-off thinking level maps back to thinking.type=enabled
That is useful if you want one gateway that can switch between a cheaper non-thinking pass and a deeper reasoning pass.
A safer decision framework
| Use case | Better fit |
|---|---|
| Multi-step research agent with tools | kimi-k2-thinking via EvoLink or direct Moonshot API |
| General app assistant that only sometimes needs thinking | kimi-k2.5 via EvoLink |
| OpenClaw deployment that needs a Kimi default model | moonshot/kimi-k2.5 first, then escalate to moonshot/kimi-k2-thinking for harder sessions |
| Tool-heavy workflow where latency matters | Test kimi-k2-thinking through EvoLink's smart routing for automatic failover |
FAQ
Is Kimi K2 Thinking the same as Kimi K2.5?
No. Moonshot documents kimi-k2-thinking as the dedicated thinking model and kimi-k2.5 as the recommended flexible model that has thinking enabled by default.

What breaks most multi-step Kimi agents?
Dropping reasoning_content, starving the model with too small a max_tokens budget, or building a tool loop that never validates arguments or exits cleanly.

Does reasoning_content count toward tokens?
Yes. Both reasoning_content and content must fit inside max_tokens.

Should I disable thinking for every simple task?
If you are on kimi-k2.5, that can make sense for cost and latency control. If you specifically choose kimi-k2-thinking, the more natural assumption is that the workflow is reasoning-heavy enough to justify always-on thinking.

Can I use Kimi K2 Thinking through EvoLink?
Yes. Point your client at https://api.evolink.ai/v1 with your EvoLink API key and use kimi-k2-thinking as the model name. EvoLink handles routing, retry, and failover automatically.

Can I use Kimi K2 Thinking in OpenClaw?
Yes. OpenClaw's Moonshot provider catalog currently lists moonshot/kimi-k2-thinking as a supported model reference.

Where should I publish pricing numbers from?
From Moonshot's live pricing pages, not from third-party benchmark tables or older comparison posts. This rewrite intentionally avoids hard-coded pricing claims because those values change faster than model-behavior guidance.
Try Kimi Through One Gateway
If you want to test Kimi alongside Claude, GPT, and other agent-friendly models without wiring each provider separately, use a gateway layer and verify the currently available routes before you publish a cost comparison.
Compare Agent Models on EvoLink

