Kimi K2 Thinking API Guide: Building Multi-Step Agents Without Losing Reasoning State

EvoLink Team
Product Team
March 29, 2026
8 min read
If you want to build a real multi-step agent with Kimi K2 Thinking, the most important detail is not "does it support tools?" It is whether your app preserves the model's reasoning state across turns.
Moonshot's current thinking-model docs say both kimi-k2-thinking and kimi-k2.5 support deep reasoning and multi-step tool use, but the dedicated kimi-k2-thinking model keeps thinking forcibly enabled. The same docs also make one implementation rule unusually explicit: keep reasoning_content in the conversation context, or long-horizon tool workflows degrade.

TL;DR

  • Use kimi-k2-thinking when you want a dedicated always-thinking model for multi-step agents.
  • Use kimi-k2.5 when you want a more flexible default that can enable or disable thinking.
  • Keep reasoning_content in context, set max_tokens to at least 16000, keep temperature at 1.0, and prefer streaming.
  • Moonshot's reviewed docs clearly support multi-step tool calls, but they do not publish a stable public "300-step" quota on the pages used for this rewrite, so your app should enforce its own loop limits.

What Moonshot's current docs actually confirm

| Question | Current documented answer |
| --- | --- |
| Which Kimi models support thinking? | kimi-k2-thinking and kimi-k2.5 |
| Which one is the dedicated thinking model? | kimi-k2-thinking |
| Which one is the recommended flexible default? | kimi-k2.5, with thinking enabled by default |
| How is reasoning exposed? | Through the reasoning_content field |
| What matters for multi-step tool use? | Preserve reasoning_content, give the model enough token budget, and keep tool choice compatible with thinking mode |
| What endpoint should you use? | https://api.moonshot.ai/v1 for the international endpoint |

Which Kimi route should you start with?

| If you need... | Start with | Why |
| --- | --- | --- |
| Always-on reasoning for agent workflows | kimi-k2-thinking | It is Moonshot's dedicated thinking model |
| A general-purpose default that can still think | kimi-k2.5 | It is the recommended flexible model in Moonshot's docs |
| Faster thinking-oriented responses via EvoLink | kimi-k2-thinking through api.evolink.ai | EvoLink routes to the fastest available Moonshot endpoint |
| OpenClaw-based deployment | moonshot/kimi-k2-thinking-turbo | OpenClaw's Moonshot provider catalog currently lists a turbo thinking variant |

The practical rule is simple: this guide is specifically about the Kimi K2 Thinking API, so the examples use kimi-k2-thinking and you never have to reason about one more toggle.

The implementation detail most guides miss

Moonshot exposes the model's reasoning in reasoning_content, not just in the final content field.

That matters because a multi-step agent is not one request. It is a loop:

  1. The model reasons.
  2. The model calls a tool.
  3. Your app executes the tool.
  4. The model reasons again using the prior tool result.
If your app drops reasoning_content between turns, the model loses part of the chain it was using to decide what to do next. Moonshot's docs explicitly say to include the entire reasoning content in the context and let the model decide what it still needs.
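Concretely, a preserved assistant turn looks like the dict below. The field names follow the response shape Moonshot documents for thinking models; the text values and the call id are invented for illustration:

```python
# Sketch of an assistant turn preserved verbatim between loop iterations.
# The reasoning_content field sits alongside content and tool_calls;
# appending this dict unchanged is what keeps the reasoning chain intact.
assistant_turn = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "I should search the billing docs before summarizing risks.",
    "tool_calls": [
        {
            "id": "call_0",  # hypothetical id for illustration
            "type": "function",
            "function": {
                "name": "search_docs",
                "arguments": "{\"query\": \"billing API limits\"}",
            },
        }
    ],
}
```

Stripping the turn down to just role and content before re-sending it is exactly the failure mode the docs warn about.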

Minimal multi-step agent loop

This example is intentionally small. The point is to show the control flow that matters for Kimi's thinking models.

import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.environ["MOONSHOT_API_KEY"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search internal product documentation",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are a careful research agent."},
    {"role": "user", "content": "Find the API limits for our billing service and summarize the risks."},
]

for _ in range(8):
    completion = client.chat.completions.create(
        model="kimi-k2-thinking",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        temperature=1.0,
        max_tokens=16000,
    )

    message = completion.choices[0].message

    # Preserve the assistant turn exactly, including reasoning_content when present.
    messages.append(message.model_dump(exclude_none=True))

    if not message.tool_calls:
        print(message.content)
        break

    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)

        if tool_call.function.name == "search_docs":
            result = {"matches": ["rate_limit=500 rpm", "burst_limit=1000 rpm"]}
        else:
            result = {"error": "unknown tool"}

        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
                "content": json.dumps(result),
            }
        )

Four rules that matter more than model marketing

| Rule | Why it matters |
| --- | --- |
| Preserve reasoning_content | This is the main continuity mechanism Moonshot documents for thinking models |
| Set max_tokens >= 16000 | Moonshot warns that reasoning tokens and answer tokens share the same budget |
| Keep temperature = 1.0 | This is Moonshot's stated best-performance setting for thinking models |
| Prefer streaming | Thinking responses are larger, and streaming helps reduce timeout pain |
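The streaming rule can be sketched with a small accumulator. The dict-shaped chunks below stand in for the delta objects a streaming client yields; Moonshot's docs describe reasoning arriving through a reasoning_content delta field, but treat the exact wire shape as an assumption of this sketch:

```python
def accumulate_stream(deltas):
    """Collect reasoning text and answer text separately from streamed deltas.

    Each delta is a dict that may carry "reasoning_content", "content",
    or neither (simulating a streaming thinking response).
    """
    reasoning_parts, answer_parts = [], []
    for delta in deltas:
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)

# Simulated stream: reasoning deltas arrive before answer deltas.
reasoning, answer = accumulate_stream([
    {"reasoning_content": "Check rate limits, "},
    {"reasoning_content": "then burst limits."},
    {"content": "The main risk is the 500 rpm ceiling."},
])
```

Keeping the two streams separate lets you both display the final answer and append the full reasoning back into the conversation context.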

Two more production notes are worth adding:

  • Treat loop length as your policy. The reviewed docs say Kimi supports deep reasoning across multiple tool calls, but they do not expose a stable universal public step quota that should be hard-coded into a blog post.
  • Validate tool arguments before you execute side effects. That part is implementation guidance, not a Moonshot guarantee, but it is the difference between a useful agent and an expensive retry loop.
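The validation point can be made concrete with a small guard for the search_docs tool from the earlier loop. This is defensive implementation guidance, not a Moonshot requirement, and the function name is ours:

```python
import json


def validate_search_args(raw_arguments):
    """Parse and validate tool-call arguments before executing side effects.

    Returns (args, None) on success or (None, error_payload) on failure,
    so the agent loop can feed the error back to the model as a tool result
    instead of crashing or executing a malformed call.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return None, {"error": "arguments were not valid JSON"}
    if not isinstance(args, dict):
        return None, {"error": "arguments must be a JSON object"}
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        return None, {"error": "missing required string field: query"}
    return args, None
```

Returning the error as a tool result gives the model a chance to correct itself on the next turn, which is usually cheaper than retrying the whole conversation.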

Easier access: EvoLink's OpenAI-compatible gateway

The simplest way to access Kimi K2 Thinking without wiring Moonshot credentials directly is through EvoLink's OpenAI-compatible gateway. Point your existing OpenAI SDK client at api.evolink.ai and use the same model name:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.evolink.ai/v1",
    api_key="YOUR_EVOLINK_API_KEY",
)

completion = client.chat.completions.create(
    model="kimi-k2-thinking",
    messages=[{"role": "user", "content": "Analyze the tradeoffs of event sourcing vs CRUD."}],
    temperature=1.0,
    max_tokens=16000,
)
EvoLink handles provider routing, retry, and failover. The same reasoning_content preservation rules from the earlier section still apply — EvoLink passes the full response through, so your agent loop works exactly as shown above.

Alternative: OpenClaw integration

If your runtime is OpenClaw rather than a direct API or EvoLink gateway, OpenClaw's Moonshot provider docs currently list:

  • moonshot/kimi-k2.5
  • moonshot/kimi-k2-thinking
  • moonshot/kimi-k2-thinking-turbo

The documented onboarding shortcut is:

openclaw onboard --auth-choice moonshot-api-key
openclaw models list
openclaw models set moonshot/kimi-k2-thinking
openclaw models status

OpenClaw also documents binary native thinking control for Moonshot:

  • /think off disables Moonshot thinking
  • any non-off thinking level maps back to thinking.type=enabled

That is useful if you want one gateway that can switch between a cheaper non-thinking pass and a deeper reasoning pass.
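That mapping is simple enough to sketch. OpenClaw only documents the non-off direction (any other level maps to thinking.type=enabled); the disabled payload shape below is our assumption for symmetry, not a documented value:

```python
def moonshot_thinking_param(level: str) -> dict:
    """Map an OpenClaw-style /think level to Moonshot's binary thinking control.

    Per OpenClaw's documented behavior, "off" disables thinking and any
    other level re-enables it. The disabled payload shape is assumed,
    not confirmed by the docs.
    """
    if level == "off":
        return {"type": "disabled"}  # assumed shape for the disabled case
    return {"type": "enabled"}  # documented: non-off maps to thinking.type=enabled
```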

A safer decision framework

| Use case | Better fit |
| --- | --- |
| Multi-step research agent with tools | kimi-k2-thinking via EvoLink or direct Moonshot API |
| General app assistant that only sometimes needs thinking | kimi-k2.5 via EvoLink |
| OpenClaw deployment that needs a Kimi default model | moonshot/kimi-k2.5 first, then escalate to moonshot/kimi-k2-thinking for harder sessions |
| Tool-heavy workflow where latency matters | Test kimi-k2-thinking through EvoLink's smart routing for automatic failover |

FAQ

Is Kimi K2 Thinking the same as Kimi K2.5?

No. Moonshot's current docs describe kimi-k2-thinking as the dedicated thinking model and kimi-k2.5 as the recommended flexible model that has thinking enabled by default.

What breaks most multi-step Kimi agents?

Dropping reasoning_content, starving the model with too small a max_tokens budget, or building a tool loop that never validates arguments or exits cleanly.

Does reasoning_content count toward tokens?

Yes. Moonshot's docs say the combined tokens from reasoning_content and content must fit inside max_tokens.
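Because the two shares come out of one budget, the room left for the visible answer is simple arithmetic. A tiny helper sketch (the function name is ours, not Moonshot's):

```python
def visible_answer_budget(max_tokens: int, reasoning_tokens: int) -> int:
    """Tokens left for the visible answer after reasoning spends its share.

    Illustrates why a generous max_tokens matters: heavy reasoning can
    otherwise leave almost nothing for the final answer.
    """
    return max(0, max_tokens - reasoning_tokens)

# With the recommended 16000 budget, 12000 reasoning tokens still leave
# 4000 tokens for the answer; with an 8000 budget they would leave zero.
remaining = visible_answer_budget(16000, 12000)
```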

Should I disable thinking for every simple task?

If you are using kimi-k2.5, that can make sense for cost and latency control. If you specifically choose kimi-k2-thinking, the more natural assumption is that the workflow is reasoning-heavy enough to justify always-on thinking.

Can I access Kimi K2 Thinking through EvoLink?

Yes. Point your OpenAI SDK at https://api.evolink.ai/v1 with your EvoLink API key and use kimi-k2-thinking as the model name. EvoLink handles routing, retry, and failover automatically.

Can I use Kimi K2 Thinking in OpenClaw?

Yes. OpenClaw's Moonshot provider page currently lists moonshot/kimi-k2-thinking as a supported model reference.

Where should I publish pricing numbers from?

From Moonshot's live pricing pages, not from third-party benchmark tables or older comparison posts. This rewrite intentionally avoids hard-coded pricing claims because those values change faster than model-behavior guidance.

Try Kimi Through One Gateway

If you want to test Kimi alongside Claude, GPT, and other agent-friendly models without wiring each provider separately, use a gateway layer and verify the currently available routes before you publish a cost comparison.
