
Kimi K2 Thinking API Guide: Building Multi-Step Agents Without Losing Reasoning State

Moonshot's docs confirm that kimi-k2-thinking and kimi-k2.5 support deep reasoning and multi-step tool use, but the dedicated kimi-k2-thinking model keeps thinking forcibly enabled. The same docs also make one implementation rule unusually explicit: keep reasoning_content in the conversation context, or long-horizon tool workflows degrade.

TL;DR
- Use kimi-k2-thinking when you want a dedicated always-thinking model for multi-step agents.
- Use kimi-k2.5 when you want a more flexible default that can enable or disable thinking.
- Keep reasoning_content in context, set max_tokens to at least 16000, keep temperature at 1.0, and prefer streaming.
- Moonshot's reviewed docs clearly support multi-step tool calls, but they do not publish a stable public "300-step" quota on the pages used for this rewrite, so your app should enforce its own loop limits.
What Moonshot's current docs actually confirm
| Question | Current documented answer |
|---|---|
| Which Kimi models support thinking? | kimi-k2-thinking and kimi-k2.5 |
| Which one is the dedicated thinking model? | kimi-k2-thinking |
| Which one is the recommended flexible default? | kimi-k2.5, with thinking enabled by default |
| How is reasoning exposed? | Through the reasoning_content field |
| What matters for multi-step tool use? | Preserve reasoning_content, give the model enough token budget, and keep tool choice compatible with thinking mode |
| What endpoint should you use? | https://api.moonshot.ai/v1 for the international endpoint |
Which Kimi route should you start with?
| If you need... | Start with | Why |
|---|---|---|
| Always-on reasoning for agent workflows | kimi-k2-thinking | It is Moonshot's dedicated thinking model |
| A general-purpose default that can still think | kimi-k2.5 | It is the recommended flexible model in Moonshot's docs |
| Faster thinking-oriented responses via EvoLink | kimi-k2-thinking through api.evolink.ai | EvoLink routes to the fastest available Moonshot endpoint |
| OpenClaw-based deployment | moonshot/kimi-k2-thinking-turbo | OpenClaw's Moonshot provider catalog currently lists a turbo thinking variant |
The examples below standardize on kimi-k2-thinking so the reader does not have to reason about one more toggle.

The implementation detail most guides miss
Kimi's thinking models return their reasoning in reasoning_content, not just in the final content field. That matters because a multi-step agent is not one request. It is a loop:
- The model reasons.
- The model calls a tool.
- Your app executes the tool.
- The model reasons again using the prior tool result.
If you drop reasoning_content between turns, the model loses part of the chain it was using to decide what to do next. Moonshot's docs explicitly say to include the entire reasoning content in the context and let the model decide what it still needs.

Minimal multi-step agent loop
This example is intentionally small. The point is to show the control flow that matters for Kimi's thinking models.
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.environ["MOONSHOT_API_KEY"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search internal product documentation",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are a careful research agent."},
    {"role": "user", "content": "Find the API limits for our billing service and summarize the risks."},
]

for _ in range(8):
    completion = client.chat.completions.create(
        model="kimi-k2-thinking",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        temperature=1.0,
        max_tokens=16000,
    )
    message = completion.choices[0].message

    # Preserve the assistant turn exactly, including reasoning_content when present.
    messages.append(message.model_dump(exclude_none=True))

    if not message.tool_calls:
        print(message.content)
        break

    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        if tool_call.function.name == "search_docs":
            result = {"matches": ["rate_limit=500 rpm", "burst_limit=1000 rpm"]}
        else:
            result = {"error": "unknown tool"}
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
                "content": json.dumps(result),
            }
        )
```

Four rules that matter more than model marketing
| Rule | Why it matters |
|---|---|
| Preserve reasoning_content | This is the main continuity mechanism Moonshot documents for thinking models |
| Set max_tokens >= 16000 | Moonshot warns that reasoning tokens and answer tokens share the same budget |
| Keep temperature = 1.0 | This is Moonshot's stated best-performance setting for thinking models |
| Prefer streaming | Thinking responses are larger and streaming helps reduce timeout pain |
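On the streaming rule: a consumer has to fold reasoning deltas and answer deltas separately. The sketch below uses plain dicts in place of SDK chunk objects so the folding logic is visible; the assumption that streamed deltas carry a reasoning_content field alongside content should be verified against Moonshot's current streaming docs.

```python
def accumulate(deltas):
    """Fold streamed delta dicts into (reasoning, answer) strings."""
    reasoning, answer = [], []
    for delta in deltas:
        # Reasoning tokens arrive in reasoning_content, answer tokens in content.
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)

r, a = accumulate([
    {"reasoning_content": "Plan: "},
    {"reasoning_content": "check limits."},
    {"content": "Rate limit is 500 rpm."},
])
print(r)  # Plan: check limits.
```

Keeping the two streams separate matters because the reassembled reasoning string is exactly what you preserve in the next assistant turn.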
Two more production notes are worth adding:
- Treat loop length as your policy. The reviewed docs say Kimi supports deep reasoning across multiple tool calls, but they do not expose a stable universal public step quota that should be hard-coded into a blog post.
- Validate tool arguments before you execute side effects. That part is implementation guidance, not a Moonshot guarantee, but it is the difference between a useful agent and an expensive retry loop.
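Both production notes can be made concrete in a few lines. This is implementation guidance under our own naming (MAX_STEPS, validate_args are illustrative), not a Moonshot API: cap the loop yourself and reject malformed tool arguments before any side effect runs.

```python
import json

MAX_STEPS = 8  # your policy, not a published Moonshot quota

def validate_args(raw_arguments, required):
    """Parse tool-call arguments and confirm required keys exist.

    Returns (args, None) on success or (None, error_message) on failure,
    so the caller can feed the error back to the model as a tool result.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return None, "arguments were not valid JSON"
    missing = [key for key in required if key not in args]
    if missing:
        return None, f"missing required keys: {missing}"
    return args, None

# Usage inside the agent loop, before executing the tool:
args, err = validate_args('{"query": "billing limits"}', required=["query"])
```

Returning the error as data instead of raising lets the loop hand it back to the model, which usually retries with corrected arguments rather than crashing the run.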
Use Kimi K2 Thinking through EvoLink
Point your OpenAI-compatible client at api.evolink.ai and use the same model name:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.evolink.ai/v1",
    api_key="YOUR_EVOLINK_API_KEY",
)

completion = client.chat.completions.create(
    model="kimi-k2-thinking",
    messages=[{"role": "user", "content": "Analyze the tradeoffs of event sourcing vs CRUD."}],
    temperature=1.0,
    max_tokens=16000,
)
```

The reasoning_content preservation rules from the earlier section still apply: EvoLink passes the full response through, so your agent loop works exactly as shown above.

Alternative: OpenClaw integration
If your runtime is OpenClaw rather than a direct API or EvoLink gateway, OpenClaw's Moonshot provider docs currently list:
- moonshot/kimi-k2.5
- moonshot/kimi-k2-thinking
- moonshot/kimi-k2-thinking-turbo
The documented onboarding shortcut is:
```shell
openclaw onboard --auth-choice moonshot-api-key
openclaw models list
openclaw models set moonshot/kimi-k2-thinking
openclaw models status
```

OpenClaw also documents binary native thinking control for Moonshot:
- /think off disables Moonshot thinking
- any non-off thinking level maps back to thinking.type=enabled
That is useful if you want one gateway that can switch between a cheaper non-thinking pass and a deeper reasoning pass.
A safer decision framework
| Use case | Better fit |
|---|---|
| Multi-step research agent with tools | kimi-k2-thinking via EvoLink or direct Moonshot API |
| General app assistant that only sometimes needs thinking | kimi-k2.5 via EvoLink |
| OpenClaw deployment that needs a Kimi default model | moonshot/kimi-k2.5 first, then escalate to moonshot/kimi-k2-thinking for harder sessions |
| Tool-heavy workflow where latency matters | Test kimi-k2-thinking through EvoLink's smart routing for automatic failover |
FAQ
Is Kimi K2 Thinking the same as Kimi K2.5?
No. Moonshot documents kimi-k2-thinking as the dedicated thinking model and kimi-k2.5 as the recommended flexible model that has thinking enabled by default.

What breaks most multi-step Kimi agents?
Dropping reasoning_content, starving the model with too small a max_tokens budget, or building a tool loop that never validates arguments or exits cleanly.

Does reasoning_content count toward tokens?
Yes. Both reasoning_content and content must fit inside max_tokens.

Should I disable thinking for every simple task?
If you are on kimi-k2.5, that can make sense for cost and latency control. If you specifically choose kimi-k2-thinking, the more natural assumption is that the workflow is reasoning-heavy enough to justify always-on thinking.

Can I use Kimi K2 Thinking through EvoLink?
Yes. Point your client at https://api.evolink.ai/v1 with your EvoLink API key and use kimi-k2-thinking as the model name. EvoLink handles routing, retry, and failover automatically.

Can I use Kimi K2 Thinking in OpenClaw?
Yes. OpenClaw's Moonshot provider catalog currently lists moonshot/kimi-k2-thinking as a supported model reference.

Where should I publish pricing numbers from?
From Moonshot's live pricing pages, not from third-party benchmark tables or older comparison posts. This rewrite intentionally avoids hard-coded pricing claims because those values change faster than model-behavior guidance.
Try Kimi Through One Gateway
If you want to test Kimi alongside Claude, GPT, and other agent-friendly models without wiring each provider separately, use a gateway layer and verify the currently available routes before you publish a cost comparison.
Compare Agent Models on EvoLink

