Kimi K2 Thinking API

Run Moonshot AI's open-source Kimi K2 Thinking model for deep, tool-using reasoning. With a 256K-token context window, a 1T-parameter MoE backbone (32B active parameters), and structured tool-call outputs, it is built for long-horizon agents and high-stakes analysis.


PRICING

Plan: Kimi K2 Thinking
Context Window: 262.1K tokens
Max Output: 262.1K tokens
Input: $0.556 per 1M tokens (-7% vs. official price $0.600)
Output: $2.22 per 1M tokens (-11% vs. official price $2.50)
Cache Read: $0.139 per 1M tokens (-7% vs. official price $0.150)

Web Search Tool

Server-side web search capability

$0.004/search

Pricing note: prices are in USD per 1M tokens.

Cache hit: the cache-read price applies to cached prompt tokens.

Kimi K2 Thinking API for Long-Horizon Reasoning

Run the K2 Thinking model through EvoLink to analyze massive documents, orchestrate tools, and produce structured outputs. It pairs a 256K-token context window with native tool calling for reliable multi-step workflows.


What can you build with Kimi K2 Thinking?

Long-Context Research

Process full reports, codebases, or knowledge bases in a single request. The 256K context window makes it practical to reason over large inputs without aggressive chunking.
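As a rough pre-flight check before sending a large document, a character-based estimate can confirm the input fits the window. This is a heuristic sketch, not part of the API: the ~4 characters per token ratio is an assumption for English prose, and a real tokenizer should be used for exact counts.

```python
def fits_context(text: str, max_tokens: int = 256_000, chars_per_token: int = 4) -> bool:
    """Rough pre-flight check: estimate token count from character length.

    The ~4 chars/token ratio is a heuristic for English prose, not a real
    tokenizer; use the provider's tokenizer for exact counts.
    """
    return len(text) / chars_per_token <= max_tokens
```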


Tool-Orchestrated Agents

Design agents that call tools and stay on track. K2 Thinking accepts tool definitions and returns JSON tool calls, supporting long, multi-step plans.


Codebase and Data Workflows

Use the model for refactors, debugging, and data analysis across large repositories or datasets with consistent, step-by-step reasoning.


Why developers choose Kimi K2 Thinking API

Get open-source flexibility, 256K context, and native tool use for robust, long-horizon agent workflows.

256K Context Window

Reason across long documents and multi-turn histories with a full 256K-token window for complex tasks.

Native Tool Calling

Accepts tool definitions and produces JSON tool calls, enabling reliable orchestration and structured outputs.
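A minimal sketch of what this looks like in practice, assuming the OpenAI-style function-calling format. The `web_search` tool name and the sample assistant message below are illustrative, not part of the official API.

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format.
search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

# A sample assistant message carrying a structured tool call (the shape
# follows the OpenAI chat-completions convention; values are illustrative).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "web_search",
            "arguments": json.dumps({"query": "K2 Thinking benchmarks"}),
        },
    }],
}

def extract_calls(message):
    """Return (tool name, parsed arguments) pairs from a tool-calling message."""
    return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in message.get("tool_calls", [])]

calls = extract_calls(assistant_message)
```

Parsing the `arguments` string with `json.loads` before dispatching is what makes the orchestration reliable: malformed arguments fail loudly instead of reaching your tools.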

Open-Source + MoE Efficiency

Open weights with a modified MIT license and a 1T-parameter MoE design (32B active) for scale-efficient reasoning.

How to integrate Kimi K2 Thinking API

Three steps to add long-horizon reasoning and tool use to your app.

1

Step 1 — Provide Context

Send long inputs or RAG-augmented context up to 256K tokens to give the model full task visibility.

2

Step 2 — Define Tools

Attach function schemas so the model can call search, code, or business tools using structured JSON.

3

Step 3 — Execute and Verify

Run multi-step reasoning, stream results, and validate tool calls or reasoning traces before acting on outputs.
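The three steps above can be sketched as a single request body, assuming the OpenAI-style chat-completions schema. The model name and tool schema are illustrative assumptions, not confirmed identifiers.

```python
def build_payload(context_chunks, tools, question, model="kimi-k2-thinking"):
    """Assemble Steps 1-2 into one request body; Step 3 happens after the
    call, when tool calls and reasoning traces are validated."""
    messages = [
        {"role": "system", "content": "You are a careful research agent."},
        # Step 1: provide context -- concatenated here, up to the 256K window.
        {"role": "user", "content": "\n\n".join(context_chunks) + "\n\n" + question},
    ]
    # Step 2: attach function schemas so the model can emit JSON tool calls.
    return {"model": model, "messages": messages, "tools": tools}

payload = build_payload(
    context_chunks=["<full report text>"],
    tools=[{"type": "function",
            "function": {"name": "web_search",
                         "parameters": {"type": "object",
                                        "properties": {"query": {"type": "string"}}}}}],
    question="What are the key risks in this report?",
)
```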

Kimi K2 Thinking Capabilities

Engineered for agentic reasoning at long context lengths

Context

256K Token Context

Handle long documents, chats, and codebases in one request.

Architecture

MoE 1T / 32B Active

Mixture-of-Experts architecture balances scale with efficiency.

Tools

Tool Definitions + JSON Calls

Supports structured tool calling and JSON outputs for automation.

Explainability

Reasoning Traces

Supports separate reasoning_content traces when enabled by the provider.
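When the provider exposes the trace, the response message carries it alongside the final answer. A sketch assuming the `reasoning_content` field name; the sample message below is illustrative.

```python
# Sample response message where the provider exposes a separate reasoning
# trace (field name `reasoning_content`; availability depends on the provider).
message = {
    "role": "assistant",
    "reasoning_content": "First compare the quarterly figures...",
    "content": "Revenue grew 12% year over year.",
}

def split_trace(msg):
    """Separate the reasoning trace from the final answer so the trace can
    be logged or audited without being shown to end users."""
    return msg.get("reasoning_content", ""), msg.get("content", "")

trace, answer = split_trace(message)
```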

Performance

Native INT4 Quantization

Optimized for efficient inference with quantization-aware training.

License

Open-Source License

Modified MIT license with commercial use permitted (review terms).

Frequently Asked Questions about Kimi K2 Thinking

Everything you need to know about the product and billing.

What is Kimi K2 Thinking?

Kimi K2 Thinking is Moonshot AI's open-source thinking model built as a tool-using agent. It uses a 1T-parameter Mixture-of-Experts architecture (32B active), supports a 256K context window, and accepts tool definitions with JSON tool calls for long-horizon workflows.

What context window does the model support?

The model supports up to a 256K-token context window. Providers may apply smaller per-request limits or output caps depending on their infrastructure.

Can it run multi-step agentic tool use?

Yes. The model is trained to interleave step-by-step reasoning with function calls and to maintain stable multi-step tool use across 200–300 sequential invocations.

Is it open source and usable commercially?

Yes. The model weights are published on Hugging Face under a modified MIT license. Review the license and third-party notices to confirm commercial usage terms.

Can I self-host the model?

Yes. For self-hosting, it is recommended to run K2 Thinking on inference engines such as vLLM, SGLang, or KTransformers with suitable GPU resources.

Does it support quantized inference?

Yes. The model uses Quantization-Aware Training for INT4 weight-only inference, reporting roughly a 2x speed-up in low-latency mode while preserving quality.

Is the API compatible with existing SDKs?

Yes. Moonshot AI provides OpenAI- and Anthropic-compatible API endpoints for Kimi K2 Thinking, which simplifies integration with existing SDKs.
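Because the endpoints follow the OpenAI wire format, a request can be assembled with nothing but the Python standard library. The base URL and model name below are assumptions; substitute the values from your provider's dashboard.

```python
import json
import urllib.request

# Assumed endpoint and model name -- check your provider's dashboard.
BASE_URL = "https://api.moonshot.ai/v1"
API_KEY = "sk-..."  # placeholder; supply a real key via your secrets manager

def build_request(messages, model="kimi-k2-thinking"):
    """Build an OpenAI-style chat-completions request (constructed, not sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request([{"role": "user", "content": "Summarize this report."}])
# resp = urllib.request.urlopen(req)  # uncomment once a real key is set
```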
How does it perform on benchmarks?

Reported results include HLE (with tools) at 44.9%, BrowseComp (with tools) at 60.2%, and SWE-bench Verified (with tools) at 71.3%, with evaluations run under INT4 precision.