Kimi K2 Thinking API
Moonshot AI's reasoning model with a 256K context window, chain-of-thought capabilities, and native tool calling. Available in Standard and Turbo variants for different use cases.
Kimi K2 Thinking API for Long-Horizon Reasoning
Run the K2 Thinking model through EvoLink to analyze massive documents, orchestrate tools, and produce structured outputs. Built around a 256K-token context window, native tool calling, and reliable multi-step workflows.

PRICING
| PLAN | CONTEXT WINDOW | MAX OUTPUT | INPUT | OUTPUT | CACHE READ |
|---|---|---|---|---|---|
| Kimi K2 Thinking | 262.1K | 262.1K | $0.556 (7% off official $0.600) | $2.222 (11% off official $2.50) | $0.139 (7% off official $0.150) |
Includes server-side web search capability.
Pricing note: all prices are USD per 1M tokens.
Cache read: the discounted rate applies to cached prompt tokens.
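Using the per-1M-token rates from the table, a request's cost can be estimated with a small sketch (the rate constants mirror the table above and should be re-checked against current pricing before use):

```python
# Estimate request cost from the per-1M-token rates in the pricing table.
# These constants mirror the EvoLink row above; verify against live pricing.
INPUT_RATE = 0.556       # USD per 1M fresh input tokens
OUTPUT_RATE = 2.222      # USD per 1M output tokens
CACHE_READ_RATE = 0.139  # USD per 1M cached prompt tokens

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return estimated USD cost; cached prompt tokens bill at the cache-read rate."""
    fresh_input = input_tokens - cached_tokens
    cost = (
        fresh_input * INPUT_RATE
        + output_tokens * OUTPUT_RATE
        + cached_tokens * CACHE_READ_RATE
    ) / 1_000_000
    return round(cost, 6)

# A 200K-token prompt (half of it cache-hit) producing a 4K-token answer:
print(estimate_cost(200_000, 4_000, cached_tokens=100_000))
```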
What can you build with Kimi K2 Thinking?
Long-Context Research
Process full reports, codebases, or knowledge bases in a single request. The 256K context window makes it practical to reason over large inputs without aggressive chunking.
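Before sending a large document, it can help to sanity-check that it fits in the window. A minimal sketch using the rough ~4-characters-per-token heuristic (an approximation only; use a real tokenizer for exact counts):

```python
# Rough check of whether a document fits K2 Thinking's 256K-token window
# without chunking. The 4-chars-per-token ratio is a common heuristic,
# not an exact count — use the provider's tokenizer for precision.
CONTEXT_WINDOW = 262_100  # tokens, per the pricing table

def fits_in_window(text: str, reserved_output_tokens: int = 8_000) -> bool:
    approx_tokens = len(text) // 4
    return approx_tokens + reserved_output_tokens <= CONTEXT_WINDOW

report = "x" * 400_000  # roughly a 100K-token report
print(fits_in_window(report))
```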

Tool-Orchestrated Agents
Design agents that call tools and stay on track. K2 Thinking accepts tool definitions and returns JSON tool calls, supporting long, multi-step plans.
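A sketch of what this looks like in practice, using an OpenAI-style function schema and a hand-written sample tool call (the `search_docs` tool and the response below are illustrative, not a real API exchange):

```python
import json

# Illustrative tool definition in the OpenAI-compatible schema that
# K2 Thinking accepts. "search_docs" is a hypothetical tool name.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the internal knowledge base.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

# A tool call as it might appear in the assistant message:
sample_tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "search_docs", "arguments": '{"query": "Q3 revenue"}'},
}

# Arguments arrive as a JSON string and must be decoded before dispatch.
args = json.loads(sample_tool_call["function"]["arguments"])
print(sample_tool_call["function"]["name"], args["query"])
```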

Codebase and Data Workflows
Use the model for refactors, debugging, and data analysis across large repositories or datasets with consistent, step-by-step reasoning.

Why developers choose Kimi K2 Thinking API
Get open-source flexibility, 256K context, and native tool use for robust, long-horizon agent workflows.
256K Context Window
Reason across long documents and multi-turn histories with a full 256K-token window for complex tasks.
Native Tool Calling
Accepts tool definitions and produces JSON tool calls, enabling reliable orchestration and structured outputs.
Open-Source + MoE Efficiency
Open weights with a modified MIT license and a 1T-parameter MoE design (32B active) for scale-efficient reasoning.
How to integrate Kimi K2 Thinking API
Three steps to add long-horizon reasoning and tool use to your app.
Step 1 — Provide Context
Send long inputs or RAG-augmented context up to 256K tokens to give the model full task visibility.
Step 2 — Define Tools
Attach function schemas so the model can call search, code, or business tools using structured JSON.
Step 3 — Execute and Verify
Run multi-step reasoning, stream results, and validate tool calls or reasoning traces before acting on outputs.
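The three steps above can be combined into one request payload. This is a sketch in the OpenAI-compatible chat-completions shape; the endpoint URL and model id (`kimi-k2-thinking`) are assumptions to be checked against EvoLink's API reference:

```python
import json

# Steps 1-3 as a single request payload. Model id and endpoint below
# are assumptions — confirm the exact values in EvoLink's docs.
payload = {
    "model": "kimi-k2-thinking",
    "messages": [
        {"role": "system", "content": "You are a research assistant."},
        # Step 1: long or RAG-augmented context, up to 256K tokens
        {"role": "user", "content": "<long document or RAG context here>"},
    ],
    # Step 2: function schema the model can call ("run_query" is hypothetical)
    "tools": [{
        "type": "function",
        "function": {
            "name": "run_query",
            "description": "Run a read-only SQL query.",
            "parameters": {
                "type": "object",
                "properties": {"sql": {"type": "string"}},
                "required": ["sql"],
            },
        },
    }],
    # Step 3: stream results so tool calls can be validated as they arrive
    "stream": True,
}

# To send it (needs the `requests` package and your API key):
# requests.post("https://api.evolink.ai/v1/chat/completions",
#               headers={"Authorization": f"Bearer {API_KEY}"},
#               json=payload)
print(json.dumps(payload)[:40])
```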
Kimi K2 Thinking Capabilities
Engineered for agentic reasoning at long context lengths
256K Token Context
Handle long documents, chats, and codebases in one request.
MoE 1T / 32B Active
Mixture-of-Experts architecture balances scale with efficiency.
Tool Definitions + JSON Calls
Supports structured tool calling and JSON outputs for automation.
Reasoning Traces
Supports separate reasoning_content traces when enabled by the provider.
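When the provider exposes the trace, keeping it separate from the final answer is straightforward. A minimal sketch, using a hand-written sample message rather than a live reply:

```python
# Sketch of separating a reasoning trace from the final answer. Some
# providers return the trace in a `reasoning_content` field alongside
# `content`; this sample message is illustrative, not a real response.
sample_message = {
    "role": "assistant",
    "reasoning_content": "First compare Q3 and Q4 revenue, then compute growth.",
    "content": "Q4 revenue grew 12% over Q3.",
}

def split_reasoning(message: dict) -> tuple[str, str]:
    """Return (trace, answer); trace is empty if the provider omits it."""
    return message.get("reasoning_content", ""), message.get("content", "")

trace, answer = split_reasoning(sample_message)
print(answer)
```

Logging the trace separately lets you audit multi-step reasoning without leaking it into user-facing output.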
Native INT4 Quantization
Optimized for efficient inference with quantization-aware training.
Open-Source License
Modified MIT license with commercial use permitted (review terms).
Kimi K2 Thinking vs. Other Reasoning Models
Compare context windows, reasoning styles, and tooling support across leading reasoning APIs
| Model | Best for | Context window | Reasoning style | Tooling & streaming |
|---|---|---|---|---|
| Kimi K2 Thinking | Long-horizon agents, tool orchestration | 256K tokens | Step-by-step with tool calls | Native tool calling, JSON outputs, streaming |
| OpenAI o1 | Complex reasoning, math, coding | 200K tokens | Internal chain-of-thought | Limited tool support, no streaming |
| Claude 3.5 Sonnet | General tasks, coding, analysis | 200K tokens | Direct response with reasoning | Full tool use, streaming supported |
| DeepSeek R1 | Math, coding, open-source deployment | 128K tokens | Explicit reasoning traces | Basic tool support, streaming |
Frequently Asked Questions about Kimi K2 Thinking
Everything you need to know about the product and billing.