Kimi K2 Thinking API
Access Moonshot AI's open-source Kimi K2 Thinking model for deep, tool-using reasoning. With a 256K-token context window, a 1T-parameter MoE backbone (32B active), and structured tool-call outputs, it is built for long-horizon agents and high-stakes analysis.
PRICING
| PLAN | CONTEXT WINDOW | MAX OUTPUT | INPUT | OUTPUT | CACHE READ |
|---|---|---|---|---|---|
| Kimi K2 Thinking | 262.1K | 262.1K | $0.556 (7% off the $0.600 official price) | $2.22 (11% off the $2.50 official price) | $0.139 (7% off the $0.150 official price) |
Web Search Tool
Server-side web search capability
Pricing Note: All prices are in USD per 1M tokens.
Cache Hit: The cache-read price applies to cached prompt tokens.
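The table rates translate directly into per-request costs. The helper below is an illustrative sketch using the discounted prices listed above (not an official billing formula); cached prompt tokens are billed at the cache-read rate instead of the full input rate.

```python
# Rough cost estimator using the listed EvoLink rates (USD per 1M tokens).
# These constants mirror the table above; adjust them if pricing changes.
PRICE_PER_M = {"input": 0.556, "output": 2.22, "cache_read": 0.139}

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the estimated request cost in USD.

    Cached prompt tokens are charged at the cache-read rate, so they are
    subtracted from the tokens billed at the full input rate.
    """
    billable_input = max(input_tokens - cached_tokens, 0)
    cost = (
        billable_input * PRICE_PER_M["input"]
        + output_tokens * PRICE_PER_M["output"]
        + cached_tokens * PRICE_PER_M["cache_read"]
    ) / 1_000_000
    return round(cost, 6)

# Example: 100K prompt tokens (40K of them cached) producing 8K output tokens.
print(estimate_cost(100_000, 8_000, cached_tokens=40_000))
```

At long contexts the input side dominates, which is why the cache-read discount matters for multi-turn agents that resend large prompts.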
Kimi K2 Thinking API for Long-Horizon Reasoning
Run the K2 Thinking model through EvoLink to analyze massive documents, orchestrate tools, and produce structured outputs. Built for a 256K-token context window, native tool calling, and reliable multi-step workflows.
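A request to the model looks like a standard chat-completion call. The payload below is a minimal sketch in the common OpenAI-compatible format; the model id `kimi-k2-thinking` is an assumption, so check your EvoLink dashboard for the exact identifier and base URL before sending it.

```python
import json

# Minimal chat-completion payload sketch for an OpenAI-compatible endpoint.
# The model id below is an assumption; confirm it with your provider.
payload = {
    "model": "kimi-k2-thinking",
    "messages": [
        {"role": "system", "content": "You are a careful research assistant."},
        {"role": "user", "content": "Summarize the key risks in the attached report."},
    ],
    "max_tokens": 4096,
    "stream": True,  # stream tokens incrementally for long reasoning outputs
}

# The payload serializes to plain JSON for the HTTP request body.
print(json.dumps(payload)["" == "" and 0:40])
print(payload["model"])
```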

What can you build with Kimi K2 Thinking?
Long-Context Research
Process full reports, codebases, or knowledge bases in a single request. The 256K context window makes it practical to reason over large inputs without aggressive chunking.

Tool-Orchestrated Agents
Design agents that call tools and stay on track. K2 Thinking accepts tool definitions and returns JSON tool calls, supporting long, multi-step plans.

Codebase and Data Workflows
Use the model for refactors, debugging, and data analysis across large repositories or datasets with consistent, step-by-step reasoning.
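Between model turns, an agent executes the JSON tool calls the model returns and feeds the results back. The sketch below assumes the common OpenAI-style tool-call message shape, which K2 Thinking's structured outputs are designed to match; the `run_search` tool and the field layout of the mock message are illustrative, so verify field names against your provider's docs.

```python
import json

def run_search(query: str) -> str:
    """Stand-in for a real tool (e.g. the server-side web search)."""
    return f"results for: {query}"

# Registry mapping tool names in the schema to local implementations.
TOOLS = {"run_search": run_search}

def handle_tool_calls(assistant_message: dict) -> list:
    """Execute each structured tool call and build the tool-result messages."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        # In this format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results

# Mock assistant turn containing one structured tool call.
mock = {"tool_calls": [{"id": "call_1", "function": {
    "name": "run_search", "arguments": '{"query": "K2 context window"}'}}]}
print(handle_tool_calls(mock)[0]["content"])
```

The tool-result messages are appended to the conversation and sent back, and the loop repeats until the model returns a final answer with no tool calls.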

Why developers choose Kimi K2 Thinking API
Get open-source flexibility, 256K context, and native tool use for robust, long-horizon agent workflows.
256K Context Window
Reason across long documents and multi-turn histories with a full 256K-token window for complex tasks.
Native Tool Calling
Accepts tool definitions and produces JSON tool calls, enabling reliable orchestration and structured outputs.
Open-Source + MoE Efficiency
Open weights with a modified MIT license and a 1T-parameter MoE design (32B active) for scale-efficient reasoning.
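Native tool calling starts with a tool definition. Below is a sketch of one in the JSON-schema style that OpenAI-compatible APIs accept; the `run_search` tool itself and its parameters are illustrative, not part of the K2 API.

```python
# Example tool definition in the JSON-schema style used by
# OpenAI-compatible tool calling. The tool name and parameters here
# are illustrative; the model fills in matching JSON arguments.
search_tool = {
    "type": "function",
    "function": {
        "name": "run_search",
        "description": "Search the web and return the top results as text.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "max_results": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
}

print(search_tool["function"]["name"])
```

A list of such definitions is passed alongside the messages; the model then emits tool calls whose arguments conform to the declared parameter schema.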
How to integrate Kimi K2 Thinking API
Three steps to add long-horizon reasoning and tool use to your app.
Step 1 — Provide Context
Send long inputs or RAG-augmented context up to 256K tokens to give the model full task visibility.
Step 2 — Define Tools
Attach function schemas so the model can call search, code, or business tools using structured JSON.
Step 3 — Execute and Verify
Run multi-step reasoning, stream results, and validate tool calls or reasoning traces before acting on outputs.
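The verification half of Step 3 can be as simple as checking each tool call before executing it. This sketch assumes the common OpenAI-style call format and checks two things: the arguments parse as JSON, and every parameter the schema marks required is present.

```python
import json

def validate_tool_call(call: dict, schema: dict) -> tuple:
    """Return (ok, reason) for a structured tool call against its schema."""
    try:
        args = json.loads(call["function"]["arguments"])
    except (KeyError, json.JSONDecodeError) as exc:
        return False, f"bad arguments: {exc}"
    missing = [p for p in schema["parameters"].get("required", [])
               if p not in args]
    if missing:
        return False, f"missing required params: {missing}"
    return True, "ok"

# Illustrative schema and two candidate calls, one valid and one not.
schema = {"name": "run_search",
          "parameters": {"type": "object",
                         "properties": {"query": {"type": "string"}},
                         "required": ["query"]}}
good = {"function": {"name": "run_search", "arguments": '{"query": "moe"}'}}
bad = {"function": {"name": "run_search", "arguments": '{}'}}
print(validate_tool_call(good, schema)[0], validate_tool_call(bad, schema)[0])
```

Rejected calls can be returned to the model as error messages, letting it repair its own output instead of crashing the agent loop.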
Kimi K2 Thinking Capabilities
Engineered for agentic reasoning at long context lengths
256K Token Context
Handle long documents, chats, and codebases in one request.
MoE 1T / 32B Active
Mixture-of-Experts architecture balances scale with efficiency.
Tool Definitions + JSON Calls
Supports structured tool calling and JSON outputs for automation.
Reasoning Traces
Supports separate reasoning_content traces when enabled by the provider.
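When the provider exposes traces, the assistant message carries the trace separately from the final answer. The sketch below assumes the `reasoning_content` field name used in Moonshot's convention and a plain dict message; confirm the exact field for your provider before relying on it.

```python
def split_reasoning(message: dict) -> tuple:
    """Return (reasoning_trace, final_answer); the trace may be empty
    when the provider does not expose reasoning content."""
    return message.get("reasoning_content", ""), message.get("content", "")

# Mock assistant message with a separate reasoning trace (illustrative).
msg = {"role": "assistant",
       "reasoning_content": "Step 1: compare the two figures in the report...",
       "content": "The 256K window fits the full report in one request."}
trace, answer = split_reasoning(msg)
print(bool(trace), answer)
```

Keeping the trace separate lets you log or audit the model's intermediate reasoning without showing it to end users.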
Native INT4 Quantization
Optimized for efficient inference with quantization-aware training.
Open-Source License
Modified MIT license with commercial use permitted (review terms).
Frequently Asked Questions about Kimi K2 Thinking
Everything you need to know about the product and billing.