Claude Sonnet 4.0 API
Claude Sonnet 4.0 API is a balanced, high-performance model designed for production teams that need strong reasoning, safe outputs, and predictable costs. Use the Claude Sonnet 4.0 API for support agents, document analysis, and developer workflows where quality and budget both matter.
PRICING
| PLAN | CONTEXT WINDOW | MAX OUTPUT | INPUT | OUTPUT | CACHE WRITE | CACHE READ |
|---|---|---|---|---|---|---|
| Claude Sonnet 4.0 | 200.0K | 64.0K | ≤200.0K$2.55-15% $3.00Official Price >200.0K$5.10-15% $6.00Official Price | ≤200.0K$12.75-15% $15.00Official Price >200.0K$19.13-15% $22.50Official Price | ≤200.0K$3.19-15% $3.75Official Price >200.0K$6.38-15% $7.50Official Price | ≤200.0K$0.256-15% $0.300Official Price >200.0K$0.511-15% $0.600Official Price |
| Claude Sonnet 4.0 (Beta) | 200.0K | 64.0K | ≤200.0K$0.780-74% $3.00Official Price >200.0K$1.56-74% $6.00Official Price | ≤200.0K$3.90-74% $15.00Official Price >200.0K$5.85-74% $22.50Official Price | ≤200.0K$0.975-74% $3.75Official Price >200.0K$1.95-74% $7.50Official Price | ≤200.0K$0.078-74% $0.300Official Price >200.0K$0.156-74% $0.600Official Price |
Web Search Tool
Server-side web search capability
Pricing Note: Price unit: USD / 1M tokens
Cache Hit: Price applies to cached prompt tokens.
Two ways to run Claude Sonnet 4.0 — pick the tier that matches your workload.
- · Claude Sonnet 4.0: the default tier for production reliability and predictable availability.
- · Claude Sonnet 4.0 (Beta): a lower-cost tier with best-effort availability; retries recommended for retry-tolerant workloads.
Claude Sonnet 4.0 API — Balanced Intelligence for Production
Ship reliable AI experiences with the Claude Sonnet 4.0 API, combining practical latency with strong reasoning for real teams and real workloads.

What can you build with the Claude Sonnet 4.0 API?
Customer support agents
Create support assistants that resolve tickets end-to-end with the Claude Sonnet 4.0 API. It maintains brand tone, understands long customer histories, and can call tools to fetch orders or update CRM records. Teams use the Claude Sonnet 4.0 API to reduce handle time, increase resolution quality, and keep replies consistent across languages and channels.

Document analysis and extraction
Turn contracts, reports, and logs into structured summaries with the Claude Sonnet 4.0 API. With long-context options, the Claude Sonnet 4.0 API can read large documents, answer precise questions, and output JSON that fits your schema. This is ideal for compliance reviews, knowledge bases, and analytics pipelines that need accuracy and traceable summaries.

Developer copilots and code review
Ship coding copilots that review diffs, propose fixes, and explain design choices. The Claude Sonnet 4.0 API brings Claude 4 reasoning to everyday engineering tasks, with a pricing tier that fits teams scaling PR summaries, refactors, and architecture guidance. Use the Claude Sonnet 4.0 API to keep reviews fast, helpful, and consistent across large codebases.

Why teams choose the Claude Sonnet 4.0 API
Claude Sonnet 4.0 API balances capability, cost, and reliability for production AI.
Balanced performance
Strong reasoning with practical latency for daily workflows.
Clear cost planning
Transparent base pricing with caching and batch options.
Production readiness
Tool use, structured outputs, and long-context options.
How to integrate the Claude Sonnet 4.0 API
From API key to production workflows in minutes with the Claude Sonnet 4.0 API.
Step 1 — Authenticate
Create an API key, set the Sonnet 4 model alias, and send a first prompt from your app or backend.
Step 2 — Add tools
Define tools and JSON Schema inputs so the model returns structured, actionable results for your workflow.
Step 3 — Optimize
Use caching or batch processing, then monitor usage, latency, and quality as you scale the Claude Sonnet 4.0 API.
Claude Sonnet 4.0 API capabilities
Practical features that match real product needs
Transparent base pricing
Claude Sonnet 4 is priced at $3 per million input tokens and $15 per million output tokens. This clear baseline helps teams forecast costs and pick the right model for production workloads.
Prompt caching rates
Prompt caching uses separate rates: 5-minute cache writes are 1.25x base input, 1-hour cache writes are 2x, and cache reads are 0.1x. This makes repeated context far cheaper over time.
1M context beta pricing
The 1M context window is in beta for usage tier 4 or custom rate limits and is only available for Claude Sonnet 4 and 4.5. Requests over 200K input tokens use premium rates: $6 input and $22.50 output per MTok.
Batch processing savings
Batch processing provides a 50% discount on both input and output tokens for asynchronous jobs, which can lower costs for large-scale ingestion and nightly automation.
Tool use with JSON Schema
Tool definitions include an input_schema that uses JSON Schema to define parameters. This keeps tool calls predictable and improves reliability for agents that must execute actions or return structured data.
Multimodal and multilingual
All current Claude models support text and image input, text output, multilingual capabilities, and vision. Claude models are available via the Anthropic API and on AWS Bedrock, Google Vertex AI, and Microsoft Foundry.
Frequently Asked Questions
Everything you need to know about the product and billing.