MiniMax-M2.5 API
$0.181 (~13 credits) per 1M input tokens; $0.719 (~51.8 credits) per 1M output tokens
$0.024 (~1.7 credits) per 1M cache read tokens
Web search tool charged separately per request.
Highest stability with guaranteed 99.9% uptime. Recommended for production environments.
Use the same API endpoint for all versions. Only the model parameter differs.
MiniMax-M2.5 API Pricing and Access for Reasoning Workloads
Route MiniMax-M2.5 through EvoLink for coding agents, repo Q&A, research, and document analysis with 204K context, built-in web search, and prompt caching. Start with OpenAI-compatible access and pricing from $0.18/1M input tokens.
Access and workflow fit
Best fit: Coding agents
Access: OpenAI-compatible
Context: 204K window
Built-in: Web search + caching

What can you build with the MiniMax-M2.5 API?
Intelligent Coding Assistants
Build coding copilots and autonomous agents that handle repo Q&A, code generation, bug triage, and review workflows. MiniMax-M2.5 is a strong fit when your product needs long-context code understanding and step-by-step reasoning in one text API.

Research & Analysis with Web Search
Use MiniMax-M2.5 for research agents, market scans, and knowledge workflows that need fresh web data. Search can be enabled only when needed, helping teams balance answer quality, latency, and cost.

Document Processing & Summarization
Process contracts, reports, support transcripts, and long internal knowledge bases without aggressive chunking. The 204K context window is useful for structured summaries, extraction pipelines, and document comparison tasks.

Why teams choose the MiniMax-M2.5 API
Teams choose MiniMax-M2.5 on EvoLink when they need long-context reasoning, predictable token pricing, and faster onboarding than a separate vendor-specific integration.
Lower-friction integration
Keep the OpenAI-style request shape, use one EvoLink key, and plug MiniMax-M2.5 into coding agents or gateway-style workflows without building a MiniMax-specific integration path first.
Predictable production cost
Visible token pricing makes budgeting easier: input starts at $0.18/1M, output at $0.72/1M, and cache reads at $0.024/1M for repeated prompts.
Reasoning plus live retrieval
Use 204K context for large prompts and turn on built-in web search for research or verification flows that need fresh information.
How to integrate the MiniMax-M2.5 API
Keep your existing OpenAI client, point it to EvoLink, set the model to MiniMax-M2.5, and use the same route for coding-agent, repo Q&A, and long-context workflows.
Step 1 — Authenticate
Create an EvoLink API key, set the EvoLink base URL, and send requests with standard Bearer authentication.
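A minimal sketch of this step using the official `openai` Python SDK, which handles Bearer authentication for you. The base URL below is a placeholder, not the real endpoint; substitute the one from your EvoLink dashboard.

```python
from openai import OpenAI

# Point the standard OpenAI client at EvoLink. The SDK sends the key
# as a Bearer token on every request.
client = OpenAI(
    api_key="YOUR_EVOLINK_API_KEY",             # your EvoLink key
    base_url="https://api.evolink.example/v1",  # placeholder; use your EvoLink base URL
)
```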
Step 2 — Set required fields
Send `model: MiniMax-M2.5` with your `messages` array. Reuse stable system prompts and prefixes to benefit from prompt caching on repeated workloads.
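A sketch of the minimal request, reusing the `client` from Step 1. The system prompt and user message are illustrative; the key point is that `model` and `messages` are the only required fields, and keeping the system prompt byte-identical across calls lets repeated prefixes qualify for cache-read pricing.

```python
# A stable prefix: reuse this exact string across requests so the
# shared prefix can be served from the prompt cache.
STABLE_SYSTEM_PROMPT = "You are a code-review assistant for our monorepo."

response = client.chat.completions.create(
    model="MiniMax-M2.5",
    messages=[
        {"role": "system", "content": STABLE_SYSTEM_PROMPT},
        {"role": "user", "content": "Summarize the failing tests in this diff: ..."},
    ],
)
print(response.choices[0].message.content)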
Step 3 — Tune outputs
Adjust temperature, top_p, max_tokens, and stream as usual. Turn on `enable_search` only when needed, then choose `search_strategy: turbo` or `max` based on latency and coverage.
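A sketch of output tuning plus opt-in web search. Standard sampling fields go through as usual; `enable_search` and `search_strategy` are provider-specific, so passing them via the SDK's `extra_body` is an assumption about how the gateway forwards them, and the user message is illustrative.

```python
# Tune sampling and stream tokens; enable search only for this request.
stream = client.chat.completions.create(
    model="MiniMax-M2.5",
    messages=[{"role": "user", "content": "What changed in this week's release notes?"}],
    temperature=0.3,
    top_p=0.9,
    max_tokens=1024,
    stream=True,
    extra_body={
        "enable_search": True,       # turn on only when fresh web data is needed
        "search_strategy": "turbo",  # "turbo" favors latency, "max" favors coverage
    },
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Because search adds per-request cost and latency, gating `enable_search` per call (rather than globally) keeps routine requests cheap while research flows still get live retrieval.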
MiniMax-M2.5 API features for production teams
Concrete controls and deployment details rather than a generic model overview.
Reasoning model for text workloads
Use MiniMax-M2.5 for coding, structured analysis, and multi-step text tasks where response quality matters more than lightweight chat output.
204K Context Window
Fit long documents, large prompts, and multi-turn context into one request before you reach for aggressive chunking or multi-pass orchestration.
Search modes for fresh data
Enable real-time retrieval with `enable_search: true` and choose `turbo` or `max` depending on whether speed or broader coverage is more important.
OpenAI SDK Compatible
Move existing OpenAI-style clients onto MiniMax-M2.5 by changing the base URL and model name instead of rebuilding your integration path for coding tools or internal agents.
Prompt Caching
Repeated prefixes and system prompts can be billed more efficiently, which helps recurring agent workflows and high-volume production traffic.
Alibaba Cloud deployment path
This route is deployed on Alibaba Cloud, giving EvoLink users low-latency access and a production-oriented delivery path.
MiniMax-M2.5 API FAQs
Everything you need to know about the product and billing.