Gemini 3.1 Flash Lite API
- One API for Code Agents & CLIs. (View Docs)
$0.200 (~14.4 credits) per 1M input tokens; $1.200 (~86.4 credits) per 1M output tokens
$0.019 (~1.4 credits) per 1M cache-read tokens; $0.400 (~28.8 credits) per 1M audio tokens
Google Search grounding charged separately per query.
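The rates above translate directly into per-request cost. A minimal sketch of that arithmetic, using the listed pay-as-you-go USD rates (the example token counts are illustrative, and search-grounding surcharges are out of scope here):

```python
# Pay-as-you-go rates from the pricing table above, in USD per 1M tokens.
INPUT_RATE = 0.200
OUTPUT_RATE = 1.200
CACHE_READ_RATE = 0.019

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one request's cost; cached input bills at the cache-read rate."""
    fresh_input = input_tokens - cached_tokens
    return (fresh_input * INPUT_RATE
            + cached_tokens * CACHE_READ_RATE
            + output_tokens * OUTPUT_RATE) / 1_000_000

# Example: 200k-token input with half served from cache, 4k-token output.
cost = request_cost(200_000, 4_000, cached_tokens=100_000)
```

With context caching covering half the input, the example request works out to roughly $0.027, most of it from output tokens, which is why caching pays off on repeated long prompts.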
Highest stability with guaranteed 99.9% uptime. Recommended for production environments.
Use the same API endpoint for all versions. Only the model parameter differs.
A low-cost Gemini model for translation, extraction, and document workflows
Gemini 3.1 Flash Lite fits high-throughput tasks where cost, latency, and retryability matter more than premium model quality. With 1M context, multimodal input, and tool support, it works well as the lower-cost processing layer in a broader Gemini stack.
Page keyword: Gemini 3.1 Flash Lite API
Request model ID: gemini-3.1-flash-lite-preview

Best use cases for Gemini 3.1 Flash Lite API
Cost-Efficient High-Volume Processing
Flash Lite works well as the cheap processing layer in a larger AI stack. Use it for translation backfills, tagging queues, extraction jobs, and first-pass classification before escalating edge cases to a stronger model.
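The escalation pattern described above can be sketched as a small routing function. The stronger-tier model ID and the confidence threshold here are placeholders, not values from this page:

```python
# First-pass/escalation routing sketch. The cheap tier is the route this
# page documents; the strong tier ID and threshold are assumptions.
CHEAP_MODEL = "gemini-3.1-flash-lite-preview"
STRONG_MODEL = "your-stronger-model-id"  # placeholder

def pick_model(first_pass_confidence: float, threshold: float = 0.8) -> str:
    """Send work to Flash Lite by default; escalate only low-confidence items."""
    return CHEAP_MODEL if first_pass_confidence >= threshold else STRONG_MODEL
```

In practice the confidence signal comes from your first-pass classifier's own output (a logprob, a self-reported score, or a validation check), so only the edge cases pay premium-model rates.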

Multimodal Inputs with 1M Context
Send text, images, video, audio, or PDFs in a single request with up to 1,050,000 input tokens. Handle long documents, batch content, or multi-turn conversations without splitting context.
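A mixed text-plus-image request can be expressed in the OpenAI-style content-parts format shown below. This is a sketch of the conventional shape; confirm in the docs which part types this route accepts through the gateway, and note the URL is a placeholder:

```python
# OpenAI-style multimodal message body (content parts). The image URL is
# a placeholder; video, audio, and PDF parts follow the docs' conventions.
def build_multimodal_message(prompt: str, image_url: str) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

payload = {
    "model": "gemini-3.1-flash-lite-preview",
    "messages": [
        build_multimodal_message("Summarize this chart.",
                                 "https://example.com/chart.png"),
    ],
}
```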

Agentic Tasks and Tool Use
Supports function calling, structured outputs, thinking, code execution, search grounding, and caching. That makes it useful for low-cost agent substeps, retrieval cleanup, and structured preprocessing inside multi-model pipelines.
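For the extraction-style substeps mentioned above, function calling gives you machine-readable output. A minimal sketch in the OpenAI-style tools format; the tool name and schema are illustrative, not part of this page:

```python
# Hypothetical extraction tool in the OpenAI-style function-calling format.
extract_tool = {
    "type": "function",
    "function": {
        "name": "record_entities",
        "description": "Record entities extracted from a document chunk.",
        "parameters": {
            "type": "object",
            "properties": {
                "entities": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["entities"],
        },
    },
}

request_body = {
    "model": "gemini-3.1-flash-lite-preview",
    "messages": [{"role": "user", "content": "Extract entities: Acme hired Bob."}],
    "tools": [extract_tool],
}
```

Because the model returns arguments matching the declared JSON schema, downstream pipeline stages can consume the result without fragile text parsing.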

Why use EvoLink for Gemini 3.1 Flash Lite API
EvoLink makes Gemini 3.1 Flash Lite more useful for teams that already ship on OpenAI-style infrastructure: one gateway, lower migration friction, and cleaner model routing across cheap and premium tiers.
Keep OpenAI-Style Workflows While Using Gemini
Teams already built around the OpenAI SDK can add Gemini 3.1 Flash Lite without rebuilding their request layer, auth flow, or fallback logic from scratch.
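Concretely, an OpenAI-compatible gateway call only needs the base URL, key, and model string changed. A sketch of the request shape using only the standard library; the gateway base URL and environment-variable name are placeholders, so check the EvoLink docs for the real values:

```python
# Request-shape sketch for an OpenAI-compatible chat completion. Teams on
# the OpenAI SDK would instead pass base_url/api_key to their client and
# keep existing call sites; only the model string changes.
import json
import os

BASE_URL = "https://your-gateway.example/v1"     # placeholder, see docs
API_KEY = os.environ.get("EVOLINK_API_KEY", "")  # assumed env-var name

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps({
    "model": "gemini-3.1-flash-lite-preview",
    "messages": [{"role": "user", "content": "Translate to French: hello"}],
})
# POST body to f"{BASE_URL}/chat/completions" with any HTTP client.
```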
Use Flash Lite as the Low-Cost Stage in a Multi-Model Stack
Route cheap translation, extraction, and classification traffic to Flash Lite first, then send only the harder or higher-value requests to stronger models on the same gateway.
Lower Migration Cost Than Vendor-Specific Integrations
One API key, support for both OpenAI-compatible and native Gemini request formats, and built-in caching and batch support make it easier to operate Gemini alongside the rest of your model catalog.
How to use Gemini 3.1 Flash Lite API
Use this page as an access overview: pick your request format, use the preview model ID, and see the docs for detailed request examples.
Step 1 - Choose the Request Format
Gemini 3.1 Flash Lite can be called through OpenAI-compatible requests or the native Gemini API, which makes it easier to fit into existing stacks without rebuilding your whole integration path.
Step 2 - Use the Current Request Model ID
Use the exact request model ID "gemini-3.1-flash-lite-preview" when sending production traffic. The marketing name Gemini 3.1 Flash Lite API and the model ID differ, so always match the ID to the route you actually call.
Step 3 - Scale the Right Workloads Here
Use Flash Lite for translation queues, extraction jobs, tagging, and other high-volume tasks, then send edge cases or harder requests to stronger models. For exact request bodies, parameters, and endpoint examples, continue to docs.
Gemini 3.1 Flash Lite API Features and Limits
Core capabilities and limits for planning production integrations
1,050,000 Input Tokens
Up to 1,050,000 input tokens and 65,536 output tokens.
Multimodal Inputs
Text, image, video, audio, and PDF inputs with text output.
Thinking + Structured Outputs
Thinking and structured outputs supported for reliable, machine-readable results.
Function Calling + Tools
Function calling, code execution, and search grounding are supported.
Caching + Batch
Context caching and Batch API supported for repeated or large-scale workloads.
Ultra-Low Cost
Use the live pricing table above to verify the current EvoLink pay-as-you-go rate for this route.
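The token limits listed above can be enforced client-side before dispatch. A minimal guard sketch, assuming you already count tokens with your tokenizer of choice:

```python
# Documented limits for this route: 1,050,000 input / 65,536 output tokens.
MAX_INPUT_TOKENS = 1_050_000
MAX_OUTPUT_TOKENS = 65_536

def fits_context(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the route's documented limits."""
    return (input_tokens <= MAX_INPUT_TOKENS
            and requested_output_tokens <= MAX_OUTPUT_TOKENS)
```

Rejecting oversized requests before they hit the API avoids paying for calls that would be truncated or refused.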
Gemini 3.1 Flash Lite API FAQs
Everything you need to know about the product and billing.
Continue with Gemini family pages and integration guides
Where Gemini 3.1 Flash Lite fits in the Gemini family
Treat this route as the lower-cost execution layer in the Gemini family, not as a replacement for stronger general-purpose models. It fits high-throughput, retry-friendly, batch-heavy workloads; when task difficulty or output quality matters more, move up to a stronger Flash route on the site.
Family-model links and integration guides are grouped in one place so the page stays focused and the next step is clear.