Gemini 3.1 Flash Lite API

Gemini 3.1 Flash Lite is a low-cost, high-throughput Gemini model for translation, extraction, classification, and document processing. Access it on EvoLink with OpenAI-compatible or native Gemini requests; the current request model ID is gemini-3.1-flash-lite-preview.
Price:

$0.200 (~14.4 credits) per 1M input tokens; $1.200 (~86.4 credits) per 1M output tokens

$0.019 (~1.4 credits) per 1M cache-read tokens; $0.400 (~28.8 credits) per 1M audio tokens

Google Search grounding charged separately per query.
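
The per-token rates above can be folded into a quick cost estimate before you commit a workload. A minimal sketch, assuming the rates and credit figures listed on this page (verify against the live pricing table before relying on it):

```python
# Estimate EvoLink cost for a Gemini 3.1 Flash Lite workload, using the
# per-1M-token rates listed above (assumed; check the live pricing table).
RATES_PER_M = {
    "input": 0.200,       # USD per 1M input tokens
    "output": 1.200,      # USD per 1M output tokens
    "cache_read": 0.019,  # USD per 1M cache-read tokens
    "audio": 0.400,       # USD per 1M audio tokens
}
CREDITS_PER_USD = 72  # implied by the credit figures above (14.4 / 0.200)

def estimate_cost(tokens: dict) -> tuple:
    """Return (usd, credits) for a dict of token counts by type."""
    usd = sum(RATES_PER_M[k] * n / 1_000_000 for k, n in tokens.items())
    return usd, usd * CREDITS_PER_USD

usd, credits = estimate_cost({"input": 1_000_000, "output": 100_000})
# 1M input ($0.200) + 100K output ($0.120) = $0.32, or ~23.04 credits
```

Search-grounding queries are billed separately, so they are deliberately left out of this sketch.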

Highest stability with guaranteed 99.9% uptime. Recommended for production environments.

Use the same API endpoint for all versions. Only the model parameter differs.
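
In practice that means model selection is a single field in an otherwise unchanged request body. A minimal sketch of the OpenAI-compatible shape (the payload convention follows the standard chat-completions format this page says EvoLink accepts):

```python
import json

MODEL_ID = "gemini-3.1-flash-lite-preview"  # current request model ID

def build_chat_request(model: str, user_text: str) -> dict:
    # OpenAI-compatible body; switching model versions means
    # changing only the `model` field, never the endpoint.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }

payload = build_chat_request(MODEL_ID, "Translate to French: good morning")
body = json.dumps(payload)  # POST this to /v1/chat/completions with your API key
```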

A low-cost Gemini model for translation, extraction, and document workflows

Gemini 3.1 Flash Lite fits high-throughput tasks where cost, latency, and retryability matter more than premium model quality. With 1M context, multimodal input, and tool support, it works well as the lower-cost processing layer in a broader Gemini stack.

Page keyword: Gemini 3.1 Flash Lite API

Request model ID: gemini-3.1-flash-lite-preview


Best use cases for Gemini 3.1 Flash Lite API

Cost-Efficient High-Volume Processing

Flash Lite works well as the cheap processing layer in a larger AI stack. Use it for translation backfills, tagging queues, extraction jobs, and first-pass classification before escalating edge cases to a stronger model.
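
A translation backfill in that style fans out as one cheap request per item. A sketch under the OpenAI-compatible convention (the helper name and system prompt are illustrative, not part of any EvoLink API):

```python
# Sketch: fan a translation backfill out as one Flash Lite request per item.
# Each payload is OpenAI-compatible; POST each to /v1/chat/completions.
MODEL_ID = "gemini-3.1-flash-lite-preview"

def translation_requests(texts: list, target_lang: str) -> list:
    return [
        {
            "model": MODEL_ID,
            "messages": [
                {"role": "system",
                 "content": f"Translate the user text to {target_lang}."},
                {"role": "user", "content": text},
            ],
        }
        for text in texts
    ]

reqs = translation_requests(["good morning", "see you soon"], "German")
```

Because these jobs are retry-friendly, failed items can simply be re-queued rather than escalated.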

Cost-efficient processing

Multimodal Inputs with 1M Context

Send text, images, video, audio, or PDFs in a single request with up to 1,050,000 input tokens. Handle long documents, batch content, or multi-turn conversations without splitting context.

Long context

Agentic Tasks and Tool Use

Supports function calling, structured outputs, thinking, code execution, search grounding, and caching. That makes it useful for low-cost agent substeps, retrieval cleanup, and structured preprocessing inside multi-model pipelines.
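
For the structured-preprocessing case, a JSON-schema response format keeps downstream parsing trivial. A sketch in the OpenAI-style `response_format` convention (whether EvoLink forwards this field unchanged is an assumption to verify in docs; the schema and field names are illustrative):

```python
# Sketch: ask for machine-readable extraction via an OpenAI-style JSON schema.
# The `response_format` passthrough is assumed; the invoice schema is made up.
extraction_request = {
    "model": "gemini-3.1-flash-lite-preview",
    "messages": [{"role": "user", "content": "Invoice #1042, total 89.50 EUR"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "total": {"type": "number"},
                    "currency": {"type": "string"},
                },
                "required": ["invoice_number", "total", "currency"],
            },
        },
    },
}
```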

Agentic tasks

Why use EvoLink for Gemini 3.1 Flash Lite API

EvoLink makes Gemini 3.1 Flash Lite more useful for teams that already ship on OpenAI-style infrastructure: one gateway, lower migration friction, and cleaner model routing across cheap and premium tiers.

Keep OpenAI-Style Workflows While Using Gemini

Teams already built around the OpenAI SDK can add Gemini 3.1 Flash Lite without rebuilding their request layer, auth flow, or fallback logic from scratch.

Use Flash Lite as the Low-Cost Stage in a Multi-Model Stack

Route cheap translation, extraction, and classification traffic to Flash Lite first, then send only the harder or higher-value requests to stronger models on the same gateway.

Lower Migration Cost Than Vendor-Specific Integrations

One API key, OpenAI-compatible and native Gemini request formats, plus caching and batch support make it easier to operate Gemini alongside the rest of your model catalog.

How to use Gemini 3.1 Flash Lite API

Use this page as an access overview: pick a request format, use the preview model ID, and see the docs for detailed request examples.

Step 1 - Choose the Request Format

Gemini 3.1 Flash Lite can be called through OpenAI-compatible requests or the native Gemini API, which makes it easier to fit into existing stacks without rebuilding your whole integration path.
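
The two formats differ only in where the model name goes: the JSON body for the OpenAI-compatible route, the URL path for the native route. A sketch using the endpoint paths given in the FAQ on this page (the base host is a placeholder, and `generateContent` is shown as one example of the native `{method}` segment):

```python
MODEL_ID = "gemini-3.1-flash-lite-preview"
BASE = "https://api.example-gateway.com"  # placeholder; use your EvoLink base URL

# OpenAI-compatible route: the model is a field in the JSON body.
openai_url = f"{BASE}/v1/chat/completions"

# Native Gemini route: model and method live in the path.
native_url = f"{BASE}/v1beta/models/{MODEL_ID}:generateContent"
```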

Step 2 - Use the Current Request Model ID

Use the exact request model ID "gemini-3.1-flash-lite-preview" when sending production traffic. The product name drops the suffix, but the preview identifier is what the route actually matches.

Step 3 - Scale the Right Workloads Here

Use Flash Lite for translation queues, extraction jobs, tagging, and other high-volume tasks, then send edge cases or harder requests to stronger models. For exact request bodies, parameters, and endpoint examples, continue to docs.
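
The escalate-on-difficulty pattern in this step can be sketched as a small router. The `confidence` field and the premium model name are illustrative assumptions your own pipeline would supply; neither is part of any EvoLink response schema:

```python
# Sketch: route to Flash Lite first, escalate uncertain results.
# `confidence` is a score your own pipeline produces (e.g. from a
# structured-output field); it is not returned by the API itself.
CHEAP_MODEL = "gemini-3.1-flash-lite-preview"
PREMIUM_MODEL = "example-premium-model"  # placeholder for a stronger route

def pick_model(first_pass: dict, threshold: float = 0.8) -> str:
    """Escalate to the premium route when the first pass looks unreliable."""
    if first_pass.get("confidence", 0.0) >= threshold:
        return CHEAP_MODEL
    return PREMIUM_MODEL
```

Because only low-confidence items escalate, the premium route sees a small fraction of total traffic.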

Gemini 3.1 Flash Lite API Features and Limits

Core capabilities and limits for planning production integrations

Context

1,050,000 Input Tokens

Up to 1,050,000 input tokens and 65,536 output tokens.

Multimodal

Multimodal Inputs

Text, image, video, audio, and PDF inputs with text output.

Reasoning

Thinking + Structured Outputs

Thinking and structured outputs supported for reliable, machine-readable results.

Tools

Function Calling + Tools

Function calling, code execution, and search grounding are supported.

Scale

Caching + Batch

Context caching and Batch API supported for repeated or large-scale workloads.
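
Batch submissions are commonly packaged as one JSON object per line. The sketch below follows the OpenAI Batch JSONL convention (`custom_id`, `method`, `url`, `body`); whether EvoLink's Batch API accepts this exact shape is an assumption to confirm in the docs:

```python
import json

MODEL_ID = "gemini-3.1-flash-lite-preview"

def to_batch_jsonl(items: list) -> str:
    """Package prompts as JSONL lines in the OpenAI Batch request shape."""
    lines = []
    for i, text in enumerate(items):
        lines.append(json.dumps({
            "custom_id": f"item-{i}",       # lets you match results to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": MODEL_ID,
                "messages": [{"role": "user", "content": text}],
            },
        }))
    return "\n".join(lines)

jsonl = to_batch_jsonl(["tag this article", "tag that article"])
```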

Pricing

Ultra-Low Cost

Use the live pricing table above to verify the current EvoLink pay-as-you-go rate for this route.

Gemini 3.1 Flash Lite API FAQs

Everything you need to know about the product and billing.

Is Gemini 3.1 Flash Lite a lower-cost route than the standard Gemini Flash models?
Yes. Gemini 3.1 Flash Lite is positioned as a lower-cost Flash route for high-volume workloads where throughput and price matter more than the stronger general-purpose quality you would expect from a larger Gemini Flash model.

Can I call it with both OpenAI-compatible and native Gemini requests?
Yes. EvoLink supports OpenAI-compatible requests at POST /v1/chat/completions, and it also supports Google Native API requests at POST /v1beta/models/gemini-3.1-flash-lite-preview:{method}.

How large are the context and output windows?
Gemini 3.1 Flash Lite supports up to 1,050,000 input tokens and 65,536 output tokens, which makes it suitable for long documents, large batches, and multi-step processing pipelines.

Does it support multimodal input?
Yes. Gemini 3.1 Flash Lite supports text, image, video, audio, and PDF input with text output, which makes it useful for extraction, summarization, and multimodal document workflows.

Which model ID should I use in requests?
Use the exact preview model identifier "gemini-3.1-flash-lite-preview" in API requests. The product name drops the suffix, but the request model ID remains the preview identifier.

When should I choose Flash Lite over a larger Flash route?
Choose Flash Lite for translation, extraction, classification, tagging, and other retry-friendly workloads that need lower cost at scale. Move up to a larger Gemini Flash route when output quality or task difficulty matters more than keeping each request cheap.

What is Flash Lite best suited for?
Gemini 3.1 Flash Lite is best for cost-sensitive, high-throughput tasks such as translation, classification, extraction, tagging, document processing, and lightweight agent workflows where low latency matters more than frontier reasoning depth.

What is not supported?
Image generation, audio generation, and Live API are not supported on this model. Google Maps grounding is also not supported, so it is best positioned for low-cost text-output workflows instead of realtime or generative media tasks.

Continue with Gemini family pages and integration guides

Where Gemini 3.1 Flash Lite fits in the Gemini family

Treat this route as the lower-cost execution layer in the Gemini family, not as a replacement for stronger general-purpose models. It fits high-throughput, retry-friendly, batch-heavy workloads; when task difficulty or output quality matters more, move up to a stronger Flash route on the site.
