Gemini 3.1 Flash Lite API

Gemini 3.1 Flash Lite is a low-cost, high-throughput Gemini model for translation, extraction, classification, and document processing. Access it on EvoLink with OpenAI-compatible or native Gemini requests; the current request model ID is gemini-3.1-flash-lite-preview.
Price:

$0.200 (~14.4 credits) per 1M input tokens; $1.200 (~86.4 credits) per 1M output tokens

$0.019 (~1.4 credits) per 1M cache-read tokens; $0.400 (~28.8 credits) per 1M audio tokens

Google Search grounding charged separately per query.
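
The per-token rates above can be folded into a quick cost estimate before you commit a workload. A minimal sketch, assuming the rates and credit figures listed on this page (verify against the live pricing table before relying on it):

```python
# Estimate EvoLink cost for a Gemini 3.1 Flash Lite workload, using the
# per-1M-token rates listed above (assumed; check the live pricing table).
RATES_PER_M = {
    "input": 0.200,       # USD per 1M input tokens
    "output": 1.200,      # USD per 1M output tokens
    "cache_read": 0.019,  # USD per 1M cache-read tokens
    "audio": 0.400,       # USD per 1M audio tokens
}
CREDITS_PER_USD = 72  # implied by the credit figures above (14.4 / 0.200)

def estimate_cost(tokens: dict) -> tuple:
    """Return (usd, credits) for a dict of token counts by type."""
    usd = sum(RATES_PER_M[k] * n / 1_000_000 for k, n in tokens.items())
    return usd, usd * CREDITS_PER_USD

usd, credits = estimate_cost({"input": 1_000_000, "output": 100_000})
# 1M input ($0.200) + 100K output ($0.120) = $0.32, or ~23.04 credits
```

Search-grounding queries are billed separately, so they are deliberately left out of this sketch.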

Highest stability with guaranteed 99.9% uptime. Recommended for production environments.

Use the same API endpoint for all versions. Only the model parameter differs.
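
In practice that means model selection is a single field in an otherwise unchanged request body. A minimal sketch of the OpenAI-compatible shape (the payload convention follows the standard chat-completions format this page says EvoLink accepts):

```python
import json

MODEL_ID = "gemini-3.1-flash-lite-preview"  # current request model ID

def build_chat_request(model: str, user_text: str) -> dict:
    # OpenAI-compatible body; switching model versions means
    # changing only the `model` field, never the endpoint.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }

payload = build_chat_request(MODEL_ID, "Translate to French: good morning")
body = json.dumps(payload)  # POST this to /v1/chat/completions with your API key
```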

A low-cost Gemini model for translation, extraction, and document workflows

Gemini 3.1 Flash Lite fits high-throughput tasks where cost, latency, and retryability matter more than premium model quality. With 1M context, multimodal input, and tool support, it works well as the lower-cost processing layer in a broader Gemini stack.

Page keyword: Gemini 3.1 Flash Lite API

Request model ID: gemini-3.1-flash-lite-preview


Best use cases for Gemini 3.1 Flash Lite API

Cost-Efficient High-Volume Processing

Flash Lite works well as the cheap processing layer in a larger AI stack. Use it for translation backfills, tagging queues, extraction jobs, and first-pass classification before escalating edge cases to a stronger model.
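
A translation backfill in that style fans out as one cheap request per item. A sketch under the OpenAI-compatible convention (the helper name and system prompt are illustrative, not part of any EvoLink API):

```python
# Sketch: fan a translation backfill out as one Flash Lite request per item.
# Each payload is OpenAI-compatible; POST each to /v1/chat/completions.
MODEL_ID = "gemini-3.1-flash-lite-preview"

def translation_requests(texts: list, target_lang: str) -> list:
    return [
        {
            "model": MODEL_ID,
            "messages": [
                {"role": "system",
                 "content": f"Translate the user text to {target_lang}."},
                {"role": "user", "content": text},
            ],
        }
        for text in texts
    ]

reqs = translation_requests(["good morning", "see you soon"], "German")
```

Because these jobs are retry-friendly, failed items can simply be re-queued rather than escalated.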

Cost-efficient processing

Multimodal Inputs with 1M Context

Send text, images, video, audio, or PDFs in a single request with up to 1,050,000 input tokens. Handle long documents, batch content, or multi-turn conversations without splitting context.

Long context

Agentic Tasks and Tool Use

Supports function calling, structured outputs, thinking, code execution, search grounding, and caching. That makes it useful for low-cost agent substeps, retrieval cleanup, and structured preprocessing inside multi-model pipelines.
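
For the structured-preprocessing case, a JSON-schema response format keeps downstream parsing trivial. A sketch in the OpenAI-style `response_format` convention (whether EvoLink forwards this field unchanged is an assumption to verify in docs; the schema and field names are illustrative):

```python
# Sketch: ask for machine-readable extraction via an OpenAI-style JSON schema.
# The `response_format` passthrough is assumed; the invoice schema is made up.
extraction_request = {
    "model": "gemini-3.1-flash-lite-preview",
    "messages": [{"role": "user", "content": "Invoice #1042, total 89.50 EUR"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "total": {"type": "number"},
                    "currency": {"type": "string"},
                },
                "required": ["invoice_number", "total", "currency"],
            },
        },
    },
}
```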

Agentic tasks

Why use EvoLink for Gemini 3.1 Flash Lite API

EvoLink makes Gemini 3.1 Flash Lite more useful for teams that already ship on OpenAI-style infrastructure: one gateway, lower migration friction, and cleaner model routing across cheap and premium tiers.

Keep OpenAI-Style Workflows While Using Gemini

Teams already built around the OpenAI SDK can add Gemini 3.1 Flash Lite without rebuilding their request layer, auth flow, or fallback logic from scratch.

Use Flash Lite as the Low-Cost Stage in a Multi-Model Stack

Route cheap translation, extraction, and classification traffic to Flash Lite first, then send only the harder or higher-value requests to stronger models on the same gateway.

Lower Migration Cost Than Vendor-Specific Integrations

One API key, OpenAI-compatible and native Gemini request formats, plus caching and batch support make it easier to operate Gemini alongside the rest of your model catalog.

How to use Gemini 3.1 Flash Lite API

Use this page as an access overview: pick a request format, use the preview model ID, and see the docs for detailed request examples.

Step 1 - Choose the Request Format

Gemini 3.1 Flash Lite can be called through OpenAI-compatible requests or the native Gemini API, which makes it easier to fit into existing stacks without rebuilding your whole integration path.
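
The two formats differ only in where the model name goes: the JSON body for the OpenAI-compatible route, the URL path for the native route. A sketch using the endpoint paths given in the FAQ on this page (the base host is a placeholder, and `generateContent` is shown as one example of the native `{method}` segment):

```python
MODEL_ID = "gemini-3.1-flash-lite-preview"
BASE = "https://api.example-gateway.com"  # placeholder; use your EvoLink base URL

# OpenAI-compatible route: the model is a field in the JSON body.
openai_url = f"{BASE}/v1/chat/completions"

# Native Gemini route: model and method live in the path.
native_url = f"{BASE}/v1beta/models/{MODEL_ID}:generateContent"
```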

Step 2 - Use the Current Request Model ID

Use the exact request model ID "gemini-3.1-flash-lite-preview" when sending production traffic. The product name drops the suffix, but the preview identifier is what the route actually matches.

Step 3 - Scale the Right Workloads Here

Use Flash Lite for translation queues, extraction jobs, tagging, and other high-volume tasks, then send edge cases or harder requests to stronger models. For exact request bodies, parameters, and endpoint examples, continue to docs.
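
The escalate-on-difficulty pattern in this step can be sketched as a small router. The `confidence` field and the premium model name are illustrative assumptions your own pipeline would supply; neither is part of any EvoLink response schema:

```python
# Sketch: route to Flash Lite first, escalate uncertain results.
# `confidence` is a score your own pipeline produces (e.g. from a
# structured-output field); it is not returned by the API itself.
CHEAP_MODEL = "gemini-3.1-flash-lite-preview"
PREMIUM_MODEL = "example-premium-model"  # placeholder for a stronger route

def pick_model(first_pass: dict, threshold: float = 0.8) -> str:
    """Escalate to the premium route when the first pass looks unreliable."""
    if first_pass.get("confidence", 0.0) >= threshold:
        return CHEAP_MODEL
    return PREMIUM_MODEL
```

Because only low-confidence items escalate, the premium route sees a small fraction of total traffic.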

Gemini 3.1 Flash Lite API Features and Limits

Core capabilities and limits for planning production integrations

Context

1,050,000 Input Tokens

Up to 1,050,000 input tokens and 65,536 output tokens.

Multimodal

Multimodal Inputs

Text, image, video, audio, and PDF inputs with text output.

Reasoning

Thinking + Structured Outputs

Thinking and structured outputs supported for reliable, machine-readable results.

Tools

Function Calling + Tools

Function calling, code execution, and search grounding are supported.

Scale

Caching + Batch

Context caching and Batch API supported for repeated or large-scale workloads.
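
Batch submissions are commonly packaged as one JSON object per line. The sketch below follows the OpenAI Batch JSONL convention (`custom_id`, `method`, `url`, `body`); whether EvoLink's Batch API accepts this exact shape is an assumption to confirm in the docs:

```python
import json

MODEL_ID = "gemini-3.1-flash-lite-preview"

def to_batch_jsonl(items: list) -> str:
    """Package prompts as JSONL lines in the OpenAI Batch request shape."""
    lines = []
    for i, text in enumerate(items):
        lines.append(json.dumps({
            "custom_id": f"item-{i}",       # lets you match results to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": MODEL_ID,
                "messages": [{"role": "user", "content": text}],
            },
        }))
    return "\n".join(lines)

jsonl = to_batch_jsonl(["tag this article", "tag that article"])
```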

Pricing

Ultra-Low Cost

Use the live pricing table above to verify the current EvoLink pay-as-you-go rate for this route.

Gemini 3.1 Flash Lite API FAQs

Everything you need to know about the product and billing.

Is Gemini 3.1 Flash Lite a lower-cost route than the standard Gemini Flash models?
Yes. Gemini 3.1 Flash Lite is positioned as a lower-cost Flash route for high-volume workloads where throughput and price matter more than the stronger general-purpose quality you would expect from a larger Gemini Flash model.

Can I call it with both OpenAI-compatible and native Gemini requests?
Yes. EvoLink supports OpenAI-compatible requests at POST /v1/chat/completions, and it also supports Google Native API requests at POST /v1beta/models/gemini-3.1-flash-lite-preview:{method}.

How large are the context and output windows?
Gemini 3.1 Flash Lite supports up to 1,050,000 input tokens and 65,536 output tokens, which makes it suitable for long documents, large batches, and multi-step processing pipelines.

Does it support multimodal input?
Yes. Gemini 3.1 Flash Lite supports text, image, video, audio, and PDF input with text output, which makes it useful for extraction, summarization, and multimodal document workflows.

Which model ID should I use in requests?
Use the exact preview model identifier "gemini-3.1-flash-lite-preview" in API requests. The product name drops the suffix, but the request model ID remains the preview identifier.

When should I choose Flash Lite over a larger Flash route?
Choose Flash Lite for translation, extraction, classification, tagging, and other retry-friendly workloads that need lower cost at scale. Move up to a larger Gemini Flash route when output quality or task difficulty matters more than keeping each request cheap.

What is Flash Lite best suited for?
Gemini 3.1 Flash Lite is best for cost-sensitive, high-throughput tasks such as translation, classification, extraction, tagging, document processing, and lightweight agent workflows where low latency matters more than frontier reasoning depth.

What is not supported?
Image generation, audio generation, and Live API are not supported on this model. Google Maps grounding is also not supported, so it is best positioned for low-cost text-output workflows instead of realtime or generative media tasks.

Continue with Gemini family pages and integration guides

Where Gemini 3.1 Flash Lite fits in the Gemini family

Treat this route as the lower-cost execution layer in the Gemini family, not as a replacement for stronger general-purpose models. It fits high-throughput, retry-friendly, batch-heavy workloads; when task difficulty or output quality matters more, move up to a stronger Flash route on the site.
