Gemini 3 Flash Preview API

Access Google's Gemini 3 Flash Preview (gemini-3-flash-preview) through EvoLink with OpenAI SDK compatibility and native Gemini API support. Send text, image, video, audio, and PDF inputs with a 1,048,576 token context window, plus caching and batch options for production workloads.

Using coding CLIs? Run Gemini 3 Flash via EvoCode — One API for Code Agents & CLIs. (View Docs)

PRICING

PLAN                   CONTEXT WINDOW  MAX OUTPUT  PROMPT SIZE  INPUT           OUTPUT          CACHE READ
Gemini 3 Flash         1.05M           65.5K       ≤200.0K      $0.400 (-20%)   $2.40 (-20%)    $0.040 (-19%)
                                                   >200.0K      $0.400 (-20%)   $2.40 (-20%)    $0.040 (-19%)
Gemini 3 Flash (Beta)  1.05M           65.5K       ≤200.0K      $0.130 (-74%)   $0.78 (-74%)    $0.013 (-74%)
                                                   >200.0K      $0.130 (-74%)   $0.78 (-74%)    $0.013 (-74%)

Official prices (all prompt sizes): input $0.500, output $3.00, cache read $0.050.

Pricing note: prices are in USD per 1M tokens. Discounts are relative to the official Gemini price.

Cache hit: the cache-read price applies to cached prompt tokens.

Two ways to run Gemini 3 Flash — pick the tier that matches your workload.

  • Gemini 3 Flash: the default tier for production reliability and predictable availability.
  • Gemini 3 Flash (Beta): a lower-cost tier with best-effort availability; best suited to workloads that can tolerate retries.
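For the Beta tier's best-effort availability, a simple retry-with-backoff wrapper is usually enough. The sketch below is illustrative, not an EvoLink client: `send_request` stands in for whatever call your HTTP client or SDK makes, and the delays are arbitrary defaults.

```python
import random
import time

def call_with_retries(send_request, max_attempts=4, base_delay=1.0):
    """Retry a zero-argument request callable with exponential backoff.

    `send_request` should raise on a transient failure (e.g. an HTTP
    429/503 from the Beta tier) and return the response on success.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # Exponential backoff (base, 2x, 4x, ...) plus random jitter
            # so concurrent clients don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

If a workload cannot tolerate this kind of retry loop, that is the signal to use the standard Gemini 3 Flash tier instead.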

Gemini 3 Flash Preview API on EvoLink

Built for speed and scale, Gemini 3 Flash Preview understands text, images, video, audio, and PDFs, and handles massive context (up to 1M tokens). It delivers clear, reliable answers for real-time assistants, document understanding, and media analysis.


What You Can Build with Gemini 3 Flash Preview

Multimodal Inputs, Reliable Text Outputs

A single request can include text, images, video, audio, or PDFs and return text output. This makes it easy to summarize meetings, review media, and extract structured insights without separate pipelines.


1M-Token Context for Long Sessions

Handle up to 1,048,576 input tokens and 65,536 output tokens in a single request. That lets you keep long documents, codebases, or multi-turn chats in one coherent context.


Tools, Grounding, and Reasoning

Use thinking and structured outputs with function calling, code execution, file search, search grounding, and URL context. Batch API and caching are supported for scale and cost control.


Why Use EvoLink for Gemini 3 Flash Preview

Run gemini-3-flash-preview via OpenAI SDK format or Google Native API format with official Gemini capabilities and pricing.

One Integration, Two Formats

Call Gemini 3 Flash Preview in OpenAI SDK or native Gemini format without changing app logic.

Batch + Caching Savings

Use batch processing and context caching to lower repeat costs while scaling high-volume workloads safely.
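The pricing table's cache-read rate shows why caching matters at volume. The arithmetic below uses the standard-tier prices from the table; how EvoLink counts cached prompt tokens in practice is an assumption to verify against your invoices.

```python
# Prices in USD per 1M tokens, from the standard-tier pricing table above.
INPUT_PER_M = 0.40
OUTPUT_PER_M = 2.40
CACHE_READ_PER_M = 0.04

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Cost of one request, assuming cached prompt tokens bill at
    the cache-read rate and the rest at the normal input rate."""
    fresh_input = input_tokens - cached_tokens
    return (fresh_input * INPUT_PER_M
            + cached_tokens * CACHE_READ_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 100K-token prompt with 80K cached, producing 2K output.
#   fresh input: 20,000 tokens at $0.40/1M  -> $0.008
#   cached:      80,000 tokens at $0.04/1M  -> $0.0032
#   output:       2,000 tokens at $2.40/1M  -> $0.0048
```

With the 80K-token cache hit the request costs $0.016 instead of $0.0448, so a chat app that replays a long system prompt on every turn sees most of its input spend drop to the cache-read rate.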

Ready for Production Use

Multimodal inputs, long context, and tool support cover real production assistants, analysis, and automation workflows.

How to Call Gemini 3 Flash Preview

Choose OpenAI SDK or Google Native API format, then send your request.


Step 1 - Choose API Format

OpenAI SDK format: POST /v1/chat/completions with model "gemini-3-flash-preview". Native API format: POST /v1beta/models/gemini-3-flash-preview:{method}, where {method} is generateContent or streamGenerateContent.
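The two formats differ only in the request body shape. A minimal sketch of each body (endpoint paths as above; the EvoLink base URL and auth header are omitted here):

```python
MODEL = "gemini-3-flash-preview"
PROMPT = "Summarize this release note."

# OpenAI SDK format -> POST /v1/chat/completions
openai_body = {
    "model": MODEL,
    "messages": [{"role": "user", "content": PROMPT}],
}

# Native Gemini format -> POST /v1beta/models/gemini-3-flash-preview:generateContent
# (the model is named in the URL path, not the body)
native_body = {
    "contents": [{"role": "user", "parts": [{"text": PROMPT}]}],
}
```

Pick one format per service and keep it; both reach the same model, so there is no need to translate between them in your app logic.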


Step 2 - Add Auth and Inputs

Include Authorization: Bearer <token>. Send messages/contents with text or multimodal parts (image, video, audio, PDF).
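In the OpenAI SDK format, a multimodal part is typically sent as a base64 data URL inside the message content. A sketch, using stand-in bytes rather than a real image file:

```python
import base64

# Stand-in for real image bytes (read your actual file instead).
fake_image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
data_url = "data:image/png;base64," + base64.b64encode(fake_image_bytes).decode()

headers = {"Authorization": "Bearer <token>"}  # your EvoLink API key

# One user message mixing a text part and an image part.
body = {
    "model": "gemini-3-flash-preview",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }],
}
```

Video, audio, and PDF parts follow the same pattern; check EvoLink's request reference for the exact part types it accepts for each medium.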


Step 3 - Stream or Scale

Enable streaming for real-time UX, or use X-Async-Mode to return a task ID. Combine batch and caching for cost-efficient high-volume runs.
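Streamed responses arrive as incremental deltas, and the assembly loop is the same whichever client you use. A sketch with stubbed chunks in the OpenAI streaming shape (no network call; a real stream yields these from server-sent events):

```python
def assemble_stream(chunks):
    """Concatenate the text deltas of a streamed chat completion."""
    pieces = []
    for chunk in chunks:
        choices = chunk.get("choices", [])
        delta = choices[0].get("delta", {}) if choices else {}
        if "content" in delta:
            pieces.append(delta["content"])
    return "".join(pieces)

# Stubbed chunks: first carries the role, the rest carry text deltas.
stub = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world."}}]},
]
```

For real-time UX, render each delta as it arrives rather than waiting for the full string.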

Technical Specs

Official model capabilities for gemini-3-flash-preview

Context

1,048,576 Input Tokens

Up to 1,048,576 input tokens and 65,536 output tokens.

Multimodal

Multimodal Inputs

Text, image, video, audio, and PDF inputs with text output.

Reasoning

Thinking + Structured Outputs

Thinking and structured outputs are supported for reliable, machine-readable results.
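In the OpenAI SDK convention, structured outputs are requested via a response_format JSON Schema. The sketch below follows that convention; whether EvoLink forwards this field to Gemini's structured-output mode is an assumption to confirm in their request reference, and the invoice schema is purely illustrative.

```python
# Ask for a machine-readable invoice object instead of free text.
body = {
    "model": "gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "Extract the invoice number and total."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",  # illustrative schema name
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["invoice_number", "total"],
            },
        },
    },
}
```

With a schema in place, downstream code can parse the response with an ordinary JSON loader instead of scraping prose.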

Tools

Function Calling + Tools

Function calling, code execution, and file search are supported.
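Function calling in the OpenAI SDK format means declaring a tools array the model can choose to invoke. A minimal sketch; the `get_weather` tool and its parameters are hypothetical, not part of EvoLink or Gemini:

```python
# Declare one callable tool; the model may respond with a tool call
# naming it and supplying arguments matching the parameter schema.
body = {
    "model": "gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
```

Your app executes the named function itself and sends the result back in a follow-up message; the model never calls your code directly.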

Scale

Caching + Batch

Context caching and Batch API are supported for repeated or large-scale workloads.

Grounding

Search Grounding + URL Context

Search grounding and URL context are supported (Google Maps grounding is not).

Gemini 3 Flash Preview API FAQs

Everything you need to know about the product and billing.

Q: What is Gemini 3 Flash?
A: Gemini 3 Flash is a balanced model built for speed, scale, and strong reasoning. It is designed for everyday tasks, agentic coding, and multimodal, long-context understanding, making it a practical default for production workloads.

Q: What model name do I use in requests?
A: The official preview model name is "gemini-3-flash-preview". Use this exact identifier in requests.

Q: Which input and output types are supported?
A: Gemini 3 Flash Preview supports text, image, video, audio, and PDF inputs, and returns text output. This enables mixed-media summarization, extraction, and question answering in a single workflow.

Q: How large is the context window?
A: It supports up to 1,048,576 input tokens and 65,536 output tokens, giving a large context window for long documents, codebases, or multi-turn sessions.

Q: Which tools and features are supported?
A: It supports function calling, structured outputs, code execution, file search, thinking, context caching, and Batch API. Search grounding and URL context are supported, along with multimodal function responses and code execution with images.

Q: What is not supported?
A: Image generation, audio generation, and the Live API are not supported. Grounding with Google Maps is also not supported for this model.

Q: What are the update and knowledge-cutoff dates?
A: The latest update is listed as December 2025, and the knowledge cutoff is January 2025.

Q: How do I call the model through EvoLink?
A: EvoLink supports OpenAI SDK format (POST /v1/chat/completions) and Google Native API format (POST /v1beta/models/gemini-3-flash-preview:{method}) using generateContent or streamGenerateContent. Add Authorization: Bearer <token> in the request header.