Gemini 3 Flash Preview API

Google's fastest frontier model, with a 3x speed advantage. It features native audio input, configurable thinking levels, and world-class agentic capabilities at less than 25% of Pro pricing.

Playground Not Available

This feature is currently only available for selected image and video generation models.

Gemini 3 Flash Preview API - Speed Meets Intelligence

Deploy gemini-3-flash-preview with configurable reasoning and native audio support. Achieve 78% on SWE-bench Verified while running 3x faster than alternatives via EvoLink.


Capabilities of Gemini 3 Flash Preview API

Blazing Fast Inference

3x faster than previous models while maintaining frontier-class intelligence.


Native Audio Input

Process audio recordings directly without transcription middleware. Analyze meetings, podcasts, and lectures.


Configurable Thinking Levels

Balance speed and reasoning depth with adjustable thinking levels from minimal to high.


Why Integrate Gemini 3 Flash via EvoLink

Get the fastest frontier AI model at a fraction of the cost. We optimize routing and caching to deliver maximum value for your AI workloads.

Unmatched Speed

3x faster inference than alternatives, perfect for real-time applications and user-facing products.

Best-in-Class Agentic Performance

78% on SWE-bench Verified - the highest score for agentic coding tasks among all models.

Cost Efficiency

Less than 25% of Gemini 3 Pro pricing while maintaining frontier performance: $0.50 input / $3 output per 1M tokens.

How to Use Gemini 3 Flash Preview API

Configure thinking levels, process audio, and deploy via EvoLink.


Step 1 - Configure Model

Select `gemini-3-flash-preview` and set `thinking_level` based on task complexity: `minimal` for speed, `high` for complex reasoning.
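As an illustration, a request body might be assembled like this. The payload shape (OpenAI-style `messages` plus a top-level `thinking_level` field) and the `build_request` helper are assumptions for the sketch, not a documented EvoLink or Gemini SDK schema:

```python
# Sketch: build a chat-completion payload with a configurable thinking level.
# Field names are illustrative assumptions, not a documented schema.

VALID_THINKING_LEVELS = {"minimal", "low", "medium", "high"}

def build_request(prompt: str, thinking_level: str = "minimal") -> dict:
    """Return a request body for gemini-3-flash-preview."""
    if thinking_level not in VALID_THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {sorted(VALID_THINKING_LEVELS)}")
    return {
        "model": "gemini-3-flash-preview",
        "thinking_level": thinking_level,  # 'minimal' for speed, 'high' for depth
        "messages": [{"role": "user", "content": prompt}],
    }

# Fast path for a simple lookup; deep reasoning for a tricky debugging task.
fast = build_request("Summarize this changelog.", "minimal")
deep = build_request("Find the race condition in this code.", "high")
```

Validating the level client-side keeps a typo like `'extreme'` from burning a round trip to the API.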


Step 2 - Process Inputs

Send text, images, video, PDFs, or audio files directly. No transcription needed for audio - the model handles it natively.
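For audio, the raw bytes can go straight into the request as a base64 part. The part structure below mirrors common multimodal JSON APIs and is an assumption for illustration; `build_audio_request` is a hypothetical helper:

```python
import base64

# Sketch: attach an audio recording directly as a base64-encoded part.
# The content-part layout is an illustrative assumption, not a documented schema.

def build_audio_request(audio_bytes: bytes, mime_type: str, prompt: str) -> dict:
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "model": "gemini-3-flash-preview",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "audio", "mime_type": mime_type, "data": encoded},
            ],
        }],
    }

# No transcription middleware: the recording itself is the input.
req = build_audio_request(b"\x00\x01...", "audio/mp3", "Summarize this meeting.")
```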


Step 3 - Deploy & Scale

Route through EvoLink for automatic caching and load balancing. Save up to 20% with our optimized pricing.

Technical Specs

Advanced features of the Gemini 3 Flash Preview API

Context

1M Token Window

Process entire codebases, long documents, or hours of audio in a single request.

Reasoning

Thinking Levels

Configurable reasoning depth: minimal, low, medium, high. Balance speed vs accuracy per request.

Multimodal

Native Audio

Process audio input at $1/1M tokens. Upload recordings and get intelligent analysis.

Performance

78% SWE-bench

Best-in-class agentic coding performance. Outperforms even Gemini 3 Pro on this benchmark.

Intelligence

90.4% GPQA Diamond

PhD-level reasoning on graduate-level science questions.

Cost

Context Caching

Cache Write/Hit at $0.05/1M tokens. Dramatically reduce costs for repeated contexts.
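To see how caching changes the bill, here is a small cost model using the rates quoted on this page ($0.50/1M input, $3/1M output, $0.05/1M cached). The token counts are made-up examples:

```python
# Per-million-token rates quoted on this page (USD).
INPUT_RATE = 0.50
OUTPUT_RATE = 3.00
CACHED_RATE = 0.05

def request_cost(fresh_input: int, cached_input: int, output: int) -> float:
    """USD cost of one request, splitting input into fresh vs cache-hit tokens."""
    return (fresh_input * INPUT_RATE
            + cached_input * CACHED_RATE
            + output * OUTPUT_RATE) / 1_000_000

# A ~1M-token context reused across requests, with 900k tokens served from cache:
with_cache = request_cost(100_000, 900_000, 10_000)   # ~$0.125
without_cache = request_cost(1_000_000, 0, 10_000)    # ~$0.53
```

With 90% of the context cached, the same request costs roughly a quarter of the uncached price, which is why caching dominates the bill for long repeated prompts.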

Gemini 3 Flash vs Competitors

Speed meets intelligence at the right price

| Model | Reasoning Mode | Price per 1M tokens (input/output) | Strength |
| --- | --- | --- | --- |
| Gemini 3 Flash Preview | Configurable thinking levels | $0.50 / $3 | 3x faster, 78% SWE-bench Verified, native audio, <25% of Pro cost |
| Gemini 3 Pro Preview | Deep Thinking Mode | $2 / $12 | Maximum reasoning depth, Thought Signatures for agents |
| Claude Sonnet 4.5 | Extended Thinking | $3 / $15 | Strong coding, detailed responses, hybrid reasoning |

Gemini 3 Flash API FAQs

Everything you need to know about the product and billing.

How much does the Gemini 3 Flash Preview API cost?

Input tokens cost $0.50/1M, output tokens cost $3/1M, and audio input costs $1/1M. Context caching (write/hit) is just $0.05/1M tokens. Overall, this is less than 25% of Gemini 3 Pro pricing.

How does Flash compare to Gemini 3 Pro?

Flash is 3x faster and costs 75% less. It actually scores higher than Pro on agentic coding (78% on SWE-bench Verified). Use Flash for speed-critical applications; use Pro for complex reasoning that needs maximum depth.

What thinking levels can I configure?

You can set `thinking_level` to 'minimal', 'low', 'medium', or 'high'. Minimal is fastest with basic reasoning; high provides the deepest analysis but takes longer. Choose based on task complexity.

How do I process audio files?

Upload audio files directly to the API - no transcription step needed. The model natively processes audio and can analyze content, detect knowledge gaps, create quizzes, and more.

What is the context window?

1,048,576 tokens (approximately 1M). This allows processing of very long documents, entire codebases, or hours of audio/video content in a single request.