
Gemini 3.5 Flash Pricing Guide: Token Costs, Workload Examples, and Production Budgeting

At $1.50/$9.00 per 1M tokens (input/output), Gemini 3.5 Flash sits between budget options like Gemini 3 Flash Preview and premium models like Gemini 3.1 Pro. This guide breaks down every pricing dimension and shows what real production workloads actually cost.
TL;DR
- Input: $1.50 per 1M tokens
- Output: $9.00 per 1M tokens
- Cache hit: $0.15 per 1M tokens (90% savings on cached input)
- Audio/Video input: $1.50 per 1M tokens (same as text)
- Context caching, Batch API, and Google Search grounding are supported
- The biggest cost driver is output tokens, not input — optimize output length first
Complete pricing table
| Token type | Price per 1M tokens | Notes |
|---|---|---|
| Text input | $1.50 | Standard text prompt tokens |
| Text output | $9.00 | Generated response tokens |
| Cache hit (input) | $0.15 | 90% discount vs standard input; storage costs $1.00/hour |
| Audio input | $1.50 | Processed audio tokens |
| Video input | $1.50 | Processed video frame tokens |
| Image input | $1.50 | Processed image tokens |
| PDF input | $1.50 | Processed document tokens |
Batch and Flex pricing
Google also offers discounted pricing for non-urgent workloads:
| Pricing tier | Input / 1M | Output / 1M | Use case |
|---|---|---|---|
| Standard | $1.50 | $9.00 | Real-time requests |
| Batch | $0.75 | $4.50 | Asynchronous bulk processing |
| Flex | $0.75 | $4.50 | Flexible delivery timing |
| Priority | $2.70 | $16.20 | Guaranteed low-latency |
Key observations
- Output tokens cost 6x more than input tokens. This is the single most important cost lever.
- Cache hits reduce input cost by 90%, but factor in the $1.00/hour cache storage cost.
- Batch/Flex pricing halves both input and output costs for non-urgent workloads.
- All multimodal inputs (audio, video, image, PDF) are priced at the same rate as text input.
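The per-tier arithmetic can be sketched as a small helper. This is a minimal illustration, not an official SDK: the `PRICES` table copies the figures above, and the function name `request_cost` is hypothetical.

```python
# Prices in USD per 1M tokens, copied from the tier table above.
PRICES = {
    "standard": {"input": 1.50, "output": 9.00},
    "batch": {"input": 0.75, "output": 4.50},
    "flex": {"input": 0.75, "output": 4.50},
    "priority": {"input": 2.70, "output": 16.20},
}


def request_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the USD cost for the given token counts at one pricing tier."""
    price = PRICES[tier]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# One request with 500 input tokens and 50 output tokens.
print(f"standard: ${request_cost(500, 50):.6f}")
print(f"batch:    ${request_cost(500, 50, 'batch'):.6f}")
```

The same function works for daily totals: pass the day's input and output token counts instead of a single request's.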
How Gemini 3.5 Flash compares on price
| Model | Input / 1M | Output / 1M | Cache hit / 1M | Context |
|---|---|---|---|---|
| Gemini 3.1 Flash Lite Preview | $0.25 | $1.50 | $0.025 | 1M |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $0.05 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | 200K |
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.15 | 1M |
| Gemini 3.1 Pro | $2.00 | $12.00 | — | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 200K |
Workload cost examples
Example 1: Classification pipeline
High-volume classification with short prompts and short responses.
Daily volume: 100,000 requests
Average input: 500 tokens per request
Average output: 50 tokens per request
Daily input tokens: 50M
Daily output tokens: 5M
| Cost component | Calculation | Daily | Monthly |
|---|---|---|---|
| Input | 50M × $1.50/1M | $75.00 | $2,250 |
| Output | 5M × $9.00/1M | $45.00 | $1,350 |
| Total | | $120.00 | $3,600 |
With context caching (80% of input tokens cached):
| Cost component | Calculation | Daily | Monthly |
|---|---|---|---|
| Input (uncached 20%) | 10M × $1.50/1M | $15.00 | $450 |
| Input (cached 80%) | 40M × $0.15/1M | $6.00 | $180 |
| Output | 5M × $9.00/1M | $45.00 | $1,350 |
| Total with caching | | $66.00 | $1,980 |
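The caching comparison above can be reproduced for any cached share of input. The sketch below assumes the prices from this guide and deliberately leaves out cache storage fees; `daily_cost` and `cached_fraction` are illustrative names.

```python
INPUT_PRICE = 1.50      # USD per 1M standard input tokens
CACHE_HIT_PRICE = 0.15  # USD per 1M cached input tokens
OUTPUT_PRICE = 9.00     # USD per 1M output tokens


def daily_cost(input_tokens: float, output_tokens: float, cached_fraction: float = 0.0) -> float:
    """Daily USD cost with a share of input served from cache (storage fees excluded)."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    total = uncached * INPUT_PRICE + cached * CACHE_HIT_PRICE + output_tokens * OUTPUT_PRICE
    return total / 1_000_000


# Example 1: 50M input tokens and 5M output tokens per day.
baseline = daily_cost(50_000_000, 5_000_000)
with_cache = daily_cost(50_000_000, 5_000_000, cached_fraction=0.8)
print(f"baseline ${baseline:.2f}/day, with caching ${with_cache:.2f}/day")
print(f"saving {1 - with_cache / baseline:.0%} before storage fees")
```

Changing `cached_fraction` shows how quickly the benefit shrinks when less of the prompt is shared across requests.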
Example 2: Coding agent
Agent workflow with moderate input (code context) and heavy output (generated code).
Daily volume: 5,000 agent sessions
Average input: 10,000 tokens per session
Average output: 3,000 tokens per session
Daily input tokens: 50M
Daily output tokens: 15M
| Cost component | Calculation | Daily | Monthly |
|---|---|---|---|
| Input | 50M × $1.50/1M | $75.00 | $2,250 |
| Output | 15M × $9.00/1M | $135.00 | $4,050 |
| Total | | $210.00 | $6,300 |
Output dominates at 64% of total cost. Reducing average output length by 20% saves $27/day, or $810/month (20% of the $4,050 monthly output spend).
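To see how output length drives the bill, the savings from a shorter average response can be computed directly. This is a rough sketch that assumes a 30-day month and the $9.00 output price; the function name is hypothetical.

```python
OUTPUT_PRICE = 9.00  # USD per 1M output tokens


def monthly_output_savings(daily_output_tokens: float, reduction: float, days: int = 30) -> float:
    """USD saved per month by cutting average output length by `reduction` (0 to 1)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    saved_tokens_per_day = daily_output_tokens * reduction
    return saved_tokens_per_day * OUTPUT_PRICE / 1_000_000 * days


# Coding agent example: 15M output tokens per day, responses trimmed by 20%.
print(f"${monthly_output_savings(15_000_000, 0.20):,.2f} saved per month")
```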
Example 3: Long-context document analysis
Processing large documents with summarization output.
Daily volume: 500 documents
Average input: 100,000 tokens per document
Average output: 2,000 tokens per document
Daily input tokens: 50M
Daily output tokens: 1M
| Cost component | Calculation | Daily | Monthly |
|---|---|---|---|
| Input | 50M × $1.50/1M | $75.00 | $2,250 |
| Output | 1M × $9.00/1M | $9.00 | $270 |
| Total | | $84.00 | $2,520 |
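For long-context, input-heavy workloads, context caching is critical. If 60% of document context is shared (common headers, templates, instructions), the cost before cache storage fees becomes:
| Cost component | Calculation | Daily | Monthly |
|---|---|---|---|
| Input (uncached 40%) | 20M × $1.50/1M | $30.00 | $900 |
| Input (cached 60%) | 30M × $0.15/1M | $4.50 | $135 |
| Output | 1M × $9.00/1M | $9.00 | $270 |
| Total with caching | | $43.50 | $1,305 |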
Example 4: Multimodal pipeline (video + audio)
Processing video content with audio for content understanding.
Daily volume: 1,000 videos
Average video input: 20,000 tokens per video
Average audio input: 5,000 tokens per video
Average text input: 1,000 tokens per video
Average output: 500 tokens per video
Daily video tokens: 20M
Daily audio tokens: 5M
Daily text tokens: 1M
Daily output tokens: 500K
| Cost component | Calculation | Daily | Monthly |
|---|---|---|---|
| Video input | 20M × $1.50/1M | $30.00 | $900 |
| Audio input | 5M × $1.50/1M | $7.50 | $225 |
| Text input | 1M × $1.50/1M | $1.50 | $45 |
| Output | 0.5M × $9.00/1M | $4.50 | $135 |
| Total | | $43.50 | $1,305 |
Multimodal pricing is straightforward — all input types share the same rate.
Cost optimization strategies
1. Use context caching aggressively
Cache hits cost 90% less than standard input ($0.15 vs $1.50 per 1M tokens), but note that Google charges $1.00/hour for cache storage. Caching is most cost-effective when cached content is reused frequently within each storage hour. Invest in caching for:
- System prompts and instructions
- Few-shot examples
- Shared document context across requests
- Repeated tool definitions and schemas
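Whether caching pays off depends on how often the cached prefix is reused per storage hour. The sketch below is a rough break-even model under one loud assumption: the $1.00/hour storage fee is treated as applying per 1M cached tokens, which this guide does not state explicitly. It also ignores the cost of the initial cache write.

```python
INPUT_PRICE = 1.50      # USD per 1M standard input tokens
CACHE_HIT_PRICE = 0.15  # USD per 1M cached input tokens
# ASSUMPTION: storage is billed per 1M cached tokens per hour.
STORAGE_PRICE_PER_HOUR = 1.00


def hourly_cache_savings(cached_tokens: int, reuses_per_hour: float) -> float:
    """Net USD saved per hour by caching a prefix; negative means caching loses money."""
    millions = cached_tokens / 1_000_000
    read_savings = millions * reuses_per_hour * (INPUT_PRICE - CACHE_HIT_PRICE)
    storage_cost = millions * STORAGE_PRICE_PER_HOUR
    return read_savings - storage_cost


def break_even_reuses_per_hour() -> float:
    """Reuses per hour at which read savings equal the storage fee."""
    return STORAGE_PRICE_PER_HOUR / (INPUT_PRICE - CACHE_HIT_PRICE)


print(f"break-even: {break_even_reuses_per_hour():.2f} reuses per hour")
print(f"20K-token prompt, 500 reuses/hour: ${hourly_cache_savings(20_000, 500):.2f}/hour")
```

Under this assumption the break-even point is below one reuse per hour, so a frequently hit system prompt clears it easily, while a prefix that is rarely reused may not.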
2. Optimize output length
Output tokens cost 6x more than input. Strategies:
- Set `max_tokens` to the minimum needed for your task
- Use structured output schemas to constrain response format
- For classification, use enum-style outputs instead of explanations
- For extraction, return only the extracted fields
3. Use Batch API for non-urgent workloads
The Batch API halves both input and output prices ($0.75/$4.50 per 1M tokens) for workloads that can tolerate higher latency. Use it for:
- Nightly data processing
- Bulk classification
- Document analysis pipelines
- Evaluation and testing
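When only part of the traffic can tolerate batch latency, the blended bill is easy to estimate. This sketch assumes the 50% Batch discount from the tier table applies uniformly to input and output; `blended_daily_cost` is an illustrative name.

```python
def blended_daily_cost(standard_cost: float, batch_share: float, batch_discount: float = 0.5) -> float:
    """Daily USD cost when `batch_share` (0 to 1) of traffic moves to Batch pricing."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return standard_cost * (1.0 - batch_share * batch_discount)


# A $120/day workload with 60% of requests moved to nightly batch jobs.
print(f"${blended_daily_cost(120.0, 0.6):.2f}/day")
```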
4. Route by workload tier
Not every request needs Gemini 3.5 Flash. Route simpler tasks to cheaper models:
| Workload complexity | Recommended model | Why |
|---|---|---|
| Simple classification | Gemini 3.1 Flash Lite Preview ($0.25/$1.50) | 6x cheaper input, 6x cheaper output |
| Standard extraction | Gemini 3 Flash Preview ($0.50/$3.00) | 3x cheaper, good enough for simple tasks |
| Agent sub-steps | Gemini 3.5 Flash ($1.50/$9.00) | GA stability, better reasoning |
| Complex reasoning | Gemini 3.1 Pro ($2.00/$12.00) | Higher quality for hard tasks |
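A routing table like the one above can be expressed as a simple lookup. The workload labels (`simple_classification` and so on) are hypothetical, and the model names are display names from this guide rather than API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str           # display name, not an API model ID
    input_per_m: float  # USD per 1M input tokens
    output_per_m: float # USD per 1M output tokens


# Hypothetical workload labels mapped to the models in the routing table.
ROUTES = {
    "simple_classification": ModelPrice("Gemini 3.1 Flash Lite Preview", 0.25, 1.50),
    "standard_extraction": ModelPrice("Gemini 3 Flash Preview", 0.50, 3.00),
    "agent_substep": ModelPrice("Gemini 3.5 Flash", 1.50, 9.00),
    "complex_reasoning": ModelPrice("Gemini 3.1 Pro", 2.00, 12.00),
}


def route(workload: str) -> ModelPrice:
    """Pick a model for a workload label; unknown labels default to Gemini 3.5 Flash."""
    return ROUTES.get(workload, ROUTES["agent_substep"])


def routed_cost(workload: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of the given token counts on whichever model the workload routes to."""
    model = route(workload)
    return (input_tokens * model.input_per_m + output_tokens * model.output_per_m) / 1_000_000


# The classification pipeline (50M input, 5M output per day) on two different routes.
for label in ("simple_classification", "agent_substep"):
    print(f"{route(label).name}: ${routed_cost(label, 50_000_000, 5_000_000):.2f}/day")
```

In practice the label would come from a classifier or from the calling service; the point here is only that routing the classification workload to the cheapest tier changes the daily bill by a factor of six.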
5. Monitor cost per successful task, not just token cost
A cheaper model that requires 3 retries can cost more than a more expensive model that succeeds on the first attempt. Track:
- Token cost per request
- Retry rate
- Fallback rate
- Cost per successful task (including retries and fallbacks)
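One way to combine these metrics is an expected-cost model. The sketch below assumes each attempt succeeds independently with the same probability and that a fallback model, used after all attempts fail, always succeeds; the per-attempt costs and success rates in the example are made-up illustrations, not measurements.

```python
def expected_cost_per_task(
    cost_per_attempt: float,
    success_rate: float,
    max_attempts: int = 3,
    fallback_cost: float = 0.0,
) -> float:
    """Expected USD per task, counting retries and a fallback that always succeeds."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    fail = 1.0 - success_rate
    # Attempt k+1 only happens if the first k attempts all failed.
    expected_attempts = sum(fail ** k for k in range(max_attempts))
    p_fallback = fail ** max_attempts
    return cost_per_attempt * expected_attempts + p_fallback * fallback_cost


# Hypothetical numbers: a cheap model that often fails vs. a pricier one that rarely does.
cheap = expected_cost_per_task(0.001, success_rate=0.40, fallback_cost=0.006)
strong = expected_cost_per_task(0.003, success_rate=0.95, fallback_cost=0.006)
print(f"cheap model:  ${cheap:.6f} per successful task")
print(f"strong model: ${strong:.6f} per successful task")
```

With these illustrative inputs the cheaper model ends up slightly more expensive per successful task, which is the effect the metric is meant to expose.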
Hidden cost factors
Retries
If 10% of requests fail validation and require retrying, add at least 10% to your token budget (slightly more if retries can fail too). For agent workflows with multi-step chains, retry costs compound across steps.
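How retries compound depends on what a failure forces you to redo. The sketch below compares two simplified models, both assuming independent failures at a fixed rate: retrying only the failed step, and restarting the whole chain when any step fails. Function names are illustrative.

```python
def per_step_retry_multiplier(failure_rate: float) -> float:
    """Expected cost multiplier when only the failed step is retried until it passes."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def chain_restart_multiplier(failure_rate: float, steps: int) -> float:
    """Expected cost multiplier when any failed step forces the whole chain to rerun.

    Simplification: a failed run is charged in full, as if the failure
    were only detected at the end of the chain.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate) ** steps


print(f"per-step retry, 10% failures:        {per_step_retry_multiplier(0.10):.2f}x")
print(f"5-step chain restart, 10% failures:  {chain_restart_multiplier(0.10, 5):.2f}x")
```

Validating and retrying each step individually keeps the overhead near 11%, while restarting a 5-step chain on any failure pushes it to roughly 69% under these assumptions.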
Fallback to stronger models
If Gemini 3.5 Flash cannot handle 5% of requests and you fall back to Gemini 3.1 Pro, factor in the Pro-tier pricing for those requests.
Context growth in agent loops
Agent workflows often accumulate context across steps. A 5-step agent loop with growing context can use 2-3x more input tokens than the initial prompt. Budget for context growth, not just the first request.
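Context growth can be estimated with a simple sum. The sketch assumes each step resends the full context plus a fixed number of new tokens (tool results, prior outputs); real loops grow unevenly, so treat the token counts as illustrative.

```python
def loop_input_tokens(initial_prompt: int, added_per_step: int, steps: int) -> int:
    """Total input tokens billed across a loop where each step resends the grown context."""
    return sum(initial_prompt + i * added_per_step for i in range(steps))


initial, added, steps = 10_000, 5_000, 5
total = loop_input_tokens(initial, added, steps)
naive = initial * steps  # budget that ignores context growth
print(f"billed input tokens: {total:,}")          # 100,000
print(f"naive estimate:      {naive:,}")          # 50,000
print(f"growth factor:       {total / naive:.1f}x")  # 2.0x
```

Trimming or summarizing earlier steps lowers `added_per_step`, which is the main lever on this factor.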
Rate limit overhead
If you hit rate limits and need to queue or retry requests, the latency cost translates to engineering time and user experience impact — not just token spend.
FAQ
What is the cheapest way to use Gemini 3.5 Flash?
Enable context caching for repeated prompts, constrain output length with structured schemas, use Batch API for non-urgent work, and route simple tasks to cheaper Flash models.
Is Gemini 3.5 Flash cheaper than Claude Haiku 4.5?
No. Claude Haiku 4.5 is cheaper on both input ($1.00 vs $1.50) and output ($5.00 vs $9.00) per 1M tokens. But Gemini 3.5 Flash offers 1M context (vs 200K) and native audio and video input, which Haiku does not accept.
How much does context caching save?
Cache hits cost $0.15 per 1M tokens vs $1.50 for standard input — a 90% reduction. For workloads with shared system prompts or repeated context, caching can reduce total costs by 30-50%.
Is Gemini 3.5 Flash cheaper than Gemini 3.1 Pro?
Yes. Gemini 3.5 Flash is 25% cheaper on input ($1.50 vs $2.00) and 25% cheaper on output ($9.00 vs $12.00) compared to Gemini 3.1 Pro.
How do I estimate my monthly cost?
Monthly cost ≈ ((daily input tokens × $1.50/1M) + (daily output tokens × $9.00/1M)) × 30. Then subtract savings from context caching and add overhead for retries and fallbacks.
Budget Your Gemini 3.5 Flash Workloads on EvoLink
EvoLink provides a unified API with usage monitoring and cost tracking across all Gemini models. Compare costs, set budget alerts, and route between Flash tiers from one integration.


