Gemini Omni Flash API
$1.275 (~86.7 credits) per 1M input tokens; $14.875 (~1011.5 credits) per 1M video output tokens
$7.650 (~520.2 credits) per 1M other output tokens
Token-based billing. Actual cost follows the usage object returned by the API.
Highest stability with guaranteed 99.9% uptime. Recommended for production environments.
Use the same video endpoint for all modes. Only the model parameter differs.
Choose landscape, portrait, or Auto to let the provider select the output ratio.
Auto lets the provider decide the output duration (estimated as 10s). Choose 3-10s to send a fixed duration.
Gemini Omni Flash API on EvoLink
Use Gemini Omni Flash on EvoLink for text-to-video, image-to-video, reference-to-video, and video editing through one unified video API. Public discussion often frames Gemini Omni as a video counterpart to Nano Banana because it brings multimodal video creation and conversational editing into short-form workflows. On EvoLink, the practical value is API access: EvoLink model IDs, async task workflow, callback support, token-based usage visibility, and the same API key used for Veo, Seedance, Kling, and other video models.

Billing Rules
- Gemini Omni Flash is billed by token usage. The task returns a credits_reserved estimate on creation and settles from the actual usage tokens once the task completes.
- Text input: counted from the prompt tokens.
- Video input: 5,792 tokens per second of input video.
- Video output: 5,792 tokens per second of 720p video (audio included).
- The output follows the input video, so video edit does not accept duration or aspect_ratio.
Pricing
| Mode | Billed item | Meter | Price |
|---|---|---|---|
| Text to Video | Output video | Video output tokens | $0.015 / 1K tokens (1.0115 Credits) |
| Text to Video | Input text / image / video | Input tokens | $0.0013 / 1K tokens (0.0867 Credits) |
| Text to Video | Thinking / text output | Other output tokens | $0.0077 / 1K tokens (0.5202 Credits) |
If the selected route is down, requests automatically fall back to the next cheapest available route, which keeps uptime at 99.9% at the best available price.
Figures are pre-bill estimates. Actual charges follow the upstream usage tokens returned by the model.
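The billing rules and rates above can be turned into a rough pre-bill calculator. The sketch below is illustrative only: the function name `estimate_cost_usd` is hypothetical, and the 68 credits per USD factor is inferred from the listed pairs (for example $1.275 ~ 86.7 credits) rather than a published rate. Actual charges still follow the usage object returned by the API.

```python
from decimal import Decimal

# Rates copied from the pricing figures above (USD per 1M tokens).
USD_PER_M_INPUT = Decimal("1.275")
USD_PER_M_VIDEO_OUTPUT = Decimal("14.875")
USD_PER_M_OTHER_OUTPUT = Decimal("7.650")

# From the billing rules: tokens per second of input or 720p output video.
TOKENS_PER_SECOND_VIDEO = 5792

# Assumption: inferred from the listed price/credit pairs, not a published rate.
CREDITS_PER_USD = Decimal("68")


def estimate_cost_usd(output_seconds, input_tokens=0,
                      input_video_seconds=0, other_output_tokens=0):
    """Return a pre-bill estimate in USD for one task."""
    # Input tokens cover prompt text plus any input video.
    input_total = input_tokens + input_video_seconds * TOKENS_PER_SECOND_VIDEO
    video_output_tokens = output_seconds * TOKENS_PER_SECOND_VIDEO
    total = (
        Decimal(input_total) * USD_PER_M_INPUT
        + Decimal(video_output_tokens) * USD_PER_M_VIDEO_OUTPUT
        + Decimal(other_output_tokens) * USD_PER_M_OTHER_OUTPUT
    )
    return total / Decimal(1_000_000)


# A 10-second clip (also the estimate used for Auto) with a 50-token prompt.
cost = estimate_cost_usd(output_seconds=10, input_tokens=50)
print(f"10 s clip: ${cost:.4f} (~{cost * CREDITS_PER_USD:.2f} credits)")
```

For a 10-second clip the video output alone is 57,920 tokens, so most of the estimate comes from the video output meter; thinking or text output tokens are not predictable in advance and are left at zero here.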
What can you build with Gemini Omni API?
Chat-Based Video Editing
Generate a clip with Gemini Omni, then refine it in conversation — "make the lighting warmer", "replace the red car". The workflow is designed for iterative edits while preserving the surrounding scene, subject identity, and motion as much as the selected route supports.

Object Replacement and Scene Rewrite
Swap an object in frame, remove an unwanted element, or rewrite a scene while preserving identity and motion. Useful for ad creative iteration and product variant rendering without external editing tools.

Reference Image Workflow
Pass a reference image and Gemini Omni anchors character identity, lighting, and color across the generated video. Combine with chat-based editing to refine specific shots without losing visual consistency.

Audio-Capable Video Generation
Gemini Omni Flash routes can return short video outputs with audio where supported by the selected mode, reducing the need to stitch a separate TTS or sound-design pipeline into first-pass generation.

How Gemini Omni Compares — All models on one EvoLink API key
Gemini Omni is most interesting for workflow rather than raw fidelity alone: multimodal inputs, conversational editing, and a practical EvoLink route for testing it beside Veo, Seedance, and Kling with one API key.
Chat-Native Editing Workflow
Gemini Omni is positioned around conversational video editing, while Veo 3.1 and Seedance 2.0 are usually evaluated first as generation routes. For multi-turn refinement, this is the workflow difference to test.
Long-Context Character Consistency
Gemini Omni is reported to benefit from Gemini context and world knowledge for continuity across multi-input and edit-heavy workflows. Treat this as a behavior to evaluate in your own storyboard or short-video pipeline.
No Google Cloud Project — Same Async Pattern as Veo and Seedance
No GCP setup, no Vertex billing, no separate region approval. If you already run video generation through EvoLink, adding Gemini Omni is a one-parameter change — same request shape, same task lifecycle as Veo 3.1, Seedance 2.0, and Kling.
Gemini Omni vs Veo 3.1 vs Seedance 2.0 — Side-by-side comparison
Three models commonly shortlisted for production video workflows in 2026. All three accessible through one EvoLink API key.
| Feature | Gemini Omni | Veo 3.1 | Seedance 2.0 |
|---|---|---|---|
| EvoLink price | Token-based | From $0.50/s | From $0.092/s |
| Quality | 720p | 720p / 1080p, 4K upscaling where available | 480p / 720p / 1080p |
| Native audio | Yes | Yes | Yes |
| Reference control | Text + image + chat edit | Text + image | Text + image + video + audio |
| Video length | 3-10s / Auto | Short clips with Extend for longer scenes where supported | 4–15s |
| Editing | Conversational editing workflow | Generation-first | V2V mode |
| Best for | Short-form editing and multi-input workflows | Cinematic baseline | Multimodal reference production |
How to Integrate Gemini Omni API
Three steps to your first Gemini Omni video task. Same integration pattern as Veo 3.1, Seedance 2.0, and Kling 3.0.
Step 1 — Get Your API Key
Sign up on EvoLink.ai and generate your API key from the dashboard. No Google Cloud project required.
Step 2 — Submit Generation Task
POST to /v1/videos/generations with one of the Gemini Omni Flash model names and your prompt. For generation modes, add duration (an integer from 3 to 10, or auto). Add image_urls for image-to-video or reference-to-video, video_urls for video edit, and callback_url for completion notification. The API processes the request asynchronously and returns a task ID.
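As a minimal sketch of Step 2, the following builds and validates a text-to-video request using only the Python standard library. The host in `BASE_URL` is a placeholder because this page does not state the API host, and the helper names `build_payload` and `build_request` are hypothetical. The parameter names, allowed values, and Bearer authentication follow the API reference on this page.

```python
import json
import urllib.request

# Placeholder host (assumption): replace with the real API base URL.
BASE_URL = "https://api.example.com"
ENDPOINT = "/v1/videos/generations"

ALLOWED_RATIOS = {"16:9", "9:16", "auto"}


def build_payload(prompt, model="gemini-omni-flash-text-to-video",
                  aspect_ratio="16:9", duration=None, callback_url=None):
    """Validate parameters locally and return the JSON body as a dict."""
    if aspect_ratio not in ALLOWED_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_RATIOS)}")
    payload = {"model": model, "prompt": prompt, "aspect_ratio": aspect_ratio}
    # duration is optional; when omitted, the documented API default is 10 s.
    if duration is not None:
        is_valid_int = isinstance(duration, int) and 3 <= duration <= 10
        if duration != "auto" and not is_valid_int:
            raise ValueError("duration must be an integer from 3 to 10, or 'auto'")
        payload["duration"] = duration
    if callback_url is not None:
        if not callback_url.startswith("https://"):
            raise ValueError("callback_url must be an HTTPS address")
        payload["callback_url"] = callback_url
    return payload


def build_request(api_key, payload):
    """Return a POST request with Bearer authentication; nothing is sent yet."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen` and decode the JSON response, which should carry the task ID. Validating locally avoids submitting a task that the API would reject for an out-of-range duration.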
Step 3 — Retrieve Video Result
Use the task ID to poll the status endpoint, or wait for the callback_url webhook. When status reaches completed, you receive a download URL for the generated MP4. Links are valid for 24 hours.
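The polling half of Step 3 can be sketched as a small loop. This page does not give the status endpoint path or the response schema, so the sketch takes a caller-supplied `fetch_status` function and assumes it returns a dict with a `status` key. The `completed` status comes from the text above; the `failed` status name is an assumption.

```python
import time


def wait_for_task(fetch_status, task_id, interval_s=5.0, timeout_s=600.0,
                  sleep=time.sleep):
    """Poll fetch_status(task_id) until the task completes, fails, or times out.

    fetch_status is supplied by the caller and is assumed to return a dict
    with a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status == "failed":  # assumed name for the failure status
            raise RuntimeError(f"task {task_id} failed: {task}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not completed after {timeout_s} s")
        sleep(interval_s)
```

Because result links are valid for 24 hours, a caller would typically download the MP4 to its own storage as soon as this function returns.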
Gemini Omni API Capabilities
Technical specifications for production video workflows.
Chat-Based Video Editing
Multi-turn refinement in a conversational workflow, with scene continuity depending on the selected route and input quality.
720p, 3-10s / Auto Clips
720p output with configurable 3-10 second or Auto clips for generation modes. Auto is estimated as 10 seconds. Video edit accepts one MP4 input up to 10 seconds.
Text-to-Video and Image-to-Video
T2V from prompts and I2V with reference image input. Chat editing applies to outputs of either mode.
Audio-Capable Video Output
Short video outputs can include audio where supported by the selected Gemini Omni Flash route.
Long-Context Character Consistency
Designed for stronger continuity across multi-input and edit-heavy workflows; validate consistency on your own production prompts.
Async API with Task ID and Callback
Submit a task, receive an ID, poll status or configure a callback_url. Same lifecycle as other EvoLink video models.
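For the callback path, a receiver might look like the sketch below, built on Python's standard `http.server`. The callback body schema is not documented on this page, so the `id` and `status` field names are assumptions, and `parse_callback`, `CallbackHandler`, and `serve` are hypothetical names. Since callback_url must be an HTTPS address, a plain HTTP listener like this one would sit behind a TLS-terminating proxy.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body):
    """Return (task_id, status) from a JSON callback body.

    The "id" and "status" field names are assumptions about the payload.
    """
    data = json.loads(body)
    return data.get("id"), data.get("status")


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            task_id, status = parse_callback(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Invalid JSON, or JSON that is not an object.
            self.send_response(400)
            self.end_headers()
            return
        print(f"task {task_id} -> {status}")
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    """Run the receiver on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the listener. Responding quickly and doing the download in a separate worker keeps the callback handler from timing out on large files.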
Cost Example — Gemini Omni pricing estimates
- 100 × 3-10s/Auto clips for a social media batch: use current Pricing tab rates
- 1,000 × 3-10s/Auto clips/month at production scale: use current Pricing tab rates
- 1 generation + 3 edits multi-turn workflow: use current Pricing tab rates
Use the Pricing tab above for current token-based rates. Select the workflow by changing the model parameter.
Gemini Omni API Frequently Asked Questions
Everything you need to know about the product and billing.
All Gemini Video API Models
EvoLink provides unified access to Google's video and media model family through a single API key. All models share the same EvoLink API endpoint. Switch models with one parameter.
API Reference
Authentication
All APIs require Bearer Token authentication.
Authorization: Bearer YOUR_API_KEY
POST /v1/videos/generations
Create Gemini Omni Flash Video Task
Text to Video uses the unified EvoLink video generation endpoint. Select the mode by changing the model parameter.
Asynchronous processing returns a task ID. Use it to poll the task status, or provide callback_url for completion notifications.
Generated outputs should be stored in your own system when result URLs are time-limited.
Request Parameters
model (string, required; default: gemini-omni-flash-text-to-video)
Gemini Omni Flash model name. Fixed to gemini-omni-flash-text-to-video for text-to-video generation.
Example: gemini-omni-flash-text-to-video

prompt (string, required)
Natural-language instruction describing the requested video.
Example: Create a cinematic product video with smooth camera motion and natural audio ambience

aspect_ratio (string, optional; default: 16:9)
Output aspect ratio. Use auto to let the provider choose.
| Value | Description |
|---|---|
| 16:9 | Landscape video |
| 9:16 | Portrait video |
| auto | Let the provider choose the output ratio |
Example: 16:9

duration (integer or string, optional; default: 10 if omitted)
Output video duration in seconds. The Playground sends auto by default.
| Value | Description |
|---|---|
| 3-10 | Any integer from 3 to 10 seconds. If omitted, the API default is 10 seconds. |
| auto | Let the provider decide the output duration. Playground sends auto by default and estimates it as 10 seconds. |
Notes
- Use auto to let the model decide the duration; reservations estimate auto as 10 seconds
- Affects the estimated reservation; completed tasks are billed from API usage tokens
Example: auto

callback_url (string, optional)
Optional HTTPS callback address after task completion.
Notes
- Use polling if no callback_url is provided
- Store outputs promptly when result URLs are time-limited
Example: https://your-domain.com/webhooks/video-task-completed
Request Example
Response Example
Billing Rules
Gemini Omni Flash is billed by token usage. The task returns a credits_reserved estimate on creation and settles from the actual usage tokens once the task completes. Token counts per material:
- Text input — counted from the prompt tokens.
- Video output — 5,792 tokens per second of 720p video (audio included).
- Duration only affects the reservation estimate; Auto is estimated as 10 seconds.