Kling O3 API

Kling O3 (kling-v3-omni) is Kling's latest video model family for text-to-video, image-to-video, reference-to-video, and video editing. Through EvoLink, teams can test modes online, route requests through one unified API, and ship 3-15 second video workflows with predictable per-second billing.

Model types: Kling O3 Text to Video, Kling O3 Image to Video, Kling O3 Reference to Video, Kling O3 Video Edit, Custom Element

Price: $0.079 to $0.397 (about 5.4 to 27 credits) per second of video

Highest stability with guaranteed 99.9% uptime. Recommended for production environments.

Use the same API endpoint for all versions. Only the model parameter differs.

Billing Rules

•Price shown is per second
•Duration range: 3-15 seconds
•Total = price/second × duration
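
As a rough illustration of the billing rule above, the sketch below multiplies a per-second price by the duration. The dollar figures are copied from the text-to-video pricing table on this page; the function name and dictionary layout are illustrative only and are not part of the EvoLink API, and actual charges may differ by account group.

```python
# Per-second USD prices for Kling O3 Text to Video, copied from the pricing table.
PRICE_PER_SECOND_USD = {
    ("720p", "off"): 0.079,
    ("720p", "on"): 0.106,
    ("1080p", "off"): 0.106,
    ("1080p", "on"): 0.132,
    ("4k", "off"): 0.397,
    ("4k", "on"): 0.397,
}


def estimate_cost_usd(quality: str, sound: str, duration: int) -> float:
    """Return price/second x duration for a 3-15 second clip (illustrative helper)."""
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    return round(PRICE_PER_SECOND_USD[(quality, sound)] * duration, 3)


if __name__ == "__main__":
    # A 10-second 1080p clip with sound: 0.132 x 10
    print(estimate_cost_usd("1080p", "on", 10))
```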

Pricing

Model	Mode	Quality	Sound	Price per second
Kling O3 Text to Video	Video Generation	720p	Off	$0.079 (5.4 Credits)
Kling O3 Text to Video	Video Generation	720p	On	$0.106 (7.2036 Credits)
Kling O3 Text to Video	Video Generation	1080p	Off	$0.106 (7.2036 Credits)
Kling O3 Text to Video	Video Generation	1080p	On	$0.132 (9.0018 Credits)
Kling O3 Text to Video	Video Generation	4K	Off	$0.397 (27 Credits)
Kling O3 Text to Video	Video Generation	4K	On	$0.397 (27 Credits)

If the cheapest option is down, we automatically use the next cheapest available one, ensuring 99.9% uptime at the best possible price.

Kling O3 (3.0 Omni) API Pricing, Playground, and Integration

Access Kling O3 through EvoLink's unified API gateway. Run text-to-video, image-to-video, reference-to-video, and video editing workflows with one integration, online testing, and 3-15 second output support.

Kling O3 pricing starts at $0.079 per second on EvoLink, compared to $0.084 on the official Kling API. Access all four video modes (text-to-video, image-to-video, reference-to-video, and video editing) with free credits to start.

Kling O3 overview and what changed from Kling 3.0

Kling O3 (Kling 3.0 Omni) is the most capable video model in the Kling AI family. It extends Kling 3.0 with reference-to-video and video editing — four modes total through a single API.

Choose O3 over standard Kling 3.0 when your workflow needs more than prompt-driven generation. Available on EvoLink from $0.079/s (vs $0.084 official) with free credits and playground access.

Kling O3 API video modes

Kling O3 Text-to-Video API

Generate videos directly from text prompts with Kling O3. Describe scenes, actions, and styles in natural language and let the model produce 3-15 second clips ready for marketing, social media, or creative projects.

Kling O3 Image-to-Video and Reference-to-Video API

Use images or reference videos to guide generation. Kling O3 supports image-to-video and reference-to-video modes, giving teams precise control over visual style, character consistency, and scene composition.

Kling O3 Video Editing API

Edit and transform existing footage with Kling O3's video editing mode. Apply style transfers, adjust scenes, and refine content without starting from scratch — ideal for iterating on commercial content at scale.

Why teams use Kling O3 through EvoLink

Kling O3 combines four production-ready video modes in one model family, while EvoLink gives teams unified access, predictable billing, and a faster integration path.

Four specialized modes

Text, image, reference, and editing modes cover the full video creation workflow.

Latest V3 Omni architecture

Built on Kling's newest generation for improved quality and consistency.

Flexible 3-15s output

Generate videos from 3 to 15 seconds with per-second billing.

How to integrate the Kling O3 API

Test a mode online, send an async request, and move approved outputs into production.

Choose your mode

Select text-to-video, image-to-video, reference-to-video, or video editing based on your workflow needs.

Submit a generation task

Send your request with prompts, images, or references. Track the async task until results are ready.

Review and iterate

Download results, compare variations, and reuse the same structure for fast iteration across campaigns.

Core capabilities of Kling O3

Four production-ready video modes through one unified API

Text

Text-to-video generation

Generate videos purely from text descriptions. Kling O3 interprets natural language prompts to produce dynamic video content without requiring any visual input.

Image

Image-to-video transformation

Transform static images into dynamic videos. Provide reference images and let Kling O3 animate them with natural motion and scene dynamics.

Reference

Reference video guidance

Use existing videos as references to guide new generation. This mode helps maintain visual consistency and style across multiple outputs.

Edit

AI video editing

Edit and transform existing footage with AI-powered tools. Apply style changes, scene adjustments, and creative transformations without manual editing.

Billing

Per-second billing

Pay only for what you generate with per-second billing. Videos range from 3 to 15 seconds, giving teams precise cost control for every project.

V3 Omni architecture

Built on Kling's latest V3 Omni foundation, delivering improved visual quality, better motion coherence, and more accurate prompt following.

Kling O3 API FAQ

Everything you need to know about the product and billing.

The Kling O3 API provides access to Kling's latest V3 Omni video model through EvoLink. It supports four modes: text-to-video, image-to-video, reference-to-video, and video editing. Each mode generates 3-15 second videos with per-second billing. Use your EvoLink dashboard for current pricing and availability.

Kling O3 offers four modes: text-to-video for generating from prompts, image-to-video for animating images, reference-to-video for style-guided generation using reference videos, and video editing for transforming existing footage. Each mode is optimized for different production workflows.

Kling O3 generates videos between 3 and 15 seconds. Billing is per-second within this range. Videos shorter than 3 seconds are billed at the 3-second minimum. This range is suitable for social media clips, ads, and short-form content.

Kling O3 pricing starts from base per-second rates and then applies mode-specific factors. Text-to-video and image-to-video use a 5.4 credits/second base rate: 720p with sound off = 1.0x, 720p with sound on = 1.334x, 1080p with sound off = 1.334x, 1080p with sound on = 1.667x, and 4K = 5.0x (sound surcharge does not apply at 4K). Reference-to-video and video editing use an 8.1 credits/second base rate, with 1080p billed at 1.334x the 720p rate and sound forced off (4K is not available in these modes). Check your EvoLink dashboard for your group's specific pricing.
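
The multipliers described above can be sketched as a small lookup. This is an illustration under stated assumptions: the mode labels used as dictionary keys and the function name are hypothetical (they are not API model names), and the base rates and factors are the ones quoted in this answer, which may differ for your account group.

```python
# Base credits per second, as quoted in the FAQ answer above.
# The mode labels are hypothetical keys for this sketch, not API model names.
BASE_CREDITS = {
    "text-to-video": 5.4,
    "image-to-video": 5.4,
    "reference-to-video": 8.1,
    "video-editing": 8.1,
}

# Combined quality + sound multipliers for text-to-video and image-to-video.
GENERATION_MULTIPLIERS = {
    ("720p", "off"): 1.0,
    ("720p", "on"): 1.334,
    ("1080p", "off"): 1.334,
    ("1080p", "on"): 1.667,
}


def credits_per_second(mode: str, quality: str = "720p", sound: str = "off") -> float:
    """Return the per-second credit rate for a mode/quality/sound combination."""
    base = BASE_CREDITS[mode]
    if mode in ("reference-to-video", "video-editing"):
        if quality == "4k":
            raise ValueError("4K is not available in this mode")
        # Sound is forced off in these modes, so the sound argument is ignored.
        return base * (1.334 if quality == "1080p" else 1.0)
    if quality == "4k":
        return base * 5.0  # sound surcharge does not apply at 4K
    return base * GENERATION_MULTIPLIERS[(quality, sound)]


def total_credits(mode: str, quality: str, sound: str, duration: int) -> float:
    """Per-second rate times duration, with the 3-second minimum applied."""
    return credits_per_second(mode, quality, sound) * max(duration, 3)
```

For a 5-second 720p text-to-video clip with sound off, this gives 5.4 × 5 = 27 credits.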

Kling O3 is built on the newer V3 Omni architecture and adds text-to-video as a new mode. It also introduces reference-to-video for style-guided generation. The video duration range is 3-15 seconds compared to O1's varying ranges. O3 represents the latest generation with improved quality and consistency.

Start with a clear subject and describe the action, mood, and setting in simple terms. For image-to-video, provide high-quality reference images. For reference-to-video, use videos that match your desired style. Consistency improves when your prompt structure stays stable across runs.

Limits, pricing, and available modes are determined by your provider and region. Use your EvoLink dashboard and API responses as the source of truth. Check the API documentation for the most current constraints and parameters.

All Kling AI Models

EvoLink provides unified API access to the full Kling model family. All models share the same API key, and you switch models by changing one parameter.

API Reference

Authentication

All APIs require Bearer Token authentication.

Header

Authorization: Bearer YOUR_API_KEY

POST /v1/videos/generations

Create Video

Kling O3 Text to Video (kling-o3-text-to-video) generates videos from text prompts using the V3 Omni model. Supports single-shot and multi-shot modes with optional sound effects.

Requests are processed asynchronously. Use the returned task ID to query the task status.

Generated video links are valid for 24 hours, so save the files promptly.

Important Notes

Text-to-video mode: no image input required.
Video duration: 3-15 seconds, billed per second.
Pricing varies by quality and sound: 720p+off = 1.0x, 720p+on = 1.334x, 1080p+off = 1.334x, 1080p+on = 1.667x, 4k = 5.0x (sound surcharge does not apply at 4K).

Request Parameters

model (string, required; default: kling-o3-text-to-video)

Video generation model name.

Example: kling-o3-text-to-video

prompt (string, required)

Text prompt describing what kind of video to generate. When multi_shot=true and shot_type=customize, this can be empty (use multi_prompt instead).

Notes

Max 2500 characters
Reference elements using <<<element_1>>> syntax in the prompt

Example: A golden retriever running through a sunlit meadow, cinematic slow motion.

duration (integer, optional; default: 5)

Specifies the generated video duration in seconds.

Notes

Range: 3-15 seconds (integer)
Base price: 5.4 credits per second
Minimum billing: 3 seconds

Example: 5

aspect_ratio (string, optional)

Video aspect ratio.

Value	Description
16:9	Landscape video
9:16	Portrait video
1:1	Square video

Example: 16:9

quality (string, optional; default: 720p)

Video resolution quality. Affects billing multiplier.

Value	Description
720p	Standard 720P (1.0x base)
1080p	High quality 1080P (1.334x base)
4k	Ultra HD 4K (5.0x base, sound surcharge does not apply)

Example: 720p

sound (string, optional; default: off)

Sound effect control. Affects billing multiplier (no effect when quality=4k).

Value	Description
off	No sound effects (1.0x)
on	Generate sound effects (1.334x)

Notes

Combined multiplier: 720p+off=1.0x, 720p+on=1.334x, 1080p+off=1.334x, 1080p+on=1.667x, 4k=5.0x (sound has no effect)

Example: off
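
The constraints listed for prompt, duration, aspect_ratio, quality, and sound can be checked on the client before a request is sent. The sketch below is one possible pre-flight check; the function name and error messages are illustrative, and the server remains the source of truth for what it accepts.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_QUALITIES = {"720p", "1080p", "4k"}
ALLOWED_SOUND = {"off", "on"}


def validate_params(payload: dict) -> list:
    """Return a list of problems found in a request payload (empty if none)."""
    errors = []

    duration = payload.get("duration", 5)  # documented default: 5
    if isinstance(duration, bool) or not isinstance(duration, int):
        errors.append("duration must be an integer")
    elif not 3 <= duration <= 15:
        errors.append("duration must be between 3 and 15 seconds")

    if len(payload.get("prompt", "")) > 2500:
        errors.append("prompt exceeds 2500 characters")

    ratio = payload.get("aspect_ratio")  # optional, so only checked when present
    if ratio is not None and ratio not in ALLOWED_ASPECT_RATIOS:
        errors.append("aspect_ratio must be one of 16:9, 9:16, 1:1")

    if payload.get("quality", "720p") not in ALLOWED_QUALITIES:
        errors.append("quality must be one of 720p, 1080p, 4k")

    if payload.get("sound", "off") not in ALLOWED_SOUND:
        errors.append("sound must be 'on' or 'off'")

    return errors
```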

callback_url (string, optional)

HTTPS callback address after task completion.

Notes

Triggered on completion, failure, or cancellation
HTTPS only, no internal IPs
Max length: 2048 chars
Timeout: 10s, Max 3 retries
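
A client-side pre-check for the callback_url rules above might look like the following. This is a sketch only: the function name is hypothetical, the exact definition of "internal IPs" used by the server is not documented here, and hostnames are not resolved, so a DNS name pointing at a private address would still pass this check.

```python
import ipaddress
from urllib.parse import urlparse


def is_acceptable_callback_url(url: str) -> bool:
    """Rough check: HTTPS only, max 2048 chars, no literal internal IP address."""
    if len(url) > 2048:
        return False
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Host is a DNS name rather than an IP literal; it is not resolved here.
        return True
    # Treating private, loopback, and link-local ranges as "internal" is an assumption.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```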

Example: https://your-domain.com/webhooks/video-task-completed

model_params.multi_shot (boolean, optional; default: false)

Enable multi-shot mode for generating videos with multiple camera angles or scenes.

Notes

When enabled, the top-level prompt parameter is ignored; use multi_prompt instead
The sum of all shot duration values must equal the total video duration

Example: true

model_params.shot_type (string, optional)

Shot type for multi-shot mode. Required when multi_shot is true.

Value	Description
customize	Custom per-shot prompts and durations

Notes

Only effective when multi_shot=true

Example: customize

model_params.multi_prompt (array, optional)

Per-shot prompt array. Required when multi_shot=true and shot_type=customize. Each item defines a shot segment.

Notes

Format: [{index: number, prompt: string, duration: string}, ...]
Max 6 shots, each shot prompt max 512 characters
Sum of all shot durations must equal total video duration
When used, top-level prompt can be empty

Example

[{"index": 1, "prompt": "A person on a hilltop", "duration": "5"}, {"index": 2, "prompt": "Camera pulls back", "duration": "5"}]
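
The multi_prompt rules above (at most 6 shots, 512 characters per shot prompt, durations summing to the total) can be enforced while the array is built. The helper below is an illustrative sketch; its name and its tuple-based input are assumptions, and it emits durations as strings to match the documented format.

```python
def build_multi_shot_params(shots, total_duration: int) -> dict:
    """Build model_params for multi-shot mode from (prompt, seconds) tuples."""
    if not 1 <= len(shots) <= 6:
        raise ValueError("multi_prompt accepts between 1 and 6 shots")
    for prompt, _seconds in shots:
        if len(prompt) > 512:
            raise ValueError("each shot prompt is limited to 512 characters")
    if sum(seconds for _prompt, seconds in shots) != total_duration:
        raise ValueError("shot durations must add up to the total video duration")
    return {
        "multi_shot": True,
        "shot_type": "customize",
        "multi_prompt": [
            # index starts at 1; duration is sent as a string, as in the example above
            {"index": i, "prompt": prompt, "duration": str(seconds)}
            for i, (prompt, seconds) in enumerate(shots, start=1)
        ],
    }
```

The returned dictionary is meant to be used as the model_params value of a request whose top-level duration equals total_duration.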

model_params.element_list (array, optional)

Subject element list for consistent character appearance. Elements are created via kling-custom-element model.

Notes

Format: [{element_id: string}, ...]
Max 7 elements per request
element_id is obtained from kling-custom-element creation result
Ensures consistent character appearance across generated videos

Example: [{"element_id": "123456"}]

model_params.watermark_info (object, optional)
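
One way to assemble element_list while checking prompt placeholders is sketched below. It rests on an assumption that this page does not confirm: that <<<element_N>>> in the prompt refers to the N-th entry of element_list, counting from 1. The function name is hypothetical.

```python
import re


def build_element_list(element_ids, prompt: str) -> list:
    """Return element_list entries, checking the 7-element limit and placeholders."""
    if len(element_ids) > 7:
        raise ValueError("a request accepts at most 7 elements")
    # Assumption: <<<element_N>>> points at the N-th element_list entry (1-based).
    referenced = {int(n) for n in re.findall(r"<<<element_(\d+)>>>", prompt)}
    unknown = sorted(n for n in referenced if not 1 <= n <= len(element_ids))
    if unknown:
        raise ValueError(f"prompt references elements that are not supplied: {unknown}")
    return [{"element_id": str(element_id)} for element_id in element_ids]
```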

Watermark configuration for the generated video.

Notes

Format: {enabled: boolean}

Example: {"enabled": false}

Request Example

{
  "model": "kling-o3-text-to-video",
  "prompt": "A golden retriever running through a sunlit meadow, cinematic slow motion.",
  "duration": 5,
  "aspect_ratio": "16:9",
  "quality": "720p",
  "sound": "off"
}
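
A minimal sketch of sending this request from Python with only the standard library follows. The base URL is a placeholder because this page shows only the path /v1/videos/generations; replace it with the host from your EvoLink dashboard or API docs. The function name and the Content-Type header are assumptions as well.

```python
import json
import urllib.request

# Placeholder host: this page documents only the path, not the base URL.
API_BASE_URL = "https://api.example.com"


def build_generation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the video generation endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE_URL}/v1/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer token authentication
            "Content-Type": "application/json",    # assumed for a JSON body
        },
        method="POST",
    )


payload = {
    "model": "kling-o3-text-to-video",
    "prompt": "A golden retriever running through a sunlit meadow, cinematic slow motion.",
    "duration": 5,
    "aspect_ratio": "16:9",
    "quality": "720p",
    "sound": "off",
}
request = build_generation_request("YOUR_API_KEY", payload)

# To actually submit the task once the real base URL is set:
# with urllib.request.urlopen(request) as response:
#     task = json.load(response)
```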

Multi-Shot Example

{
  "model": "kling-o3-text-to-video",
  "duration": 10,
  "aspect_ratio": "16:9",
  "quality": "1080p",
  "sound": "on",
  "model_params": {
    "multi_shot": true,
    "shot_type": "customize",
    "multi_prompt": [
      {"index": 1, "prompt": "A person standing on a hilltop watching sunrise", "duration": "5"},
      {"index": 2, "prompt": "Camera pulls back to reveal a vast mountain panorama", "duration": "5"}
    ]
  }
}

Response Example

{
  "created": 1757169743,
  "id": "task-unified-1757169743-o3t2v",
  "model": "kling-o3-text-to-video",
  "object": "video.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {
    "can_cancel": true,
    "estimated_time": 180,
    "video_duration": 5
  },
  "type": "video",
  "usage": {
    "billing_rule": "per_second",
    "credits_reserved": 27.0,
    "user_group": "default"
  }
}
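
Because the task starts as pending, a client usually polls until it finishes or waits for the callback. The sketch below shows a polling loop under loud assumptions: the status-query endpoint is not documented on this page, so fetching is delegated to a caller-supplied function, and the terminal status names are guesses based on the completion, failure, and cancellation events listed for callback_url.

```python
import time
from typing import Callable

# Assumed terminal status values; only "pending" is confirmed by the response example.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status(task_id) until a terminal status or the timeout is reached.

    fetch_status is a hypothetical caller-supplied function that performs the
    status query and returns the decoded task object.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

The estimated_time field in the response (180 in the example) could inform the choice of interval and timeout. Since video links expire after 24 hours, download the result soon after the task completes.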