Kling O3 API
Kling O3 (V3 Omni) is a next-generation video model with text-to-video, image-to-video, reference-to-video, video editing, and custom element creation. It supports 3-15 second videos with per-second billing.
Billing Rules
- Price shown is per second
- Duration range: 3-10 seconds
- Total = price/second × duration
- Sound is forced off when video input is present
Pricing
| Model | Mode | Quality | Price |
|---|---|---|---|
| Kling O3 Reference to Video | Video Generation | 720p | $0.1125/second (8.1 credits) |
| Kling O3 Reference to Video | Video Generation | 1080p | $0.1501/second (10.8054 credits) |
If the selected provider is down, requests are automatically routed to the next cheapest available one, ensuring 99.9% uptime at the best possible price.
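The billing rules and price table above can be sketched as a small estimator. This is a minimal illustration, not an official calculator: the function name `estimate_cost` and the `PRICES` mapping are hypothetical, and the figures are copied from the table as shown.

```python
from decimal import Decimal

# Per-second prices copied from the pricing table: quality -> (USD, credits).
PRICES = {
    "720p": (Decimal("0.1125"), Decimal("8.1")),
    "1080p": (Decimal("0.1501"), Decimal("10.8054")),
}

MIN_SECONDS, MAX_SECONDS = 3, 10  # duration range from the billing rules


def estimate_cost(duration: int, quality: str = "720p") -> tuple[Decimal, Decimal]:
    """Return (usd, credits) for one clip: price per second x duration."""
    if quality not in PRICES:
        raise ValueError(f"unknown quality: {quality!r}")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    usd_per_second, credits_per_second = PRICES[quality]
    return usd_per_second * duration, credits_per_second * duration


usd, credits = estimate_cost(5, "720p")
print(f"${usd} / {credits} credits")
```

`Decimal` is used here so that per-second prices multiply without binary floating-point rounding.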
Kling O3 API for next-generation video creation
Build with the latest Kling V3 Omni model. Generate videos from text, images, or references, and edit existing footage — all through one unified API with 3-15 second output support.

What can you build with the Kling O3 API?
Text-to-video creation
Generate videos directly from text prompts with Kling O3. Describe scenes, actions, and styles in natural language and let the model produce 3-15 second clips ready for marketing, social media, or creative projects.

Image and reference-driven video
Use images or reference videos to guide generation. Kling O3 supports image-to-video and reference-to-video modes, giving teams precise control over visual style, character consistency, and scene composition.

AI-powered video editing
Edit and transform existing footage with Kling O3's video editing mode. Apply style transfers, adjust scenes, and refine content without starting from scratch — ideal for iterating on commercial content at scale.

Why teams choose Kling O3
Kling O3 brings the latest V3 Omni architecture with four specialized modes — text-to-video, image-to-video, reference-to-video, and video editing — in a single model family.
Four specialized modes
Text, image, reference, and editing modes cover the full video creation workflow.
Latest V3 Omni architecture
Built on Kling's newest generation for improved quality and consistency.
Flexible 3-15s output
Generate videos from 3 to 15 seconds with per-second billing.
How to integrate the Kling O3 API
From input to production-ready video in three steps.
Choose your mode
Select text-to-video, image-to-video, reference-to-video, or video editing based on your workflow needs.
Submit a generation task
Send your request with prompts, images, or references. Track the async task until results are ready.
Review and iterate
Download results, compare variations, and reuse the same structure for fast iteration across campaigns.
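The "track the async task" step can be sketched as a polling loop. Everything here is an assumption for illustration: the status-query call is injected as `fetch_status` because its endpoint is not described in this section, and the status strings are hypothetical placeholders.

```python
import time
from typing import Callable

# Hypothetical terminal status strings; check the real task-status response.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status(task_id) until the task reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')!r}")
        sleep(interval_s)


# Usage with a stand-in fetcher that finishes on the third poll.
responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "completed", "video_url": "https://example.com/out.mp4"},
])
result = wait_for_task(lambda _id: next(responses), "task_123", sleep=lambda s: None)
print(result["status"])
```

Injecting the fetcher and the sleep function keeps the loop testable without network access.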
Core capabilities of the Kling O3 API
Next-generation video AI with four specialized modes
Text-to-video generation
Generate videos purely from text descriptions. Kling O3 interprets natural language prompts to produce dynamic video content without requiring any visual input.
Image-to-video transformation
Transform static images into dynamic videos. Provide reference images and let Kling O3 animate them with natural motion and scene dynamics.
Reference video guidance
Use existing videos as references to guide new generation. This mode helps maintain visual consistency and style across multiple outputs.
AI video editing
Edit and transform existing footage with AI-powered tools. Apply style changes, scene adjustments, and creative transformations without manual editing.
Per-second billing
Pay only for what you generate with per-second billing. Videos range from 3 to 15 seconds, giving teams precise cost control for every project.
V3 Omni architecture
Built on Kling's latest V3 Omni foundation, delivering improved visual quality, better motion coherence, and more accurate prompt following.
API Reference
Authentication
All APIs require Bearer Token authentication.
Authorization: Bearer YOUR_API_KEY

Create Video
Endpoint: /v1/videos/generations
Kling O3 Reference to Video (kling-o3-reference-to-video) generates videos guided by reference video style and motion features using the V3 Omni model. The reference video serves as a feature reference (not direct editing).
Processing is asynchronous: use the returned task ID to query the task status.
Generated video links are valid for 24 hours, so save them promptly.
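A request to this endpoint with Bearer authentication might be built as follows. This is a sketch under stated assumptions: `BASE_URL` is a placeholder host, the `POST` method is assumed because the source does not state it, and `build_create_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: replace with the real API host


def build_create_request(api_key: str, video_url: str, prompt: str = "") -> urllib.request.Request:
    """Build (but do not send) a create-video request with Bearer authentication."""
    body = {"model": "kling-o3-reference-to-video", "video_url": video_url}
    if prompt:
        body["prompt"] = prompt
    return urllib.request.Request(
        BASE_URL + "/v1/videos/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumption: the source does not state the HTTP method
    )


req = build_create_request(
    "YOUR_API_KEY",
    "https://example.com/reference.mp4",
    "Maintain the same motion style, switch to a snowy background.",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```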
Important Notes
- A reference video is required (video_url, video_urls, or video).
- Max duration: 10 seconds (shorter than text/image-to-video's 15s).
- Sound is forced off when video input is present — sound parameter is ignored.
- Video format: MP4/MOV, ≤ 200MB, ≥ 3s, 720-2160px, 24-60fps. Max 1 video.
- With video input: images + subjects ≤ 4, no video-character subjects.
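The notes above lend themselves to a client-side pre-check before submitting a task. The sketch below is illustrative only: `validate_reference_request` is a hypothetical helper, and it assumes that `model_params.*` parameters are keys of a nested `model_params` object. It does not inspect the video file itself (format, size, resolution, frame rate).

```python
def validate_reference_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    # At least one of the three video fields must be present.
    if not any(body.get(key) for key in ("video_url", "video_urls", "video")):
        problems.append("a reference video is required (video_url, video_urls, or video)")
    # Reference-to-video is limited to 3-10 seconds; 5 is the documented default.
    duration = body.get("duration", 5)
    if not 3 <= duration <= 10:
        problems.append("duration must be between 3 and 10 seconds")
    # Images and subjects share a combined limit of 4.
    images = body.get("image_urls") or []
    subjects = (body.get("model_params") or {}).get("element_list") or []
    if len(images) + len(subjects) > 4:
        problems.append("images + subjects must not exceed 4")
    # The sound parameter has no effect in this mode.
    if "sound" in body:
        problems.append("sound is ignored when a video input is present")
    return problems


print(validate_reference_request({"duration": 12}))
```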
Request Parameters
model (string, required; default: kling-o3-reference-to-video)
Video generation model name.
Example: kling-o3-reference-to-video

prompt (string, optional)
Text prompt describing what kind of video to generate with reference guidance.
Notes
- Max 2500 characters
- Optional
Example: Maintain the same motion style, switch to a snowy background.

video_url (string, required)
Reference video URL. At least one of video_url, video_urls, or video must be provided.
Notes
- Priority: video_url and video_urls take the first video; video is lowest priority
- Format: MP4/MOV
- Max size: 200MB
- Duration: ≥ 3 seconds
- Resolution: 720-2160px width/height
- Frame rate: 24-60fps
- Max 1 video (multiple videos only use the first)
Example: https://example.com/reference.mp4

image_urls (array, optional)
Optional reference image URLs for style/scene guidance.
Notes
- Optional, for style/scene/subject reference
- With video: images + subjects ≤ 4
Example: ["https://example.com/style.jpg"]

keep_original_sound (boolean, optional; default: true)
Whether to keep the original sound from the reference video.
| Value | Description |
|---|---|
| true | Preserve original audio |
| false | Discard original audio |
Example: true

duration (integer, optional; default: 5)
Specifies the generated video duration in seconds.
Notes
- Range: 3-10 seconds (shorter than text/image-to-video's 15s)
- Base price: 8.1 credits per second
- Minimum billing: 3 seconds
Example: 5

aspect_ratio (string, optional)
Video aspect ratio.
| Value | Description |
|---|---|
| 16:9 | Landscape video |
| 9:16 | Portrait video |
| 1:1 | Square video |
Example: 16:9

quality (string, optional; default: 720p)
Video resolution quality. Affects the billing multiplier.
| Value | Description |
|---|---|
| 720p | Standard 720P (1.0x base) |
| 1080p | High quality 1080P (1.334x base) |
Notes
- Sound forced off — only quality affects the multiplier
Example: 720p

callback_url (string, optional)
HTTPS callback address called after task completion.
Notes
- Triggered on completion, failure, or cancellation
- HTTPS only, no internal IPs
- Max length: 2048 chars
- Timeout: 10s, Max 3 retries
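The callback_url rules above can be pre-checked on the client before a task is submitted. This is a rough sketch: `is_acceptable_callback_url` is a hypothetical helper, it only inspects literal IP addresses and `localhost`, and it does not resolve DNS names, so the server may still reject a URL that passes here.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_CALLBACK_URL_LENGTH = 2048


def is_acceptable_callback_url(url: str) -> bool:
    """Client-side pre-check of the documented callback_url rules."""
    if len(url) > MAX_CALLBACK_URL_LENGTH:
        return False
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    if parts.hostname == "localhost":
        return False
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True  # a DNS name; this sketch does not resolve it
    return address.is_global  # rejects private, loopback, and link-local ranges


print(is_acceptable_callback_url("https://your-domain.com/webhooks/video-task-completed"))
```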
Example: https://your-domain.com/webhooks/video-task-completed

model_params.multi_shot (boolean, optional; default: false)
Enable multi-shot mode for generating videos with multiple camera angles or scenes.
Notes
- When enabled, shot_type and multi_prompt become relevant
Example: true

model_params.shot_type (string, optional)
Shot type for multi-shot mode. Required when multi_shot is true.
| Value | Description |
|---|---|
| customize | Custom per-shot prompts and durations |
Notes
- Only effective when multi_shot=true
Example: customize

model_params.multi_prompt (array, optional)
Per-shot prompt array. Required when multi_shot=true and shot_type=customize. Each item defines a shot segment.
Notes
- Format: [{index: number, prompt: string, duration: string}, ...]
- Max 6 shots
- Total duration of all shots should match the requested duration
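The multi-shot parameters can be assembled and checked against these notes before sending. The helper below is a hypothetical sketch: `build_multi_shot_params` is not part of the API, and it assumes the three parameters sit together inside a nested `model_params` object.

```python
MAX_SHOTS = 6


def build_multi_shot_params(shots: list[tuple[str, int]], total_duration: int) -> dict:
    """Build model_params for multi-shot mode from (prompt, seconds) pairs."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1-{MAX_SHOTS} shots, got {len(shots)}")
    if sum(seconds for _, seconds in shots) != total_duration:
        raise ValueError("shot durations must add up to the requested duration")
    return {
        "multi_shot": True,
        "shot_type": "customize",
        "multi_prompt": [
            # index is 1-based and duration is a string, as in the documented example
            {"index": i, "prompt": prompt, "duration": str(seconds)}
            for i, (prompt, seconds) in enumerate(shots, start=1)
        ],
    }


params = build_multi_shot_params([("Scene one", 3), ("Scene two", 5)], total_duration=8)
print(params["multi_prompt"])
```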
Example: [{"index": 1, "prompt": "Scene one", "duration": "3"}, {"index": 2, "prompt": "Scene two", "duration": "5"}]

model_params.element_list (array, optional)
Subject library list for referencing pre-trained subjects in the video.
Notes
- Format: [{element_id: long}, ...]
- No video-character subjects supported
- With video: images + subjects ≤ 4
- Reference subjects in prompt using <<<element_N>>> placeholder
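A prompt can be checked for placeholders that have no matching subject. Note the assumption, which the source does not confirm: this sketch treats N in `<<<element_N>>>` as the 1-based position of the subject in element_list. The helper name `unknown_placeholders` is hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"<<<element_(\d+)>>>")


def unknown_placeholders(prompt: str, element_list: list[dict]) -> list[int]:
    """Return placeholder numbers that have no matching element_list entry.

    Assumption: N in <<<element_N>>> is the 1-based position in element_list.
    """
    numbers = [int(n) for n in PLACEHOLDER.findall(prompt)]
    return [n for n in numbers if not 1 <= n <= len(element_list)]


missing = unknown_placeholders(
    "<<<element_1>>> walks past <<<element_2>>>",
    [{"element_id": 789012}],
)
print(missing)
```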
Example: [{"element_id": 789012}]

model_params.watermark_info (object, optional)
Watermark configuration for the generated video.
Notes
- Format: {enabled: boolean}
Example: {"enabled": false}
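Putting the parameters together, a complete request body might look like the following. The values are the example values listed above. The nesting of `model_params.*` names under a `model_params` object is an assumption based on the dotted parameter names.

```python
import json

# Assumption: model_params.* names are keys of a nested "model_params" object.
payload = {
    "model": "kling-o3-reference-to-video",
    "prompt": "Maintain the same motion style, switch to a snowy background.",
    "video_url": "https://example.com/reference.mp4",
    "image_urls": ["https://example.com/style.jpg"],
    "keep_original_sound": True,
    "duration": 5,
    "aspect_ratio": "16:9",
    "quality": "720p",
    "callback_url": "https://your-domain.com/webhooks/video-task-completed",
    "model_params": {
        "element_list": [{"element_id": 789012}],
        "watermark_info": {"enabled": False},
    },
}

print(json.dumps(payload, indent=2))
```

With one image and one subject, this body stays within the combined limit of 4 for images plus subjects.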