OmniHuman 1.5 API

Turn any face and voice into a film-grade talking avatar in minutes, ready for TikTok, Reels, Shorts, and in-app experiences.

Upload audio for lip-sync (max 35 seconds, MP3/WAV)

Upload a portrait image containing a human face

Price: 12 Credits per second
Billed by audio duration (rounded up to the nearest second)
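Since billing rounds the audio duration up to the nearest second at 12 credits per second, cost can be estimated before submitting a job. A minimal sketch:

```python
import math

CREDITS_PER_SECOND = 12  # from the pricing panel above

def estimate_credits(audio_seconds: float) -> int:
    """Billing rounds audio duration up to the nearest second."""
    return math.ceil(audio_seconds) * CREDITS_PER_SECOND

# A 12.3 s clip is billed as 13 s:
print(estimate_credits(12.3))  # 156 credits
```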
Sample Result

Upload audio file (MP3/WAV)


Supported formats: MP3, WAV
Maximum file size: 50MB; Duration: max 35s

Upload reference images


Supported formats: JPG, JPEG, PNG, WEBP
Maximum file size: 10MB; Maximum files: 10


Pricing

Starting from $0.167 (12 Credits) per second
Guaranteed 99.9% uptime, powered by 14 redundant providers

OmniHuman 1.5 API for realistic digital humans

Generate expressive, true lip-sync avatar videos from a single photo and audio track, and plug them directly into your social content or SaaS product.


What is OmniHuman 1.5 API

Film-grade talking avatar from one photo

OmniHuman 1.5 API lets you upload a single human photo and an audio track, then automatically produces a film-grade talking avatar video with natural expressions, gestures, and camera motion that match your script and brand tone. It removes the need for actors, studios, or repeated reshoots, so you can generate consistent digital human content for social media, landing pages, and in-product education while keeping your visual identity fully aligned across every post and channel.


Emotionally expressive digital humans for social feeds

OmniHuman 1.5 API focuses on performance, not just lip movement, so every video feels like a real person reacting to the message and mood of your audio. The model aligns body language, facial expressions, and timing with the rhythm and meaning of the speech, making your TikTok hooks sharper, your YouTube intros more engaging, and your Instagram Reels more bingeable without forcing you to appear on camera every single day.


Developer-friendly API for apps and SaaS

OmniHuman 1.5 API is designed for developers who want to add high-quality AI digital humans into products without building a video model from scratch. You can send images and audio through a simple API call, receive generated video files or links, and then embed them into onboarding flows, tutorial hubs, learning platforms, or creator tools, turning static interfaces into living, speaking experiences that feel premium and personalized for every end user.


Why choose OmniHuman 1.5 API

Pick OmniHuman 1.5 API when you care most about speaking performance, emotion, and on-camera trust.

Built for human-style talking content

Wan2.2-Animate is strong for broad character animation and motion-heavy scenes, but most social and product content still starts with a person talking to camera. OmniHuman 1.5 API is tuned for this use case, so you get stronger lip-sync, more believable eye contact, and emotions that match the script, which matters a lot for sales videos, tutorials, and brand announcements.

Faster path from script to post

With Wan2.2-Animate, you often need to think about reference videos, template motion, and creative camera moves, which is perfect for complex scenes but heavier for daily content. OmniHuman 1.5 API keeps the pipeline simple: write a script, record audio, send one photo and one audio file, then post the finished talking avatar clip, making it easier to publish consistently on TikTok, Reels, and Shorts.

More trust for brand and education use

When the goal is to build trust—explaining a feature, onboarding new users, or hosting a recurring show—a stable digital human that feels like a real host usually performs better than constantly changing animated characters. OmniHuman 1.5 API helps you lock in one avatar that audiences remember, turning it into a long-term brand asset instead of a one-off visual experiment.

How OmniHuman 1.5 API works in your workflow

Go from idea to ready-to-post digital human video in a few simple steps.

1

Prepare your avatar and script

Choose a clear portrait image for your digital human and record a clean audio track or voice-over that matches the message you want to deliver.

2

Send a request to OmniHuman 1.5 API

From your app, automation, or content tool, send the image and audio to OmniHuman 1.5 API through a simple API call with your preferred settings.

3

Receive, review, and publish your video

Download the generated talking avatar video, review the performance, then export or schedule it directly to TikTok, Reels, Shorts, or your product.
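Steps 2 and 3 above boil down to a single POST followed by polling. A minimal sketch in Python, where the base URL and bearer-token auth are assumptions for illustration (check your provider's credentials and host):

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # assumption: your provider's base URL
API_KEY = "YOUR_API_KEY"              # assumption: bearer-token auth

def build_request(image_url: str, audio_url: str) -> dict:
    """Step 1: pair one portrait with one audio track (max 35 s, MP3/WAV)."""
    return {
        "model": "omnihuman-1.5",
        "image_urls": [image_url],  # only the first image is used
        "audio_url": audio_url,
    }

def submit(image_url: str, audio_url: str) -> dict:
    """Step 2: POST to the generation endpoint; returns an async task."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/videos/generations",
        data=json.dumps(build_request(image_url, audio_url)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # step 3: poll using the returned task id
```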

OmniHuman 1.5 API features

Focused on realistic talking avatars that are easy to scale.

Reusable avatar

Single photo, studio-style host

Turn one portrait into a reusable digital human who can deliver scripts again and again, so your content feels consistent without repeated photo or video shoots.

Realistic delivery

True lip-sync and emotion

Get mouth shapes, expressions, and pacing that follow your audio closely, so viewers feel like a real person is speaking directly to them, not a stiff animated mask.

Developer-ready

API-first for apps and SaaS

Call OmniHuman 1.5 API from your product, automation, or internal tools to generate talking avatar clips on-demand for onboarding, updates, and support flows.

Social-first

Optimized for social video

Create short, vertical videos tailored to TikTok, Reels, and Shorts so your digital human fits right into native feeds and keeps watch time high.

Branding

Consistent brand presence

Use the same avatar across ads, tutorials, and help content to build a recognizable face for your brand, even when different people write the scripts.

High throughput

Scales with your content calendar

Once your avatar and audio workflow are set up, you can batch-generate dozens of talking videos, freeing your team to focus on offers, hooks, and distribution.

OmniHuman 1.5 API vs Wan2.2-Animate

Pick the right engine for your avatar videos.

OmniHuman 1.5 API
  • Duration: talking avatar clips up to 35 s (limited by the audio maximum), ideal for explainers and UGC-style content
  • Resolution: high-quality, social-ready output focused on faces and upper body
  • Price: best value when you mainly need digital human talking videos
  • Strength: shines in realistic lip-sync, facial emotion, and human-style delivery for scripts, sales pitches, and tutorials

Wan2.2-Animate Move
  • Duration: 5–10 s image-to-video or video-to-video character motion
  • Resolution: HD clips optimized for dynamic motion and camera moves
  • Price: flexible usage-based pricing depending on the hosting platform
  • Strength: great for turning a static character into a fully moving figure by copying movements from a reference video or template

Wan2.2-Animate Replace
  • Duration: 5–10 s character replacement clips
  • Resolution: HD output that preserves background and scene lighting
  • Price: best for campaigns that need many creative variants
  • Strength: ideal when you want to swap the main character in existing footage while keeping the same scene, camera motion, and mood

OmniHuman 1.5 API FAQs

Everything you need to know about the product and billing.

What is OmniHuman 1.5 API and who is it for?

OmniHuman 1.5 API is a developer-focused interface that turns a single human photo and audio track into a realistic talking avatar video. It is built for social media creators, marketers, SaaS founders, and product teams who want film-grade digital humans without complex production setups. If you create TikTok tutorials, product explainers, course content, or onboarding flows and need a consistent human-style presence, OmniHuman 1.5 API gives you that through simple API calls instead of cameras and studios.

What do I need to generate a video?

To generate a video with OmniHuman 1.5 API, you typically need a clear portrait image of the person or character you want to animate and a clean audio file of the speech or message. Once you provide these through an API request, the system generates a talking avatar video that aligns lip movements, expressions, and gestures with your audio. Many users record short scripts specifically tailored for TikTok, Reels, Shorts, or in-app flows so that each output is ready to post or embed with minimal editing.

How is it different from basic talking head tools?

Many basic talking head tools only move the mouth and maybe tilt the head, which can look robotic and break trust with viewers. OmniHuman 1.5 API focuses on full performance, coordinating lip-sync, facial expressions, and body language with the emotional tone and timing of your voice. This makes jokes land better, serious moments feel more credible, and calls to action more persuasive. For brands and creators who care about quality and binge-worthy content, that emotional realism is a major advantage.

Can I use the videos on social media platforms?

Yes, videos generated with OmniHuman 1.5 API can be adapted for all major social media platforms. Many users create vertical videos for TikTok, Instagram Reels, and YouTube Shorts, while also exporting horizontal versions for long-form YouTube, landing pages, and internal training. Because the avatar and performance are consistent across formats, you can repurpose the same message in multiple places and build a recognizable digital human that followers immediately associate with your brand or channel.

Does it work for education and support content?

OmniHuman 1.5 API is a strong fit for education and support use cases where a human guide makes information easier to absorb. Course creators can turn lesson scripts into short avatar videos for each module, while SaaS teams can build libraries of talking walkthroughs that explain core features. Support teams can also create reusable answers from first-line questions, making users feel more supported without overwhelming agents. Because the avatar stays consistent, learners quickly get comfortable with the digital instructor or assistant.

Does it replace my existing content tools?

OmniHuman 1.5 API is designed to slot into your current tools rather than replace them. You write scripts in your usual docs, record audio with your preferred tools, and then use the API to generate videos at scale. From there, you can push outputs into schedulers, editors, or automation stacks, just like any other asset. Over time, you can automate even more steps, such as generating daily talking avatar videos from newsletter content or product changelog notes, turning written updates into engaging visual stories.
POST
/v1/videos/generations

Create Digital Human Video

OmniHuman 1.5 (omnihuman-1.5) generates realistic digital human videos with audio-driven lip-sync.

Processing is asynchronous; use the returned task ID to query the task status and retrieve the result.

Generated video links are valid for 24 hours; save them promptly.

Important Notes

  • Maximum audio duration is 35 seconds.
  • Billing is based on audio duration (rounded up to the nearest second).
  • Tasks cannot be cancelled once started.
  • Supported audio formats: MP3, WAV.
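The constraints above are easy to check client-side before spending credits. A small pre-flight helper, assuming you already know the clip's duration:

```python
def validate_audio(filename: str, duration_seconds: float) -> list:
    """Check an audio file against the documented limits:
    MP3/WAV only, 35 seconds maximum. Returns a list of problems."""
    errors = []
    if not filename.lower().endswith((".mp3", ".wav")):
        errors.append("unsupported format (use MP3 or WAV)")
    if duration_seconds > 35:
        errors.append("audio exceeds the 35 second maximum")
    return errors

# Empty list means the file passes both checks:
print(validate_audio("voice.mp3", 20))  # []
```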

Request Parameters

model · string · Required · Default: omnihuman-1.5

Model name for digital human video generation.

Example: omnihuman-1.5

audio_url · string · Required

Audio URL for driving lip-sync and body movements.

Notes
  • Maximum duration: 35 seconds
  • Supported formats: MP3, WAV
  • URL must be directly accessible by the server

Example: https://example.com/audio.mp3

image_urls · string[] · Required

Reference image URL array containing the person to animate. OmniHuman uses only the first image.

Notes
  • Should contain a clear human figure
  • Max size: 10MB
  • Formats: .jpg, .jpeg, .png, .webp
  • URL must be directly accessible by the server

Example: https://example.com/person.jpg

mask_url · string · Optional

Mask image URL for specifying animation regions. White areas indicate regions to animate.

Notes
  • Optional - use with auto_mask=false for custom control
  • Same dimensions as input image recommended

Example: https://example.com/mask.png

subject_check · boolean · Optional · Default: false

Enable subject detection to verify human presence in the image.

  • true: Verify human subject exists
  • false: Skip subject verification

Example: true

auto_mask · boolean · Optional · Default: false

Enable automatic mask generation for the human subject.

  • true: Auto-generate mask for animation
  • false: Use provided mask_url or full image

Example: true

pe_fast_mode · boolean · Optional · Default: false

Enable fast processing mode for quicker generation.

  • true: Faster generation (may reduce quality)
  • false: Standard quality generation

Example: false

seed · integer · Optional · Default: -1

Random seed for reproducible generation. Use -1 for a random seed.

Notes
  • Range: -1 to 2147483647
  • Same seed produces consistent results

Example: -1

prompt · string · Optional

Optional text prompt to guide the generation style.

Example: A person speaking naturally with subtle expressions

callback_url · string · Optional

HTTPS callback address notified after task completion.

Notes
  • Triggered on completion or failure
  • HTTPS only, no internal IPs
  • Max length: 2048 chars
  • Timeout: 10s, max 3 retries

Example: https://your-domain.com/webhooks/video-task-completed
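If you use callback_url, your server needs to accept the completion POST and answer fast, since the sender times out after 10 s and retries at most 3 times. A minimal receiver sketch; note the callback payload shape is an assumption here (it is not documented above), so log raw bodies first and adapt:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_callback(raw: bytes) -> tuple:
    """Extract task id and status from a callback body. The payload
    shape is an assumption; log raw bodies and adapt to what you receive."""
    body = json.loads(raw or b"{}")
    return body.get("id", ""), body.get("status", "")

class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        task_id, status = parse_callback(self.rfile.read(length))
        print("task finished:", task_id, status)
        # Acknowledge immediately; do heavy work (downloads etc.) elsewhere.
        self.send_response(200)
        self.end_headers()

# To listen locally (the registered callback_url itself must be public HTTPS):
# HTTPServer(("", 8080), CallbackHandler).serve_forever()
```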

Request Example

{
  "model": "omnihuman-1.5",
  "audio_url": "https://example.com/audio.mp3",
  "image_urls": ["https://example.com/person.jpg"],
  "subject_check": true,
  "auto_mask": true,
  "pe_fast_mode": false,
  "seed": -1,
  "callback_url": "https://your-domain.com/webhooks/callback"
}

Response Example

{
  "created": 1757169743,
  "id": "task-unified-1757169743-7cvnl5zw",
  "model": "omnihuman-1.5",
  "object": "video.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {
    "can_cancel": false,
    "estimated_time": 120,
    "video_duration": 10
  },
  "type": "video",
  "usage": {
    "billing_rule": "per_second",
    "credits_reserved": 120,
    "user_group": "default"
  }
}
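The response above shows a pending task, so a client typically polls with the returned id until the status leaves pending/processing. A sketch under assumptions: the GET route and auth header below are illustrative (the docs only say to use the returned task ID), so confirm the actual retrieval endpoint with your provider.

```python
import json
import time
import urllib.request

API_BASE = "https://api.example.com"  # assumption: your provider's base URL
API_KEY = "YOUR_API_KEY"              # assumption: bearer-token auth

def is_final(status: str) -> bool:
    """Terminal states: anything other than pending/processing."""
    return status not in ("pending", "processing")

def poll_task(task_id: str, interval: float = 5.0) -> dict:
    """Poll the task until it reaches a final state.
    The GET route here is a hypothetical placeholder."""
    url = f"{API_BASE}/v1/videos/generations/{task_id}"
    while True:
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {API_KEY}"})
        with urllib.request.urlopen(req) as resp:
            task = json.load(resp)
        if is_final(task["status"]):
            # Download promptly: generated video links expire after 24 hours.
            return task
        time.sleep(interval)
```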