
Wan 2.7 API Guide: Text-to-Video, Image-to-Video, Reference Video & Video Edit — Complete Integration Handbook


EvoLink Team

Product Team

May 22, 2026

18 min read

This is the definitive Wan 2.7 API guide — covering all four video modes, every parameter that matters in production, code examples you can paste into a terminal, real cost math, error handling, and a migration path from Wan 2.6. It is written for developers and engineers who need to ship, not just experiment.

For the product overview and playground, visit the Wan 2.7 model page. For the family-level comparison, visit the Wan API family collection. For the pricing breakdown across the full Wan lineup, visit the Wan API pricing guide.

TL;DR

Wan 2.7 is four models in one endpoint. Text-to-video, image-to-video (with first/last frame control), multi-character reference video (with voice cloning), and instruction-based video editing — all through POST /v1/videos/generations.
Pricing on EvoLink: $0.086/sec at 720p, $0.144/sec at 1080p. A 10-second 720p clip costs $0.86. No subscriptions.
Model IDs: wan2.7-text-to-video, wan2.7-image-to-video, wan2.7-reference-video, wan2.7-video-edit.
Async workflow. Every request returns a task ID immediately. Poll GET /v1/tasks/{task_id} for status. Video URLs expire in 24 hours.
What Wan 2.7 adds over Wan 2.6 on EvoLink: Video editing through the Wan 2.7 route, first-and-last-frame control in I2V, and multi-character reference video with voice cloning.
Failed tasks are not billed for reference-video and video-edit modes.

Table of contents

1. Quick start: your first Wan 2.7 video in 60 seconds
2. Choose the right model ID
3. Mode 1: Text-to-video
4. Mode 2: Image-to-video with frame control
5. Mode 3: Reference video with voice cloning
6. Mode 4: Video editing
7. Pricing and cost math
8. Async workflow and task management
9. Error handling and common status codes
10. Production patterns and guardrails
11. Migration from Wan 2.6 to Wan 2.7
12. Parameter reference cheat sheet
13. FAQ

1. Quick start: your first Wan 2.7 video in 60 seconds

Prerequisites: An EvoLink account and an API key from the dashboard.

Step 1: Generate a video

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan2.7-text-to-video",
    "prompt": "A drone shot over a misty mountain lake at sunrise, slow camera push forward, cinematic color grading",
    "quality": "720p",
    "aspect_ratio": "16:9",
    "duration": 5
  }'

Response:

{
  "id": "task-unified-1757169743-docdemo0",
  "status": "pending",
  "created": 1757169743
}

Step 2: Poll for the result

curl https://api.evolink.ai/v1/tasks/task-unified-1757169743-docdemo0 \
  -H "Authorization: Bearer YOUR_API_KEY"

When status is "completed", the response includes a results array with the video URL. Download it within 24 hours — the link expires.
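The same two steps can be scripted. The sketch below uses only Python's standard library (urllib.request and json) and builds the requests exactly as the curl commands above do. The helper names (`build_generation_request`, `build_task_request`, `send`) are illustrative and not part of any EvoLink SDK.

```python
import json
import urllib.request

API_BASE = "https://api.evolink.ai/v1"


def build_generation_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/videos/generations."""
    return urllib.request.Request(
        f"{API_BASE}/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_task_request(task_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) GET /v1/tasks/{task_id}."""
    return urllib.request.Request(
        f"{API_BASE}/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `task = send(build_generation_request(payload, api_key))`, then repeat `send(build_task_request(task["id"], api_key))` until the returned `status` is `"completed"` or `"failed"`.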

Step 3: That's it

You just generated a video for ~$0.43 (5 seconds × $0.086/sec). Change the model parameter to switch between the four modes below.

2. Choose the right model ID

Model ID	Mode	Best for	Duration
`wan2.7-text-to-video`	Text → Video	Ad creatives, social clips, script-first generation	2-15 sec
`wan2.7-image-to-video`	Image → Video	Product animations, storyboard-to-video, first/last frame control	2-15 sec
`wan2.7-reference-video`	Reference → Video	Brand spokesperson, multi-character series, voice cloning	2-15 sec (image-only refs), 2-10 sec (with video refs)
`wan2.7-video-edit`	Video → Edited Video	Style transfer, background swap, clothing change, colorization	2-10 sec

All four use the same endpoint: POST /v1/videos/generations. The model parameter is the only thing that changes.
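Because every mode shares one endpoint, a misspelled model ID or an out-of-range duration is the easiest mistake to ship. A minimal client-side pre-check, transcribing the duration column of the table above into a hypothetical `MODEL_LIMITS` dict, might look like this. The server remains the source of truth.

```python
# Duration limits (seconds) per EvoLink Wan 2.7 model ID, from the table above.
MODEL_LIMITS = {
    "wan2.7-text-to-video": (2, 15),
    "wan2.7-image-to-video": (2, 15),
    "wan2.7-reference-video": (2, 15),  # drops to 10 s when video refs are included
    "wan2.7-video-edit": (2, 10),
}


def check_duration(model: str, duration: int, has_video_refs: bool = False) -> None:
    """Raise ValueError for an unknown model ID or a duration outside its range."""
    if model not in MODEL_LIMITS:
        raise ValueError(f"unknown model ID: {model!r}")
    low, high = MODEL_LIMITS[model]
    if model == "wan2.7-reference-video" and has_video_refs:
        high = 10
    if model == "wan2.7-video-edit" and duration == 0:
        return  # 0 means "keep the source video's length"
    if not low <= duration <= high:
        raise ValueError(f"{model}: duration must be {low}-{high} s, got {duration}")
```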

3. Mode 1: Text-to-video

What it does

Generates a video from a text prompt. Supports optional driving audio for lip-sync or music-synced output. Auto-generates background music when no audio is provided.

Key parameters

Parameter	Required	Default	Description
`model`	Yes	—	`wan2.7-text-to-video`
`prompt`	Yes	—	Scene description, up to 5000 characters
`negative_prompt`	No	—	What to exclude, up to 500 characters
`audio_urls`	No	—	Array with 1 driving audio URL (wav/mp3, 2-30 sec, max 15MB)
`quality`	No	`720p`	`720p` or `1080p`
`aspect_ratio`	No	`16:9`	`16:9`, `9:16`, `1:1`, `4:3`, `3:4`
`duration`	No	`5`	2-15 seconds (integer)
`seed`	No	random	1-2147483647 for reproducible output
`prompt_extend`	No	`false`	LLM-powered prompt rewriting (set `true` for short prompts)
`callback_url`	No	—	HTTPS URL for task completion webhook
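The constraints in this table are easy to enforce before a request ever leaves your service. The builder below is a sketch under the limits listed above; the function name is our own, and it only checks what can be checked locally (it cannot verify that a driving-audio URL really points to a 2-30 s wav/mp3 under 15MB).

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
MAX_SEED = 2_147_483_647


def build_t2v_payload(prompt, *, negative_prompt=None, audio_url=None, quality="720p",
                      aspect_ratio="16:9", duration=5, seed=None, prompt_extend=False,
                      callback_url=None):
    """Validate text-to-video options client-side and return a request body."""
    if not prompt or len(prompt) > 5000:
        raise ValueError("prompt must be 1-5000 characters")
    if negative_prompt is not None and len(negative_prompt) > 500:
        raise ValueError("negative_prompt must be at most 500 characters")
    if quality not in ("720p", "1080p"):
        raise ValueError("quality must be '720p' or '1080p'")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if not isinstance(duration, int) or not 2 <= duration <= 15:
        raise ValueError("duration must be an integer from 2 to 15")
    if seed is not None and not 1 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 1 and {MAX_SEED}")
    if callback_url is not None and not callback_url.startswith("https://"):
        raise ValueError("callback_url must be an HTTPS URL")

    payload = {
        "model": "wan2.7-text-to-video",
        "prompt": prompt,
        "quality": quality,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
    }
    # Optional fields are sent only when set, so server defaults apply otherwise.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if audio_url:
        payload["audio_urls"] = [audio_url]  # the array holds exactly one driving audio
    if seed is not None:
        payload["seed"] = seed
    if prompt_extend:
        payload["prompt_extend"] = True
    if callback_url:
        payload["callback_url"] = callback_url
    return payload
```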

Multi-shot narrative

Control shot structure directly in the prompt:

{
  "model": "wan2.7-text-to-video",
  "prompt": "A tense detective story. Shot 1 [0-3s] wide angle: rainy night street, neon lights. Shot 2 [3-6s] medium: detective enters old building. Shot 3 [6-9s] close-up: detective's determined eyes. Shot 4 [9-12s] medium: cautious advance through dim corridor. Shot 5 [12-15s] close-up: discovers key clue.",
  "aspect_ratio": "16:9",
  "duration": 15
}
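If you generate shot lists programmatically, a small helper can emit the same "Shot N [start-end s]" syntax as the example and keep the timestamps consistent with `duration`. The helper name and the (length, framing, description) tuple format are our own; only the prompt syntax comes from the example above.

```python
from typing import List, Tuple


def multi_shot_prompt(premise: str, shots: List[Tuple[int, str, str]],
                      max_duration: int = 15) -> Tuple[str, int]:
    """Build a multi-shot prompt; each shot is (length_s, framing, description).

    Returns the prompt and the total duration to send as `duration`.
    """
    parts = [premise]
    start = 0
    for i, (length, framing, description) in enumerate(shots, start=1):
        end = start + length
        parts.append(f"Shot {i} [{start}-{end}s] {framing}: {description}.")
        start = end
    if not 2 <= start <= max_duration:
        raise ValueError(f"shots add up to {start} s; text-to-video allows 2-{max_duration} s")
    return " ".join(parts), start
```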

With driving audio

{
  "model": "wan2.7-text-to-video",
  "prompt": "A cartoon general in golden armor on a horse, reciting a classical poem",
  "audio_urls": ["https://your-cdn.com/recital.mp3"],
  "duration": 10
}

Audio truncation rules: if the audio is longer than duration, only its first duration seconds are used. If it is shorter, the rest of the video is silent.
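The truncation rules reduce to simple arithmetic. This sketch (the function name is hypothetical) reports how much of a driving-audio clip is used and how many seconds of video end up silent, which is handy for warning users before they submit.

```python
def audio_coverage(audio_seconds: float, duration: int) -> tuple:
    """Split a driving-audio clip against the video duration.

    Returns (seconds of audio actually used, seconds of video left silent).
    """
    if not 2 <= audio_seconds <= 30:
        raise ValueError("driving audio must be 2-30 seconds long")
    used = min(audio_seconds, duration)          # longer audio is cut at `duration`
    silent = max(duration - audio_seconds, 0)    # shorter audio leaves a silent tail
    return used, silent
```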

4. Mode 2: Image-to-video with frame control

What it does

Generates video from one or two keyframe images. This is the mode that gives you first-and-last-frame control — define both endpoints and the model infers the motion trajectory in between.

Three generation modes

`generation_mode`	Inputs	Use case
`first_frame`	`image_start` (+ optional `audio_urls`)	Animate a product photo or character illustration
`first_last_frame`	`image_start` + `image_end` (+ optional `audio_urls`)	Define start and end states, model fills the motion
`video_continuation`	`video_urls[0]` (+ optional `image_end`)	Extend an existing clip, optionally specify the ending frame

When generation_mode is omitted, the server infers it from the provided media.

Valid input combinations

image_start only
image_start + audio_urls
image_start + image_end
image_start + image_end + audio_urls
video_urls (continuation)
video_urls + image_end (continuation with ending frame)

Any other combination will be rejected.
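Rejected combinations are cheap to catch locally. The sketch below encodes the six valid combinations and pairs each with the generation_mode we read from the three-modes table. That mapping is our interpretation; the server's own inference rules may differ in edge cases.

```python
# Valid media sets (from the list above) and the generation_mode each implies.
VALID_COMBOS = {
    frozenset({"image_start"}): "first_frame",
    frozenset({"image_start", "audio_urls"}): "first_frame",
    frozenset({"image_start", "image_end"}): "first_last_frame",
    frozenset({"image_start", "image_end", "audio_urls"}): "first_last_frame",
    frozenset({"video_urls"}): "video_continuation",
    frozenset({"video_urls", "image_end"}): "video_continuation",
}
MEDIA_KEYS = ("image_start", "image_end", "video_urls", "audio_urls")


def check_i2v_media(payload: dict) -> str:
    """Return the generation_mode implied by the media fields, or raise ValueError."""
    present = frozenset(k for k in MEDIA_KEYS if payload.get(k))
    mode = VALID_COMBOS.get(present)
    if mode is None:
        raise ValueError(f"invalid image-to-video media combination: {sorted(present)}")
    declared = payload.get("generation_mode")
    if declared is not None and declared != mode:
        raise ValueError(f"generation_mode {declared!r} does not match the media ({mode!r})")
    return mode
```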

Example: First-and-last-frame

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan2.7-image-to-video",
    "generation_mode": "first_last_frame",
    "prompt": "A product bottle rotating 360 degrees with soft studio lighting",
    "image_start": "https://your-cdn.com/bottle-front.jpg",
    "image_end": "https://your-cdn.com/bottle-back.jpg",
    "quality": "1080p",
    "duration": 5
  }'

Example: Video continuation

{
  "model": "wan2.7-image-to-video",
  "generation_mode": "video_continuation",
  "prompt": "The scene continues with the character walking toward the sunset",
  "video_urls": ["https://your-cdn.com/previous-clip.mp4"],
  "image_end": "https://your-cdn.com/sunset-ending.jpg",
  "duration": 5
}

5. Mode 3: Reference video with voice cloning

What it does

Generates new video scenes while preserving the appearance of characters from reference images or videos — and optionally cloning their voice from a short audio sample. This is how you build multi-character video series where each person looks and sounds consistent across episodes.

Key constraints

image_urls + video_urls combined: max 5 items total
image_start and voice audio do not count toward this 5-item limit
Duration: 2-15 sec (image-only references), 2-10 sec (when video references are included)
Billing: input video duration + output video duration. Failed tasks are free.

Character indexing in prompts

Reference characters by their position in the input arrays:

English: Image 1, Image 2, Video 1, Video 2
Chinese: 图1, 图2, 视频1, 视频2

Images and videos are counted independently — Image 1 and Video 1 can coexist.
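When prompts are assembled in code, it helps to derive the labels from the same arrays you send, so "Image 2" can never point at the wrong file. A minimal sketch of both the labeling rule and the limits above (function names are hypothetical):

```python
def reference_labels(image_urls, video_urls):
    """Map prompt labels ("Image 1", "Video 1", ...) to the URLs they refer to.

    Images and videos are numbered independently, starting at 1.
    """
    labels = {f"Image {i}": url for i, url in enumerate(image_urls, start=1)}
    labels.update({f"Video {i}": url for i, url in enumerate(video_urls, start=1)})
    return labels


def check_reference_request(image_urls, video_urls, duration):
    """Client-side check of the reference-video limits listed above.

    image_start and voice audio are deliberately not parameters here:
    they do not count toward the 5-item limit.
    """
    if len(image_urls) + len(video_urls) > 5:
        raise ValueError("image_urls + video_urls may hold at most 5 items")
    max_duration = 10 if video_urls else 15
    if not 2 <= duration <= max_duration:
        raise ValueError(f"duration must be 2-{max_duration} s for these references")
```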

Voice cloning: two methods

Method 1: voice_bindings (recommended)

Precise key-value mapping between character references and voice audio:

{
  "model": "wan2.7-reference-video",
  "prompt": "Image 1 holds Image 2 and says: 'What lovely sunshine today'",
  "image_urls": [
    "https://your-cdn.com/girl.jpg",
    "https://your-cdn.com/toy.png"
  ],
  "model_params": {
    "voice_bindings": {
      "image1": "https://your-cdn.com/girl-voice.mp3"
    }
  },
  "duration": 10
}

Method 2: audio_urls (legacy positional)

Audio clips aligned by position to image_urls / video_urls. Works but less explicit. Use voice_bindings for new integrations.
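A small builder can turn prompt labels into the voice_bindings object and refuse bindings that point at a reference you did not send. The "image1" key format comes from the examples in this section; the "video1" form for video references is our assumption by analogy and should be checked against the API reference.

```python
import re


def build_voice_bindings(voices: dict, n_images: int, n_videos: int) -> dict:
    """Turn {"Image 1": audio_url, ...} into {"voice_bindings": {"image1": audio_url}}."""
    bindings = {}
    for label, audio_url in voices.items():
        m = re.fullmatch(r"(Image|Video) ([1-9]\d*)", label)
        if m is None:
            raise ValueError(f"unrecognised reference label: {label!r}")
        kind, index = m.group(1).lower(), int(m.group(2))
        limit = n_images if kind == "image" else n_videos
        if index > limit:
            raise ValueError(f"{label} has no matching entry in {kind}_urls")
        bindings[f"{kind}{index}"] = audio_url  # "video<N>" keys are an assumption
    return {"voice_bindings": bindings}
```

The result goes under "model_params" in the request body.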

Example: Multi-character brand video

{
  "model": "wan2.7-reference-video",
  "prompt": "Image 1 and Image 2 are having a conversation in a modern office. Image 1 explains the product while Image 2 takes notes. The scene is professional and well-lit.",
  "image_urls": [
    "https://your-cdn.com/spokesperson-a.jpg",
    "https://your-cdn.com/spokesperson-b.jpg"
  ],
  "image_start": "https://your-cdn.com/office-wide-shot.jpg",
  "model_params": {
    "voice_bindings": {
      "image1": "https://your-cdn.com/voice-a.mp3",
      "image2": "https://your-cdn.com/voice-b.mp3"
    }
  },
  "quality": "1080p",
  "duration": 10
}

Multi-grid storyboard

For single-image references with multiple panels (e.g., a 3×3 grid of character poses):

{
  "model": "wan2.7-reference-video",
  "prompt": "Reference image. 3D cartoon style. 1. Wide shot of fantasy forest. 2. Boy parts the vines. 3. Robot scans ahead. 4. Close-up of map. 5. Boy's excited face. 6. They leap over roots.",
  "image_urls": ["https://your-cdn.com/storyboard-grid.png"],
  "duration": 15
}

6. Mode 4: Video editing

What it does

Takes an existing video and applies text-guided edits — style transfer, background replacement, clothing changes, colorization, old footage restoration — without re-generating from scratch. On EvoLink's current Wan routes, video editing is exposed through Wan 2.7.

Key parameters

Parameter	Required	Default	Description
`model`	Yes	—	`wan2.7-video-edit`
`prompt`	Yes	—	Natural language edit instruction
`video_urls`	Yes	—	Array with exactly 1 source video (mp4/mov, 2-10 sec)
`image_urls`	No	—	Up to 4 reference images for style/content guidance
`keep_original_sound`	No	`false`	`true` preserves original audio; `false` lets model handle audio
`duration`	No	`0`	`0` = keep original length; explicit values: 2-10 sec
`quality`	No	`720p`	`720p` or `1080p`

Billing: input video duration + output video duration. Failed tasks are free.
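Because video-edit bills input seconds too, it pays to reject malformed edit requests before submission. A sketch of a request builder under the parameter table above; the name is hypothetical, and the mp4/mov check only inspects the URL path, so it cannot confirm the actual container or the 2-10 s source length.

```python
from urllib.parse import urlparse


def build_video_edit_payload(prompt, video_url, *, image_urls=(), keep_original_sound=False,
                             duration=0, quality="720p"):
    """Validate video-edit options client-side and return a request body."""
    if not prompt:
        raise ValueError("prompt (the edit instruction) is required")
    # Heuristic: look at the path extension; a signed URL's query string is ignored.
    if not urlparse(video_url).path.lower().endswith((".mp4", ".mov")):
        raise ValueError("source video should be mp4 or mov")
    if len(image_urls) > 4:
        raise ValueError("at most 4 reference images are allowed")
    if duration != 0 and not 2 <= duration <= 10:
        raise ValueError("duration must be 0 (keep source length) or 2-10 seconds")
    if quality not in ("720p", "1080p"):
        raise ValueError("quality must be '720p' or '1080p'")
    payload = {
        "model": "wan2.7-video-edit",
        "prompt": prompt,
        "video_urls": [video_url],  # exactly one source video
        "keep_original_sound": keep_original_sound,
        "duration": duration,
        "quality": quality,
    }
    if image_urls:
        payload["image_urls"] = list(image_urls)
    return payload
```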

Example: Instruction-only style change

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan2.7-video-edit",
    "prompt": "Convert the entire scene to a vintage film look with warm color grading and film grain",
    "video_urls": ["https://your-cdn.com/source-clip.mp4"],
    "keep_original_sound": true,
    "duration": 0
  }'

Example: Reference-guided clothing replacement

{
  "model": "wan2.7-video-edit",
  "prompt": "Replace the girl's outfit with the clothes from the reference image",
  "video_urls": ["https://your-cdn.com/source.mp4"],
  "image_urls": ["https://your-cdn.com/target-outfit.png"]
}

What you can edit

Style transfer: "convert to anime style", "apply watercolor painting effect"
Background swap: "change background to a rain-soaked Tokyo street at night"
Object/clothing change: "swap the jacket to red", "replace the hat with a crown"
Colorization: "convert this black-and-white footage to color"
Lighting: "shift lighting to golden hour"

7. Pricing and cost math

EvoLink Wan 2.7 pricing

Quality	Cost per second	10-second clip
720p	$0.086	$0.86
1080p	$0.144 (≈1.67× the 720p rate)	$1.44

No subscriptions, no minimum commitments. You pay per second of generated video, and failed reference-video and video-edit tasks are not billed.

Cost comparison with other providers (as listed on provider pages, May 2026)

Provider	Per-second rate	10-sec clip cost
EvoLink	$0.086/sec	$0.86
Together AI	$0.10/sec	$1.00
Segmind (720p clip)	~$0.063/sec (based on $0.625/10sec)	$0.625
Segmind (1080p clip)	~$0.094/sec (based on $0.9375/10sec)	$0.9375

Special billing for reference-video and video-edit

These two modes are billed on input video duration + output video duration. If you pass a 5-second reference video and generate a 10-second output, you're billed for 15 seconds. Failed tasks are not billed.

Budget estimation formula

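Monthly cost = avg_billed_seconds × cost_per_second × daily_volume × 30

For text-to-video and image-to-video, billed seconds are the output duration. For reference-video and video-edit, add the input video duration.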

Example: 100 clips/day × 8 seconds × $0.086/sec × 30 days = $2,064/month at 720p.
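The same math in code, using Decimal so cents do not drift through float rounding. The rates and the input-plus-output rule come from this section; billing input video seconds at the same per-second rate as the output is our reading of the 5 s + 10 s = 15 s example, and the function names are hypothetical.

```python
from decimal import Decimal

RATE_PER_SECOND = {"720p": Decimal("0.086"), "1080p": Decimal("0.144")}
# Modes billed on input video duration + output video duration.
INPUT_BILLED_MODELS = {"wan2.7-reference-video", "wan2.7-video-edit"}


def task_cost(model, quality, output_seconds, input_video_seconds=0):
    """Estimated cost of one successful task in USD."""
    billed = output_seconds
    if model in INPUT_BILLED_MODELS:
        billed += input_video_seconds
    return RATE_PER_SECOND[quality] * billed


def monthly_cost(avg_cost_per_clip, clips_per_day, days=30):
    """Scale a per-clip estimate to a monthly budget."""
    return avg_cost_per_clip * clips_per_day * days
```

For example, `monthly_cost(task_cost("wan2.7-text-to-video", "720p", 8), 100)` reproduces the $2,064 figure above.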

8. Async workflow and task management

Every Wan 2.7 request follows the same async pattern:

POST /v1/videos/generations → returns task id + status "pending"
GET /v1/tasks/{task_id} → poll until status is "completed" or "failed"
Download video URL from results array within 24 hours

Task lifecycle

Status	Meaning
`pending`	Task accepted, waiting in queue
`processing`	Task is actively generating
`completed`	Video is ready, URL available in `results` array
`failed`	Generation failed (check error message)
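A polling loop only needs to recognize the two terminal statuses. The sketch below takes `fetch` as any callable that performs GET /v1/tasks/{task_id} and returns the decoded JSON, so it can be tested without the network. The 5 s start, 30 s cap, and 5-minute timeout mirror the guidance elsewhere in this guide; the function itself is hypothetical.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(task_id, fetch, *, first_delay=5.0, max_delay=30.0, timeout=300.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the task is completed or failed; return the final task object.

    Raises TimeoutError when the next wait would pass the deadline, so the
    caller can flag the task for review instead of resubmitting it.
    """
    deadline = clock() + timeout
    delay = first_delay
    while True:
        task = fetch(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if clock() + delay > deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')!r} after {timeout} s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```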

Callback URL (webhook)

Instead of polling, provide a callback_url in your request. EvoLink will POST to this URL when the task completes, fails, or is cancelled. The callback fires after billing confirmation.
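A webhook handler should acknowledge quickly and hand the work off. The framework-agnostic sketch below assumes the callback body mirrors the task object from GET /v1/tasks/{task_id} (id, status, results) and that a cancelled task reports status "cancelled"; neither is confirmed here, so check a real callback before relying on the field names.

```python
import json


def handle_callback(body: bytes, on_completed, on_failed) -> int:
    """Dispatch a task callback and return the HTTP status code to reply with.

    on_completed(task_id, results) and on_failed(task_id, status, task) should
    only enqueue work; downloading inside the request handler risks timeouts.
    """
    try:
        task = json.loads(body)
    except json.JSONDecodeError:
        return 400
    if not isinstance(task, dict):
        return 400
    status = task.get("status")
    if status == "completed":
        on_completed(task.get("id"), task.get("results", []))
    elif status in ("failed", "cancelled"):
        on_failed(task.get("id"), status, task)
    return 200  # acknowledge even unknown statuses so the sender does not retry forever
```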

Production best practices

Persist the task ID immediately after submission. If your service crashes, you can recover.
Use exponential backoff when polling. Start at 5 seconds, cap at 30 seconds.
Download and archive results immediately. Video URLs expire in 24 hours.
Make submissions idempotent. Hash request payloads and deduplicate to prevent double-billing from retry storms.
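Persisting the task ID and deduplicating submissions can be combined: hash a canonical form of the payload, and only call the API when that hash has no task ID yet. This is client-side deduplication (nothing in this guide documents a server-side idempotency header), `submit` is any callable that POSTs the payload and returns the decoded response, and `store` stands in for a durable table.

```python
import hashlib
import json


def idempotency_key(payload: dict) -> str:
    """Stable hash of a request body; key order does not affect the result."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def submit_once(payload: dict, submit, store) -> str:
    """Return the existing task ID for an identical payload, or submit and record it.

    `store` is any dict-like mapping; in production use a durable store so
    task IDs survive a crash.
    """
    key = idempotency_key(payload)
    if key in store:
        return store[key]
    task_id = submit(payload)["id"]
    store[key] = task_id  # record before doing anything else with the task
    return task_id
```

One design consequence: two intentionally identical requests collapse into one task, so add a distinguishing field (for example an explicit seed) when you really want variations.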

9. Error handling and common status codes

HTTP Code	Error Code	Meaning	Action
400	`invalid_request`	Bad parameters	Check model ID, prompt length, duration range, media URLs
401	`unauthorized`	Invalid or expired token	Refresh your API key
402	`insufficient_quota`	Not enough credits	Top up your account
403	`model_access_denied`	Token lacks model access	Check API key permissions
429	`rate_limit_exceeded`	Too many requests	Back off and retry with exponential delay
500	`internal_error`	Server error	Retry after 30 seconds; if persistent, contact support
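The table splits cleanly into errors you fix and errors you wait out. A sketch of that decision as a function; the 5 s starting delay and 60 s cap for 429, the retry limit, and treating every 5xx like 500 are our assumptions. Retrying a POST after a 500 can create a duplicate task, so pair it with payload deduplication.

```python
# Non-retryable errors and the action from the table above.
STOP_REASONS = {
    400: "invalid_request: check model ID, prompt length, duration range, media URLs",
    401: "unauthorized: refresh your API key",
    402: "insufficient_quota: top up your account",
    403: "model_access_denied: check API key permissions",
}


def next_action(http_status, attempt, max_attempts=5):
    """Return ("retry", delay_seconds) or ("stop", reason).

    attempt is the number of retries already made (0-based).
    """
    if http_status in STOP_REASONS:
        return ("stop", STOP_REASONS[http_status])
    if attempt >= max_attempts:
        return ("stop", f"HTTP {http_status} persisted after {attempt} retries")
    if http_status == 429:
        return ("retry", min(5.0 * 2 ** attempt, 60.0))  # exponential, capped at 60 s
    if http_status >= 500:
        return ("retry", 30.0)  # table: retry after 30 seconds
    return ("stop", f"unexpected HTTP {http_status}")
```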

Common mistakes

Using the wrong model ID spelling. It's wan2.7-text-to-video, not wan-2.7-text-to-video or wan27-t2v. A misspelled or stale model ID can come back as a bare 404 without a descriptive error message, so check the spelling first.
Sending invalid media combinations in I2V mode. Check the valid input combinations table.
Not downloading results in time. Video URLs expire in 24 hours. Build automatic download into your pipeline.

10. Production patterns and guardrails

Budget guardrails

1. Cap maximum duration server-side (e.g., 10 seconds for social content)
2. Default to 720p unless the use case specifically requires 1080p
3. Track spend by user, feature, and model ID
4. Separate reference-video budgeting (input+output billing) from T2V/I2V
5. Set per-user daily limits before scaling traffic
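Guardrails 1, 2, and 5 can run as one pre-submission step. The class below is a hypothetical, in-memory sketch aimed at text-to-video and image-to-video (reference-video and video-edit also bill input seconds, so budget them separately as item 4 suggests); a real deployment would keep the counters in a shared store such as your database.

```python
from collections import defaultdict
from datetime import date


class Guardrails:
    """Hypothetical pre-submission guardrails for text-to-video and image-to-video."""

    def __init__(self, max_duration=10, daily_seconds_per_user=120):
        self.max_duration = max_duration
        self.daily_seconds_per_user = daily_seconds_per_user
        self.used = defaultdict(int)  # (user_id, day) -> requested output seconds

    def apply(self, user_id, payload, today=None):
        """Return a capped copy of payload, or raise PermissionError if over the daily limit."""
        today = today or date.today()
        capped = dict(payload)  # never mutate the caller's dict
        capped.setdefault("quality", "720p")  # 720p unless explicitly requested
        capped["duration"] = min(capped.get("duration", 5), self.max_duration)
        key = (user_id, today)
        if self.used[key] + capped["duration"] > self.daily_seconds_per_user:
            raise PermissionError(f"user {user_id} exceeded {self.daily_seconds_per_user} s today")
        self.used[key] += capped["duration"]
        return capped
```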

Reliability patterns

Retry with idempotency key. Hash your request payload and check for existing tasks before resubmitting.
Timeout handling. If a task hasn't completed after 5 minutes, mark it for manual review rather than resubmitting blindly.
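Fallback strategy. Consider falling back to Wan 2.6 or Wan 2.5 if Wan 2.7 returns persistent errors on a specific mode. This only works for modes the older versions support: video editing has no Wan 2.6 equivalent on EvoLink, and Wan 2.7-only parameters such as image_end or voice_bindings have to be dropped.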
Asset validation. Validate image dimensions, video duration, and audio format before submission. Bad assets cause failures that look like model quality issues.

Queue architecture

For production systems generating more than 100 videos/day:

User request → validation → job queue → Wan 2.7 API → result handler → CDN archive → notify user

Never call the API directly from user-facing request handlers. Always go through a background job system.
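The pipeline above can be prototyped in-process with Python's queue and threading modules before moving to a durable broker. In this sketch each stage is a caller-supplied function (validate, generate, archive, notify are placeholders: generate would submit to Wan 2.7 and poll, archive would copy the result to your CDN), and a failing job is collected instead of crashing its worker.

```python
import queue
import threading


def run_pipeline(jobs, validate, generate, archive, notify, workers=2):
    """Run jobs through validate -> generate -> archive -> notify; return failures."""
    q = queue.Queue()
    errors = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work for this thread
                q.task_done()
                return
            try:
                payload = validate(job)   # reject bad assets before spending money
                result = generate(payload)  # submit to Wan 2.7 and wait for the task
                url = archive(result)     # copy the video before the URL expires
                notify(job, url)
            except Exception as exc:      # keep the worker alive; record the failure
                with lock:
                    errors.append((job, exc))
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(None)
    q.join()
    return errors
```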

11. Migration from Wan 2.6 to Wan 2.7

What stays the same

API endpoint: POST /v1/videos/generations
Authentication: same API key and Bearer token
Async pattern: same task ID / polling / callback flow
EvoLink billing: same account and credit system

What changes

The IDs below are EvoLink route model IDs, not raw DashScope / Alibaba Cloud model names. If you use Alibaba's API directly, model names follow a different convention (e.g., wan2.7-t2v-2026-04-25).

Aspect	Wan 2.6	Wan 2.7
Model IDs	`wan2.6-text-to-video`, `wan2.6-image-to-video`, `wan2.6-reference-video`	`wan2.7-text-to-video`, `wan2.7-image-to-video`, `wan2.7-reference-video`, `wan2.7-video-edit`
I2V frame control	First frame only (`image_start`)	First AND last frame (`image_start` + `image_end`)
I2V generation modes	Implicit	Explicit `generation_mode` parameter (`first_frame`, `first_last_frame`, `video_continuation`)
Reference video	Single reference, no voice	Up to 5 refs, voice cloning via `voice_bindings`
Video editing	Not available	New: `wan2.7-video-edit`
Multi-shot T2V	Supported	Supported (same prompt syntax)

Step-by-step migration

Change model parameter. Replace wan2.6-text-to-video with wan2.7-text-to-video (same for other modes).
Test with existing prompts. Wan 2.7 handles the same prompt format. No rewriting needed.
Adopt new features gradually. Add generation_mode, image_end, voice_bindings, or video-edit as your workflow requires.
Keep Wan 2.6 as fallback. Both versions run in parallel on EvoLink. You don't have to migrate everything at once.
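Route-by-route migration with a Wan 2.6 fallback can be expressed as an ordered list of payloads to try. In this sketch (names hypothetical), only routes you have switched get the Wan 2.7 ID first; the fallback is the unmodified Wan 2.6 payload, so it never carries Wan 2.7-only fields.

```python
WAN26_TO_WAN27 = {
    "wan2.6-text-to-video": "wan2.7-text-to-video",
    "wan2.6-image-to-video": "wan2.7-image-to-video",
    "wan2.6-reference-video": "wan2.7-reference-video",
}


def attempt_plan(payload: dict, migrated_routes: set) -> list:
    """Return payloads in the order to try: Wan 2.7 first for migrated routes."""
    model = payload["model"]
    if model not in WAN26_TO_WAN27 or model not in migrated_routes:
        return [payload]
    upgraded = {**payload, "model": WAN26_TO_WAN27[model]}
    return [upgraded, payload]  # original Wan 2.6 payload is the fallback
```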

12. Parameter reference cheat sheet

Shared parameters (all modes)

Parameter	Type	Description
`model`	string	Required. One of the four model IDs
`prompt`	string	Required. Up to 5000 characters
`quality`	string	`720p` (default) or `1080p`
`callback_url`	string	HTTPS webhook for task completion

Text-to-video specific

Parameter	Type	Description
`negative_prompt`	string	Up to 500 characters
`audio_urls`	array	1 driving audio (wav/mp3, 2-30 sec, max 15MB)
`aspect_ratio`	string	`16:9`, `9:16`, `1:1`, `4:3`, `3:4`
`duration`	number	2-15 seconds
`seed`	integer	1-2147483647
`prompt_extend`	boolean	LLM prompt rewriting (default false)

Image-to-video specific

Parameter	Type	Description
`generation_mode`	string	`first_frame`, `first_last_frame`, `video_continuation`
`image_start`	string	First frame image URL
`image_end`	string	Last frame image URL
`video_urls`	array	Source video for continuation
`audio_urls`	array	Driving audio (not for video_continuation)
`duration`	number	2-15 seconds

Reference video specific

Parameter	Type	Description
`image_urls`	array	Reference images (counted toward 5-item limit)
`video_urls`	array	Reference videos (counted toward 5-item limit)
`image_start`	string	Starting frame (not counted toward limit)
`model_params.voice_bindings`	object	Map of reference key to voice audio URL
`audio_urls`	array	Legacy voice binding (positional)
`duration`	number	2-15 sec (image-only) or 2-10 sec (with video refs)

Video edit specific

Parameter	Type	Description
`video_urls`	array	Exactly 1 source video
`image_urls`	array	Up to 4 reference images
`keep_original_sound`	boolean	`true` preserves original audio
`duration`	number	`0` = original length; explicit: 2-10 sec

13. FAQ

How much does Wan 2.7 cost on EvoLink?

$0.086/sec at 720p, $0.144/sec at 1080p. A 10-second 720p clip costs $0.86. No subscriptions or minimum commitments.

What is the difference between Wan 2.7 and Wan 2.6?

On EvoLink, Wan 2.7 exposes video editing, multi-character reference video with voice cloning, and first-and-last-frame control in I2V mode. Wan 2.6 remains useful for cinematic storytelling, and its Flash variants suit faster iteration. Both run in parallel on EvoLink.

Does Wan 2.7 generate audio automatically?

In text-to-video mode, yes — if you don't provide audio_urls, the model auto-generates background music or sound effects matching the visual content.

Are failed tasks billed?

For reference-video and video-edit modes, failed tasks are explicitly not billed. For text-to-video and image-to-video, billing is based on actual generated video duration.

Can I use Wan 2.7 for NSFW content?

No. The model will reject prompts that violate content policies. If your prompt is rejected, you'll receive an invalid_content error.

What audio formats are supported for voice cloning?

wav and mp3. Duration should be 1-10 seconds for voice cloning, 2-30 seconds for driving audio. Maximum file size is 15MB.

How do I handle video URL expiration?

Video URLs expire after 24 hours. Build an automatic download-and-archive step into your pipeline immediately after task completion. Store the final asset in your own CDN or object storage.
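A minimal download-and-archive step using only the standard library is sketched below. It assumes the task's results array is a list of URL strings and writes files named after the task ID; swap the local directory for your object-storage client in production.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def archive_results(task_id, results, dest_dir, timeout=60.0):
    """Download every result URL of a completed task into dest_dir; return saved paths.

    Assumes results is a list of URL strings.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(results):
        suffix = Path(urllib.parse.urlparse(url).path).suffix or ".mp4"
        target = dest / f"{task_id}-{i}{suffix}"
        # Stream to disk instead of loading the whole video into memory.
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(target, "wb") as out:
            shutil.copyfileobj(resp, out)
        saved.append(target)
    return saved
```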

Can I migrate from Wan 2.6 without downtime?

Yes. Change the model parameter from wan2.6-* to wan2.7-*. The endpoint, authentication, and async pattern are identical. Both versions run in parallel, so you can migrate route by route.

Next steps

Try the playground: Wan 2.7 model page
Compare Wan models: Wan API family collection
Full pricing breakdown: Wan API pricing guide
Wan 2.6 production patterns: Wan 2.6 API guide
Wan 2.5 review: Wan 2.5 API review
