Seed Audio 1.0 API
Price: $0.0012 (~0.08 credits) per second
Highest stability with guaranteed 99.9% uptime. Recommended for production environments.
Use the same API endpoint for all versions. Only the model parameter differs.
Reference type: Reference Audio and Reference Image are mutually exclusive.
Seed Audio 1.0 API for AI Audio Generation
Build creator tools, voice agents, audio drama workflows, and short-video production features with Doubao Seed Audio 1.0 through EvoLink's unified API gateway.

Pricing
| Model | Mode | Price |
|---|---|---|
| Doubao Seed Audio 1.0 | Audio Generation (per second) | $0.0012/second (0.08 Credits) |
If a provider is down, we automatically switch to the next cheapest available one, ensuring 99.9% uptime at the best possible price.
What Can You Build with Seed Audio 1.0?
Creator Tools and Audio Workflows
Seed Audio 1.0 is prompt-based AI audio generation, not just text-to-speech. Generate narration, voiceover, and sound design from a single prompt, and use reference audio to keep a consistent voice across an entire production. Ideal for podcast tooling, audiobook pipelines, and short-video content workflows where speech, music, and ambience need to be produced together.

Voice Agents and AI Companions
Give voice agents, assistants, and AI companions an expressive, controllable voice. Adjust speed, pitch, and volume to match each interaction, and pass reference audio to anchor a recurring character voice. Output streams back through the same EvoLink gateway you already use for other models, so you manage usage and cost from one place.

Audio Drama, Games, and Interactive Stories
Compose multi-character dialogue, emotion, and non-verbal expression directly in the prompt to drive audio drama, game scenes, and interactive narratives. Long-form consistency makes it suitable for audiobooks, audio dramas, and episodic content where the same characters must sound consistent across many generations.

Why Use Seed Audio 1.0 via EvoLink?
Seed Audio 1.0 is already live on EvoLink, so you can integrate a new audio model early through one unified gateway.
Fast Model Adoption
Seed Audio 1.0 is live on EvoLink today. Use model ID doubao-seed-audio-1-0 with your existing EvoLink API key to start integrating a new AI audio generation model early, with no separate account, contract, or onboarding for a single provider.
Cost Visibility by Output Duration
Seed Audio 1.0 is billed by generated audio duration, charged per second of output. That makes batch workloads easy to estimate before you run them. Check the EvoLink console for the latest unit price, and monitor real usage from the same dashboard as your other models.
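As a rough illustration of estimating a batch before running it, the sketch below multiplies planned output durations by the per-second rates from the pricing table on this page. The function name is hypothetical, and the sketch assumes no rounding of partial seconds, which this page does not specify; confirm the current unit price in the EvoLink console.

```python
from decimal import Decimal

# Rates taken from the pricing table on this page; check the console for the latest values.
PRICE_USD_PER_SECOND = Decimal("0.0012")
CREDITS_PER_SECOND = Decimal("0.08")


def estimate_cost(clip_seconds):
    """Estimate cost for a batch of clips, given each clip's expected output duration in seconds.

    Assumption: billing is linear in seconds with no rounding of partial seconds.
    """
    total_seconds = sum((Decimal(str(s)) for s in clip_seconds), Decimal(0))
    return {
        "seconds": total_seconds,
        "usd": total_seconds * PRICE_USD_PER_SECOND,
        "credits": total_seconds * CREDITS_PER_SECOND,
    }


if __name__ == "__main__":
    estimate = estimate_cost([120, 45, 30.5])
    print(f"{estimate['seconds']} s, ${estimate['usd']}, {estimate['credits']} credits")
```

Decimal arithmetic is used here so that small per-second prices do not pick up floating-point noise when summed over large batches.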
Unified Gateway for Audio Models
Access Seed Audio 1.0 alongside other audio models through one EvoLink API. Compare options, manage keys and usage in one place, and route or fall back across models without rewiring your integration for each provider.
How to Integrate Seed Audio 1.0
Three steps to call Doubao Seed Audio 1.0 through EvoLink.

Create an EvoLink API Key
Sign up on EvoLink and generate an API key from the console. The same key gives you access to Seed Audio 1.0 and the other models on the gateway, and lets you set usage limits and monitor consumption from one dashboard.
Use Model ID doubao-seed-audio-1-0
Point your request at model ID doubao-seed-audio-1-0. Provide your text prompt (up to 1.5k characters) and optional reference audio, then set output options such as format, sample rate, speed, pitch, and volume.
Submit an Async Task and Retrieve Audio
Seed Audio 1.0 uses an asynchronous task model: submit the generation request, receive a task ID, then poll the task status endpoint until the audio is ready. Each request yields up to 120 seconds of audio, which you can stream, download, or embed directly in your product.
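The three steps can be sketched roughly as follows. Several details are assumptions rather than documented facts: `BASE_URL` is a placeholder host, the HTTP method is assumed to be POST, and because the task status endpoint and its response fields are not shown on this page, the poller takes caller-supplied `fetch_status` and `is_finished` functions instead of hard-coding them.

```python
import json
import time
import urllib.request

# Assumption: placeholder host. Use the base URL shown in your EvoLink console.
BASE_URL = "https://api.example.com"
GENERATE_PATH = "/v1/audios/generations"  # path listed in the API reference


def build_generation_request(api_key, prompt, **options):
    """Build (but do not send) the request that creates an audio generation task."""
    body = {"model": "doubao-seed-audio-1-0", "prompt": prompt, **options}
    return urllib.request.Request(
        BASE_URL + GENERATE_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumption: task creation uses POST
    )


def poll_until(fetch_status, is_finished, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Call fetch_status() repeatedly until is_finished(result) is true or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        if is_finished(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("audio task did not finish in time")
        sleep(interval_s)
```

To send the request, pass the object returned by `build_generation_request` to `urllib.request.urlopen`, then give `poll_until` a function that queries the task status with the returned task ID.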
Capabilities and Limits
The concrete facts you need before integrating Seed Audio 1.0.
Prompt-Based Audio Generation
Seed Audio 1.0 generates audio from a prompt, optionally guided by reference audio. It goes beyond plain TTS: multi-character dialogue, emotion, and non-verbal expression can be written directly into the prompt.
Reference Audio Support
Provide up to 3 reference audio clips per request, each no longer than 30 seconds, via base64 or URL, to guide timbre and delivery. Reference image and reference audio cannot be supplied in the same request.
Output Limit up to 120s
Each request synthesizes up to 120 seconds of audio. Text input is capped at 1.5k characters, which is convenient for batching long-form content into segments.
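One way to batch long-form text into segments is to pack whole sentences greedily under the character cap. The sketch below is illustrative only: it reads "1.5k characters" as 1,500, uses a simple punctuation-based sentence splitter, and hard-wraps any single sentence longer than the cap. The function name and splitting rules are not part of the API.

```python
import re

MAX_PROMPT_CHARS = 1500  # assumption: "1.5k characters" read as 1,500

# A sentence ends at Latin punctuation followed by whitespace (or end of text),
# or at CJK punctuation; trailing whitespace stays attached to the sentence.
_SENTENCE = re.compile(r".+?(?:[.!?]+(?=\s|$)|[。！？]+|$)\s*", re.S)


def split_into_segments(text, limit=MAX_PROMPT_CHARS):
    """Split text into consecutive segments of at most `limit` characters each."""
    segments, current = [], ""
    for sentence in _SENTENCE.findall(text):
        # Hard-wrap a sentence that is itself longer than the limit.
        for start in range(0, len(sentence), limit):
            piece = sentence[start:start + limit]
            if current and len(current) + len(piece) > limit:
                segments.append(current)
                current = ""
            current += piece
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    chapters = split_into_segments("Hello there. " * 300)
    print(len(chapters), [len(c) for c in chapters])
```

Because the segments are consecutive slices of the input, joining them reproduces the original text, which makes it easy to map generated clips back to their source passages.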
Flexible Output Formats
Export audio as wav (default), mp3, pcm, or ogg_opus, so you can match your downstream pipeline without extra transcoding. Explicit and implicit watermarking are supported.
Selectable Sample Rates
Choose a 48 kHz, 24 kHz (default), 16 kHz, or 8 kHz sample rate to balance fidelity and file size for web delivery, production, or real-time processing.
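To get a feel for the fidelity versus file size trade-off, the sketch below estimates the size of uncompressed audio at each supported rate. It assumes 16-bit mono samples, which this page does not state, so treat the numbers as an order-of-magnitude guide; compressed formats such as mp3 and ogg_opus will be much smaller.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 24000, 48000)  # Hz, from the parameter reference


def pcm_size_bytes(duration_s, sample_rate_hz=24000, bytes_per_sample=2, channels=1):
    """Estimate raw PCM size. Assumption: 16-bit (2-byte) mono samples unless overridden."""
    if sample_rate_hz not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate_hz}")
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


if __name__ == "__main__":
    for rate in SUPPORTED_SAMPLE_RATES:
        size_mb = pcm_size_bytes(120, rate) / 1_000_000
        print(f"{rate} Hz, 120 s: about {size_mb:.2f} MB")
```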
Languages and Delivery Controls
Supports Chinese and English, including mainstream Chinese regional accents (pure dialects are not supported). Adjust speed, pitch, and volume per request. SSML is not supported.
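A client-side check of the delivery controls might look like the sketch below. It uses the ranges listed for speech_rate, loudness_rate, and pitch_rate in the request parameters (0.5 to 2.0 with two decimals, and -12 to 12 semitones). The function itself is hypothetical, and the server remains the authority on what it accepts.

```python
def validate_delivery_controls(speech_rate=1.0, loudness_rate=1.0, pitch_rate=0):
    """Validate delivery controls and return them as request fields.

    Ranges follow the request parameter reference; this helper is illustrative only.
    """
    for name, value in (("speech_rate", speech_rate), ("loudness_rate", loudness_rate)):
        if not 0.5 <= value <= 2.0:
            raise ValueError(f"{name} must be between 0.5 and 2.0, got {value}")
    if isinstance(pitch_rate, bool) or not isinstance(pitch_rate, int):
        raise ValueError("pitch_rate must be an integer number of semitones")
    if not -12 <= pitch_rate <= 12:
        raise ValueError(f"pitch_rate must be between -12 and 12, got {pitch_rate}")
    return {
        "speech_rate": round(speech_rate, 2),      # two decimals are accepted
        "loudness_rate": round(loudness_rate, 2),  # two decimals are accepted
        "pitch_rate": pitch_rate,
    }
```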
Frequently Asked Questions about Seed Audio 1.0
Everything you need to know about the product and billing.
API Reference
Authentication
All APIs require Bearer Token authentication.
Authorization: Bearer YOUR_API_KEY

/v1/audios/generations
Generate Audio
Create an audio generation task from a text prompt, optionally guided by reference voices or a reference image.
Asynchronous processing mode: use the returned task ID to query the task status and result.
Result audio URLs are CDN-hosted and long-lived. Billed per output second (up to 120s).
Three Generation Modes
- prompt: generate audio directly from the prompt.
- prompt + audio_references: reference a voice ID or reference audio. Use @音频N in the prompt to reference the N-th item.
- prompt + image_urls: generate audio guided by a reference image.

⚠️ audio_references and image_urls are mutually exclusive; use one or the other.
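The three modes and the mutual-exclusivity rule can be sketched as a small payload builder. The function name is hypothetical; the field names, the 3-reference and 1-image limits, and the prompt cap come from the parameter reference on this page, with "1.5k characters" read as 1,500.

```python
MODEL_ID = "doubao-seed-audio-1-0"
MAX_PROMPT_CHARS = 1500  # assumption: "1.5k characters" read as 1,500


def build_payload(prompt, audio_references=None, image_urls=None, **options):
    """Build a request body for one of the three generation modes."""
    if audio_references and image_urls:
        raise ValueError("audio_references and image_urls are mutually exclusive")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds the character limit")
    payload = {"model": MODEL_ID, "prompt": prompt}
    if audio_references:
        if len(audio_references) > 3:
            raise ValueError("at most 3 audio references are allowed")
        payload["audio_references"] = list(audio_references)
    if image_urls:
        if len(image_urls) > 1:
            raise ValueError("at most 1 reference image is allowed")
        payload["image_urls"] = list(image_urls)
    payload.update(options)  # e.g. format, sample_rate, speech_rate
    return payload
```

For example, `build_payload("Hello", format="mp3")` produces a prompt-only request, while passing `audio_references` or `image_urls` selects one of the reference modes.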
Request Parameters
model (string, required, default: doubao-seed-audio-1-0)
Audio generation model name.
| Value | Description |
|---|---|
| doubao-seed-audio-1-0 | Doubao Seed Audio 1.0 multimodal audio generation |
Example: doubao-seed-audio-1-0

prompt (string, required)
The text content to synthesize, or a prompt describing the audio. Use @音频N to reference the N-th item of audio_references.
Notes
- Limited to 1.5k characters
Example: @音频1 Hi there! @音频2 How's your day going?

audio_references (array, optional)
Reference voices. Each item is a voice ID or a reference audio URL (items starting with 'http' are treated as URLs, otherwise as voice IDs). Order maps to @音频1 / @音频2 in the prompt.
Notes
- Up to 3 items; mutually exclusive with image_urls
- Voice IDs look like 'zh_female_xxx'
- Reference audio: each ≤ 30s / ≤ 10MB, wav/mp3/pcm/ogg_opus
Example: ["zh_female_example_id", "https://your-bucket.com/ref-voice.mp3"]
See Preset Voice IDs in the left sidebar for curated voices and the full catalog link.
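To illustrate how references map to the prompt, the sketch below applies the documented 'http' prefix rule to tell URLs from voice IDs and checks that every @音频N marker in a prompt points at an existing item. Both helper names are hypothetical and the check is a local convenience, not part of the API.

```python
import re


def classify_references(audio_references):
    """Label each item as a URL or a voice ID using the documented 'http' prefix rule."""
    return [
        ("url" if item.startswith("http") else "voice_id", item)
        for item in audio_references
    ]


def unmatched_markers(prompt, audio_references):
    """Return the @音频N indices used in the prompt that have no matching reference item."""
    used = {int(n) for n in re.findall(r"@音频(\d+)", prompt)}
    return sorted(n for n in used if not 1 <= n <= len(audio_references))


if __name__ == "__main__":
    refs = ["zh_female_example_id", "https://your-bucket.com/ref-voice.mp3"]
    print(classify_references(refs))
    print(unmatched_markers("@音频1 Hi there! @音频3 Hello?", refs))
```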
image_urls (array, optional)
Reference image URL to drive audio generation.
Notes
- Currently at most 1 image; mutually exclusive with audio_references
- ≤ 10MB, jpeg/png/webp
Example: ["https://your-bucket.com/scene.jpg"]

speech_rate (number, optional, default: 1.0)
Speech speed multiplier.
Notes
- Range: 0.5 to 2.0 (1.0 = normal, 2.0 = double speed, 0.5 = half speed)
- Accepts two decimals
Example: 1.2

loudness_rate (number, optional, default: 1.0)
Loudness multiplier.
Notes
- Range: 0.5 to 2.0 (1.0 = normal)
- Accepts two decimals
Example: 1.0

pitch_rate (integer, optional, default: 0)
Pitch adjustment in semitones.
Notes
- Range: -12 to 12 (0 = no change)
Example: 0

format (string, optional, default: wav)
Output audio format.
| Value | Description |
|---|---|
| wav | WAV |
| mp3 | MP3 |
| pcm | PCM |
| ogg_opus | OGG Opus |
Example: mp3

sample_rate (integer, optional, default: 24000)
Output sample rate in Hz.
| Value | Description |
|---|---|
| 8000 | 8 kHz |
| 16000 | 16 kHz |
| 24000 | 24 kHz |
| 48000 | 48 kHz |
Example: 24000

callback_url (string, optional)
HTTPS callback address after task completion.
Notes
- Triggered on completion, failure, or cancellation
- Sent after billing confirmation
- HTTPS only, no internal IPs
- Max length: 2048 chars
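A local pre-check that mirrors the callback rules above (HTTPS only, no internal IPs, at most 2048 characters) could look like the sketch below. It is a hypothetical helper: it only inspects literal IP addresses and the name localhost, does not resolve DNS, and the server may apply stricter checks.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_CALLBACK_URL_LENGTH = 2048


def is_acceptable_callback_url(url):
    """Return True if the URL passes the listed callback rules, as far as they can be checked locally."""
    if len(url) > MAX_CALLBACK_URL_LENGTH:
        return False
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost":
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; not resolved in this sketch
    return address.is_global  # rejects private, loopback, and link-local addresses
```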
Example: https://your-domain.com/webhooks/audio-task-completed

Request Example: Text-to-Speech
Request Example: Voice Cloning (multi-voice)
Response Example
Submit (task created):
Query (completed):