Seed Audio 1.0 API

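Access Doubao Seed Audio 1.0 through the EvoLink unified API gateway to build AI audio generation into your product. The model ID is doubao-seed-audio-1-0, billing is per second, and a single request produces up to 120 seconds of audio.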

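Model type: Doubao Seed Audio 1.0

Price: $0.0012 per second (~0.08 credits)

Highest stability, with 99.9% availability guaranteed. Recommended for production environments.

All versions use the same API endpoint; only the model parameter differs.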

Seed Audio 1.0 API: An AI Audio Generation Model for Developers

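Access Doubao Seed Audio 1.0 through the EvoLink unified API gateway to build audio generation into creator tools, voice agents, audio dramas, short videos, and content production workflows.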

Pricing

Model	Mode	Price
Doubao Seed Audio 1.0	Audio Generation (per second)	$0.0012/second (0.08 Credits)

If it's down, we automatically use the next cheapest available option, ensuring 99.9% uptime at the best possible price.

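What can you build with Seed Audio 1.0?

Creator tools and audio workflows

Seed Audio 1.0 is a prompt-based AI audio generation model, not just text-to-speech. A single prompt can generate narration, voice-over, and sound design, and reference audio keeps the voice consistent across an entire production. It is well suited to podcast tools, audiobook pipelines, and short-video content workflows, where voice, music, and ambient sound can be generated together.

Voice agents and AI companions

Give voice agents, assistants, and AI companion products an expressive, controllable voice. Adjust speech rate, pitch, and volume per interaction, and pass reference audio to anchor a fixed character voice. Output is returned through the same EvoLink gateway you already use, with usage and cost managed in one place.

Audio dramas, games, and interactive storytelling

Orchestrate multi-character dialogue, emotion, and non-verbal expression directly in the prompt to drive audio dramas, game scenes, and interactive narratives. Long-form consistency makes it suitable for audiobooks, audio dramas, and serialized content: the same cast of characters keeps consistent voices across multiple generations.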

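Why use Seed Audio 1.0 through EvoLink?

Seed Audio 1.0 is live on EvoLink, so you can get early access to this new audio model through one unified gateway.

Early access to new models

Seed Audio 1.0 is now live on EvoLink. Use your existing EvoLink API key with the model ID doubao-seed-audio-1-0 to start integrating this new AI audio generation model, with no separate account, contract, or onboarding process for a single provider.

Billed by output duration, with visible costs

Seed Audio 1.0 is billed per second of output, based on the duration of the generated audio, which makes it easy to estimate batch job costs before running them. Check the EvoLink console for the latest unit price; actual usage is monitored in the same dashboard as your other models.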
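As a rough illustration of estimating a batch before running it, the sketch below multiplies planned output seconds by the per-second price listed on this page ($0.0012). The function name and the hard-coded constants are assumptions made for illustration; the current rate should be taken from the EvoLink console, and actual billing may round durations differently.

```python
PRICE_USD_PER_SECOND = 0.0012   # price listed on this page; may change
MAX_SECONDS_PER_REQUEST = 120   # per-request output cap stated on this page


def estimate_batch_cost_usd(planned_seconds, price=PRICE_USD_PER_SECOND):
    """Estimate the USD cost of a batch from planned output durations (seconds)."""
    total = 0.0
    for seconds in planned_seconds:
        if not 0 < seconds <= MAX_SECONDS_PER_REQUEST:
            raise ValueError(f"each request must produce 0-120 s of audio, got {seconds}")
        total += seconds * price
    return total


if __name__ == "__main__":
    # Hypothetical batch: 500 clips of 45 seconds each.
    print(f"${estimate_batch_cost_usd([45] * 500):.2f}")
```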

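A unified gateway for audio models

Access Seed Audio 1.0 alongside other audio models through a single EvoLink API. Compare models, manage keys and usage in one place, and route or fall back between models without rewriting the integration for each provider.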

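How to integrate Seed Audio 1.0

Call Doubao Seed Audio 1.0 through EvoLink in three steps.

Create an EvoLink API key

Sign up for EvoLink and generate an API key in the console. The same key gives access to Seed Audio 1.0 and the other models on the gateway, and lets you set usage limits and monitor consumption in one dashboard.

Use the model ID doubao-seed-audio-1-0

Point your request at the model ID doubao-seed-audio-1-0. Pass a text prompt (up to 1.5k characters) and optional reference audio, and set parameters such as output format, sample rate, speech rate, pitch, and volume.

Submit an asynchronous task and retrieve the audio

Seed Audio 1.0 uses an asynchronous task model: submitting a generation request returns a task ID, and you then poll the task status endpoint to retrieve the finished audio (up to 120 seconds). The result can be played, downloaded, or embedded directly in your product.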
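The submit-then-poll flow can be sketched as a small polling loop. This is a minimal sketch, not EvoLink's client: `wait_for_task` and its `fetch_task` argument are hypothetical names, and `fetch_task` stands in for whatever call you make to the task status endpoint (its path is not given on this page). The status values "processing" and "completed" appear in the response examples; the failure status names are assumptions.

```python
import time

SUCCESS_STATUS = "completed"                 # appears in the response example
FAILURE_STATUSES = {"failed", "cancelled"}   # assumed names; not confirmed by this page


def wait_for_task(fetch_task, task_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_task(task_id) until the task completes, fails, or times out.

    fetch_task is a caller-supplied function that queries the task status
    endpoint and returns the decoded JSON object as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        status = task.get("status")
        if status == SUCCESS_STATUS:
            return task
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Any status that is neither a success nor a known failure is treated as "still running", so the loop keeps working if the service reports intermediate states not shown here.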

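Capabilities and limits

The concrete facts you need to know before integrating Seed Audio 1.0.

Generation

Prompt-based audio generation

Seed Audio 1.0 generates audio from a prompt, optionally guided by reference audio. It goes beyond ordinary TTS: multi-character dialogue, emotion, and non-verbal expression can all be written directly into the prompt.

Input

Reference audio support

Each request accepts up to 3 reference audio clips of at most 30 seconds each, supplied as base64 or URL, to guide voice timbre and delivery. A reference image and reference audio cannot be passed in the same request.

Limits

Output up to 120 seconds

A single request synthesizes at most 120 seconds of audio. Text input is capped at 1.5k characters, so long-form content should be split into segments and processed in batches.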
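One way to segment long-form text under the prompt cap is to split at sentence boundaries. The sketch below is illustrative only: `split_into_prompts` is a hypothetical helper, it assumes "1.5k characters" means 1,500, and it does not guarantee that each chunk stays under the 120-second output cap, which depends on speech rate and content.

```python
import re

MAX_PROMPT_CHARS = 1500  # assumes "1.5k characters" means 1,500


def split_into_prompts(text, limit=MAX_PROMPT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Zero-width split after Chinese or English sentence-ending punctuation,
    # so joining the chunks reproduces the original text exactly.
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own generation task.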

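Formats

Flexible output formats

Export as wav (default), mp3, pcm, or ogg_opus to match downstream pipelines without extra transcoding. Explicit and implicit watermarks are supported.

Audio quality

Selectable sample rates

Choose a 48 kHz, 24 kHz (default), 16 kHz, or 8 kHz sample rate to trade fidelity against file size for web delivery, production, or real-time processing.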
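To make the fidelity versus size trade-off concrete, the uncompressed size of PCM audio is duration × sample rate × bytes per sample × channels. The sketch below assumes 16-bit mono samples, which this page does not state, so treat the numbers as an order-of-magnitude guide; compressed formats such as mp3 or ogg_opus will be much smaller.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 24000, 48000)  # values listed on this page


def pcm_size_bytes(duration_seconds, sample_rate=24000, bytes_per_sample=2, channels=1):
    """Approximate size of raw PCM audio; 16-bit mono is an assumption."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return int(duration_seconds * sample_rate * bytes_per_sample * channels)


if __name__ == "__main__":
    for rate in SUPPORTED_SAMPLE_RATES:
        size_mb = pcm_size_bytes(120, rate) / 1_000_000
        print(f"{rate} Hz, 120 s: about {size_mb:.2f} MB")
```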

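Control

Language and expression control

Chinese and English are supported, along with the ability to render mainstream regional accents of China (pure dialects are not supported). Speech rate, pitch, and volume can be adjusted per request. SSML is not supported.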

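Frequently asked questions about Seed Audio 1.0

Everything you need to know about the product and billing.

Seed Audio 1.0 (Doubao audio generation model 1.0, Doubao-Seed-Audio 1.0) is ByteDance's prompt-based AI audio generation model. From a text prompt, optionally guided by reference audio, it generates speech, multi-character dialogue, and audio with emotion and non-verbal expression. It is broader than traditional text-to-speech and is designed for AI audio generation use cases.

Seed Audio 1.0 is now live on EvoLink. You can access it with your existing API key through the EvoLink unified API gateway and use it alongside the other models on the platform.

When calling Seed Audio 1.0 through EvoLink, use the model ID doubao-seed-audio-1-0 in your request.

Seed Audio 1.0 is billed per second of output, based on the duration of the generated audio, which makes batch job costs easy to estimate. Prices may change, so check the EvoLink console and pricing page for the latest unit price before scaling up.

Text input is limited to 1.5k characters. You can pass up to 3 reference audio clips of at most 30 seconds each, as base64 or URL. A single request synthesizes at most 120 seconds of audio. Output formats are wav (default), mp3, pcm, and ogg_opus, and supported sample rates are 48 kHz, 24 kHz (default), 16 kHz, and 8 kHz. A reference image and reference audio cannot be passed together; other limits may vary, so refer to the latest EvoLink console and official documentation.

Seed Audio 1.0 is not just text-to-speech. Although it can synthesize speech from text, it is a prompt-based AI audio generation model. You can orchestrate multi-character dialogue, emotion, and non-verbal expression in the prompt and guide the output with reference audio, which goes well beyond a single-voice text-to-speech engine.

SSML is not currently supported. Expression is controlled through prompt instructions and request parameters such as speech rate, pitch, and volume.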

API Reference

Authentication

All APIs require Bearer Token authentication.

Header

Authorization: Bearer YOUR_API_KEY

POST /v1/audios/generations

Generate Audio

Create an audio generation task from a text prompt, optionally guided by reference voices or a reference image.

Processing is asynchronous: use the returned task ID to query the task status and retrieve the result.

Result audio URLs are CDN-hosted and long-lived. Billed per output second (up to 120s).

Three Generation Modes

Text-to-speech: pass only prompt to generate audio directly from the prompt.

Voice cloning: pass prompt + audio_references to reference a voice ID or reference audio. Use @音频N in the prompt to reference the N-th item.

Image-guided: pass prompt + image_urls to generate audio guided by a reference image.

⚠️ audio_references and image_urls are mutually exclusive; use one or the other.
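A client can determine which of the three modes a payload selects, and reject the invalid combination before sending it. The sketch below is a hypothetical client-side helper (`generation_mode` is not part of the API); the mode labels it returns are illustrative strings, not values the API defines.

```python
def generation_mode(payload):
    """Return which generation mode a request payload selects.

    Raises ValueError when prompt is missing or when both reference
    types are supplied, since they are mutually exclusive.
    """
    if not payload.get("prompt"):
        raise ValueError("prompt is required")
    has_audio = bool(payload.get("audio_references"))
    has_image = bool(payload.get("image_urls"))
    if has_audio and has_image:
        raise ValueError("audio_references and image_urls are mutually exclusive")
    if has_audio:
        return "voice-cloning"
    if has_image:
        return "image-guided"
    return "text-to-speech"
```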

Request Parameters

model (string, required, default: doubao-seed-audio-1-0)

Audio generation model name.

Value	Description
doubao-seed-audio-1-0	Doubao Seed Audio 1.0 multimodal audio generation

Example: doubao-seed-audio-1-0

prompt (string, required)

The text content to synthesize, or a prompt describing the audio. Use @音频N to reference the N-th item of audio_references.

Notes

Limited to 1.5k characters

Example: @音频1 Hi there! @音频2 How's your day going?

audio_references (array, optional)

Reference voices. Each item is a voice ID or a reference audio URL (items starting with 'http' are treated as URLs, otherwise as voice IDs). Order maps to @音频1 / @音频2 in the prompt.

Notes

Up to 3 items; mutually exclusive with image_urls
Voice IDs look like 'zh_female_xxx'
Reference audio: each ≤ 30s / ≤ 10MB, wav/mp3/pcm/ogg_opus

Example: ["zh_female_example_id", "https://your-bucket.com/ref-voice.mp3"]

See Preset Voice IDs in the left sidebar for curated voices and the full catalog link.
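The URL-versus-voice-ID rule and the @音频N mapping can be checked on the client before a request is sent. The helpers below are hypothetical (not part of the API): `classify_reference` mirrors the documented "starts with 'http'" rule, and `check_prompt_references` verifies that every @音频N in the prompt points at an existing item.

```python
import re

MAX_AUDIO_REFERENCES = 3  # documented per-request limit


def classify_reference(item):
    """Items starting with 'http' are treated as URLs, all others as voice IDs."""
    return "url" if item.startswith("http") else "voice_id"


def check_prompt_references(prompt, audio_references):
    """Return the sorted @音频N indices used in the prompt, checking each is in range."""
    if len(audio_references) > MAX_AUDIO_REFERENCES:
        raise ValueError(f"at most {MAX_AUDIO_REFERENCES} audio references are allowed")
    used = sorted({int(n) for n in re.findall(r"@音频(\d+)", prompt)})
    for n in used:
        if not 1 <= n <= len(audio_references):
            raise ValueError(f"@音频{n} has no matching item in audio_references")
    return used
```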

image_urls (array, optional)

Reference image URL to drive audio generation.

Notes

Currently at most 1 image; mutually exclusive with audio_references
≤ 10MB, jpeg/png/webp

Example: ["https://your-bucket.com/scene.jpg"]

speech_rate (number, optional, default: 1.0)

Speech speed multiplier.

Notes

Range: 0.5 to 2.0 (1.0 = normal, 2.0 = double speed, 0.5 = half speed)
Accepts two decimals

Example: 1.2

loudness_rate (number, optional, default: 1.0)

Loudness multiplier.

Notes

Range: 0.5 to 2.0 (1.0 = normal)
Accepts two decimals

Example: 1.0

pitch_rate (integer, optional, default: 0)

Pitch adjustment in semitones.

Notes

Range: -12 to 12 (0 = no change)

Example: 0

format (string, optional, default: wav)

Output audio format.

Value	Description
wav	WAV
mp3	MP3
pcm	PCM
ogg_opus	OGG Opus

Example: mp3

sample_rate (integer, optional, default: 24000)

Output sample rate in Hz.

Value	Description
8000	8 kHz
16000	16 kHz
24000	24 kHz
48000	48 kHz

Example: 24000

callback_url (string, optional)

HTTPS callback address after task completion.

Notes

Triggered on completion, failure, or cancellation
Sent after billing confirmation
HTTPS only, no internal IPs
Max length: 2048 chars

Example: https://your-domain.com/webhooks/audio-task-completed
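The documented ranges for the optional parameters lend themselves to a client-side check before submission. The following is a sketch under the limits listed above; `validate_options` is a hypothetical helper, and the server remains the authority on what it accepts. The internal-IP check only covers callback hosts written as IP literals, not hostnames that resolve to internal addresses.

```python
import ipaddress
from urllib.parse import urlparse

FORMATS = {"wav", "mp3", "pcm", "ogg_opus"}
SAMPLE_RATES = {8000, 16000, 24000, 48000}


def validate_options(payload):
    """Return a list of problems found in the optional parameters (empty if none)."""
    problems = []
    for name in ("speech_rate", "loudness_rate"):
        if not 0.5 <= payload.get(name, 1.0) <= 2.0:
            problems.append(f"{name} must be between 0.5 and 2.0")
    pitch = payload.get("pitch_rate", 0)
    if not isinstance(pitch, int) or not -12 <= pitch <= 12:
        problems.append("pitch_rate must be an integer between -12 and 12")
    if payload.get("format", "wav") not in FORMATS:
        problems.append("format must be one of wav, mp3, pcm, ogg_opus")
    if payload.get("sample_rate", 24000) not in SAMPLE_RATES:
        problems.append("sample_rate must be 8000, 16000, 24000, or 48000")
    callback = payload.get("callback_url")
    if callback is not None:
        parsed = urlparse(callback)
        if parsed.scheme != "https":
            problems.append("callback_url must use HTTPS")
        if len(callback) > 2048:
            problems.append("callback_url must be at most 2048 characters")
        try:
            if ipaddress.ip_address(parsed.hostname or "").is_private:
                problems.append("callback_url must not point at an internal IP")
        except ValueError:
            pass  # hostname is a DNS name, not an IP literal
    return problems
```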

Request Example — Text-to-Speech

{
  "model": "doubao-seed-audio-1-0",
  "prompt": "欢迎使用语音合成服务，今天天气真不错。",
  "format": "mp3",
  "speech_rate": 1.2
}
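A request like the one above could be assembled with Python's standard library as follows. This is a sketch under stated assumptions: the base URL is a deliberate placeholder because this page does not give one, the `Content-Type: application/json` header is assumed from the JSON body, and `build_generation_request` is a hypothetical helper name. Only the path and the Bearer header come from this page.

```python
import json
import urllib.request

# Placeholder only: replace with the base URL from the EvoLink documentation.
BASE_URL = "https://api.example.invalid"


def build_generation_request(api_key, payload, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/audios/generations."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/audios/generations",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; the body is JSON
        },
    )


request = build_generation_request(
    "YOUR_API_KEY",
    {
        "model": "doubao-seed-audio-1-0",
        "prompt": "欢迎使用语音合成服务，今天天气真不错。",
        "format": "mp3",
        "speech_rate": 1.2,
    },
)
```

Sending it is then a matter of `urllib.request.urlopen(request)` and decoding the JSON response, which contains the task ID to query.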

Request Example — Voice Cloning (multi-voice)

{
  "model": "doubao-seed-audio-1-0",
  "prompt": "@音频1 Hi there! @音频2 How's your day going?",
  "audio_references": [
    "zh_female_example_id",
    "https://your-bucket.com/ref-voice.mp3"
  ]
}

Response Example

Submit (task created):

{
  "id": "task-unified-xxxxxxxx",
  "object": "audio.generation.task",
  "model": "doubao-seed-audio-1-0",
  "type": "audio",
  "status": "processing",
  "progress": 0,
  "task_info": { "can_cancel": false, "estimated_time": 15 }
}

Query (completed):

{
  "id": "task-unified-1782491238-7b6bmmv2",
  "object": "audio.generation.task",
  "model": "doubao-seed-audio-1-0",
  "type": "audio",
  "status": "completed",
  "progress": 100,
  "created": 1782491238,
  "duration": 41,
  "results": ["https://files.evolink.ai/.../seed-audio-xxx.wav"],
  "result_data": [
    {
      "audio_url": "https://files.evolink.ai/.../seed-audio-xxx.wav",
      "duration": 10.18,
      "format": "wav"
    }
  ],
  "task_info": { "can_cancel": false },
  "usage": { "credits_used": 0.88, "original_duration": 10.18 }
}
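Once a task reports completed, the audio URLs and usage can be pulled out of the response. The sketch below uses only field names that appear in the example above; whether every field is present on every response is an assumption, so the helper (a hypothetical `summarize_result`) reads them defensively.

```python
def summarize_result(task):
    """Extract (audio_url, duration, format) tuples and credits used from a completed task."""
    if task.get("status") != "completed":
        raise ValueError(f"task is not completed (status={task.get('status')!r})")
    clips = [
        (item.get("audio_url"), item.get("duration"), item.get("format"))
        for item in task.get("result_data", [])
    ]
    credits_used = task.get("usage", {}).get("credits_used")
    return clips, credits_used


example = {
    "id": "task-unified-1782491238-7b6bmmv2",
    "status": "completed",
    "result_data": [
        {
            "audio_url": "https://files.evolink.ai/.../seed-audio-xxx.wav",
            "duration": 10.18,
            "format": "wav",
        }
    ],
    "usage": {"credits_used": 0.88, "original_duration": 10.18},
}
clips, credits_used = summarize_result(example)
```

In this example, 0.88 credits for 10.18 seconds would be consistent with rounding up to 11 whole seconds at 0.08 credits each, though this page does not state the rounding rule.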