OpenAI API Reference

Audio

Learn how to turn audio into text or text into audio.

Create speech

POST https://api.openai.com/v1/audio/speech

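Generates audio from the input text.

Request body

model

string

Required

One of the available TTS models: tts-1 or tts-1-hd.

input

string

Required

The text to generate audio for. The maximum length is 4096 characters.

voice

string

Required

The voice to use when generating the audio. Supported voices are alloy, echo, fable, onyx, nova, and shimmer. Previews of the voices are available in the Text to speech guide.

response_format

string

Optional

Defaults to mp3

The format of the audio. Supported formats are mp3, opus, aac, flac, wav, and pcm.

speed

number

Optional

Defaults to 1

The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.

Returns

The audio file content.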
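
The limits in the parameter descriptions above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API or the OpenAI library: build_speech_request and the constant names are hypothetical, and the allowed values are copied from the parameter list.

```python
# Hypothetical client-side validation; the values mirror the parameter list.
TTS_MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_request(model, text, voice, response_format="mp3", speed=1.0):
    """Return a JSON-serializable body for POST /v1/audio/speech.

    Raises ValueError when an argument falls outside the documented limits.
    """
    if model not in TTS_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in AUDIO_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dictionary can be serialized with json.dumps and sent as the request body. Rejecting bad values locally avoids a round trip that would fail anyway.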

Example request

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3

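from pathlib import Path
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Save the audio next to this script.
speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
  model="tts-1",
  voice="alloy",
  input="The quick brown fox jumped over the lazy dog."
)
response.stream_to_file(speech_file_path)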

import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

const speechFile = path.resolve("./speech.mp3");

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Today is a wonderful day to build something people love!",
  });
  console.log(speechFile);
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();
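
Because input is limited to 4096 characters, longer text has to be split across several requests. The following is a minimal sketch of one way to split it on whitespace. split_for_tts is a hypothetical helper, not part of the OpenAI library, and joining the resulting audio files is not covered.

```python
def split_for_tts(text, limit=4096):
    """Split text into chunks of at most `limit` characters.

    Chunks break on whitespace where possible, and runs of whitespace
    (including newlines) collapse to single spaces. A single word longer
    than `limit` is cut mid-word as a last resort.
    """
    chunks = []
    current = ""
    for word in text.split():
        # Hard-split words that cannot fit in a chunk on their own.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The word does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as the input of its own request. Splitting on sentence boundaries instead of whitespace would probably give more natural pauses, at the cost of a more involved splitter.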

Create transcription

POST https://api.openai.com/v1/audio/transcriptions

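Transcribes audio into the input language.

Request body

file

file

Required

The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

model

string

Required

ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

language

string

Optional

The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.

prompt

string

Optional

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

response_format

string

Optional

Defaults to json

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

temperature

number

Optional

Defaults to 0

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

timestamp_granularities

array

Optional

Defaults to segment

The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word or segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.

Returns

The transcription object or a verbose transcription object.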
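
The rule that timestamp_granularities requires response_format to be verbose_json is easy to miss, so it may help to enforce it before calling the endpoint. The following is a minimal sketch under that assumption. build_transcription_kwargs is a hypothetical helper, and the returned dictionary is meant to be passed, together with file, as keyword arguments to client.audio.transcriptions.create.

```python
TRANSCRIPT_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def build_transcription_kwargs(model="whisper-1", language=None, prompt=None,
                               response_format="json", temperature=0.0,
                               timestamp_granularities=None):
    """Validate the optional parameters and return them as keyword arguments.

    The audio file itself is not handled here; pass it separately as `file`.
    """
    if response_format not in TRANSCRIPT_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    kwargs = {
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
    }
    if language is not None:
        kwargs["language"] = language
    if prompt is not None:
        kwargs["prompt"] = prompt
    if timestamp_granularities:
        unknown = set(timestamp_granularities) - GRANULARITIES
        if unknown:
            raise ValueError(f"unknown granularities: {sorted(unknown)}")
        # Timestamp granularities are only populated for verbose_json output.
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities requires verbose_json")
        kwargs["timestamp_granularities"] = list(timestamp_granularities)
    return kwargs
```

A call might then look like client.audio.transcriptions.create(file=audio_file, **build_transcription_kwargs(response_format="verbose_json", timestamp_granularities=["word"])).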

Example request

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1"

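from openai import OpenAI
client = OpenAI()

# Open the audio in binary mode; the context manager closes it after the request.
with open("speech.mp3", "rb") as audio_file:
  transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file
  )
print(transcript.text)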

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-1",
  });

  console.log(transcription.text);
}
main();

Response

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}

Create translation

POST https://api.openai.com/v1/audio/translations

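Translates audio into English.

Request body

file

file

Required

The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

model

string

Required

ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

prompt

string

Optional

An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.

response_format

string

Optional

Defaults to json

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

temperature

number

Optional

Defaults to 0

Returns

The translated text.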
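
Both the translation and transcription endpoints list the same accepted file formats, so a quick extension check before uploading can catch obvious mistakes. The following is a minimal sketch. is_supported_audio is a hypothetical helper, and it looks only at the file name, not at the actual contents of the file.

```python
from pathlib import Path

# Extensions taken from the file parameter description.
SUPPORTED_AUDIO_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def is_supported_audio(path):
    """Return True if the file extension is one of the listed formats."""
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_AUDIO_EXTENSIONS
```

For example, is_supported_audio("/path/to/file/german.m4a") is true, while a file named notes.txt would be rejected.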

Example request

curl https://api.openai.com/v1/audio/translations \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/german.m4a" \
  -F model="whisper-1"

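from openai import OpenAI
client = OpenAI()

# Open the audio in binary mode; the context manager closes it after the request.
with open("speech.mp3", "rb") as audio_file:
  translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file
  )
print(translation.text)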

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
    const translation = await openai.audio.translations.create({
        file: fs.createReadStream("speech.mp3"),
        model: "whisper-1",
    });

    console.log(translation.text);
}
main();

Response

{
  "text": "Hello, my name is Wolfgang and I come from Germany. Where are you heading today?"
}

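The transcription object (JSON)

Represents a transcription response returned by the model, based on the provided input.

text

string

The transcribed text.

OBJECT The transcription object (JSON)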

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
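
When the endpoint is called over plain HTTP rather than through a client library, the JSON body has to be decoded by hand. The following is a minimal sketch, assuming the raw response body is available as a string. Transcription and parse_transcription are hypothetical names, not part of the OpenAI library.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    """Mirrors the single documented field of the JSON transcription object."""
    text: str


def parse_transcription(raw):
    """Decode a JSON response body into a Transcription.

    Raises KeyError if the "text" field is missing.
    """
    data = json.loads(raw)
    return Transcription(text=data["text"])
```

Raising on a missing field is a deliberate choice here: an error response would not contain text, and failing loudly seems safer than returning an empty transcript.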

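The transcription object (Verbose JSON)

Represents a verbose JSON transcription response returned by the model, based on the provided input.

language

string

The language of the input audio.

duration

string

The duration of the input audio.

text

string

The transcribed text.

words

array

Extracted words and their corresponding timestamps.

segments

array

Segments of the transcribed text and their corresponding details.

OBJECT The transcription object (Verbose JSON)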

{
  "task": "transcribe",
  "language": "english",
  "duration": 8.470000267028809,
  "text": "The beach was a popular spot on a hot summer day. People were swimming in the ocean, building sandcastles, and playing beach volleyball.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 3.319999933242798,
      "text": " The beach was a popular spot on a hot summer day.",
      "tokens": [
        50364, 440, 7534, 390, 257, 3743, 4008, 322, 257, 2368, 4266, 786, 13, 50530
      ],
      "temperature": 0.0,
      "avg_logprob": -0.2860786020755768,
      "compression_ratio": 1.2363636493682861,
      "no_speech_prob": 0.00985979475080967
    },
    ...
  ]
}
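
The start, end, and text fields of each segment are enough to build subtitle-style output. The following is a minimal sketch that renders segments in the SubRip (SRT) layout. format_timestamp and segments_to_srt are hypothetical helpers, and the input is assumed to be the verbose JSON object already decoded into a Python dictionary. The API can also return SRT directly through response_format, so this is mainly an illustration of how to read the segment fields.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(verbose):
    """Render the segments of a verbose transcription as SRT text."""
    blocks = []
    for index, segment in enumerate(verbose["segments"], start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Segment text carries a leading space in the example object; strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}")
    return "\n\n".join(blocks) + "\n"
```

Timestamps are rounded to the nearest millisecond, which is why float values such as 3.319999933242798 come out as clean subtitle times.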
