audio

Text to speech

Convert text to speech

Request

POST /audio/speech

Request Body

Field	Type	Required	Description
model	string	Yes	TTS model name
input	string	Yes	Text to convert to speech
voice	string (alloy, echo, fable, onyx, nova, shimmer)	Yes	Voice type
response_format	string (mp3, opus, aac, flac, wav, pcm)	No	Output audio format
speed	number	No	Speech speed
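
The table above can be turned into a small client-side check before a request is sent. The following is a minimal sketch, assuming you want to reject unsupported values locally; `build_speech_payload` is a hypothetical helper name, not part of the API, and the allowed values are copied from the table.

```python
# Allowed values copied from the Request Body table.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(model, text, voice, response_format=None, speed=None):
    """Build the JSON body for POST /audio/speech (hypothetical helper)."""
    if not model or not text:
        raise ValueError("model and input are required")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")

    payload = {"model": model, "input": text, "voice": voice}

    # Optional fields are only included when the caller sets them.
    if response_format is not None:
        if response_format not in AUDIO_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        payload["response_format"] = response_format
    if speed is not None:
        payload["speed"] = speed
    return payload
```

Calling `build_speech_payload("speech-2.6-turbo", "Hello!", "alloy")` returns a dict containing only the three required fields, which can be passed as the JSON body of the request.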

Request Examples

Simple speech synthesis

{
  "model": "speech-2.6-turbo",
  "input": "Hello, welcome to our service!",
  "voice": "alloy"
}

Speech synthesis with detailed parameters

{
  "model": "speech-2.6-hd",
  "input": "The quick brown fox jumps over the lazy dog.",
  "voice": "nova",
  "response_format": "mp3",
  "speed": 1
}

Fast-paced briefing

{
  "model": "speech-2.6-turbo",
  "input": "Daily update: traffic is clear, weather is sunny, meetings start at 10 AM.",
  "voice": "echo",
  "response_format": "opus",
  "speed": 1.2
}

Successful response

Code Examples

JavaScript (Fetch)

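const response = await fetch('https://api.r9s.ai/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'speech-2.6-turbo',
    input: 'Hello, welcome to our service!',
    voice: 'alloy'
  })
});

// Fail early on HTTP errors instead of reading an error body as audio.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

// Assumption: a successful response body is the encoded audio, not JSON.
const audio = await response.arrayBuffer();
console.log(`Received ${audio.byteLength} bytes of audio`);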

Python (requests)

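import requests

url = "https://api.r9s.ai/v1/audio/speech"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "model": "speech-2.6-turbo",
    "input": "Hello, welcome to our service!",
    "voice": "alloy",
    "response_format": "mp3"  # Set explicitly so the file extension below matches
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # Raise on HTTP errors

# Assumption: a successful response body is the encoded audio, not JSON.
with open("speech.mp3", "wb") as f:
    f.write(response.content)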

cURL

curl -X POST "https://api.r9s.ai/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"speech-2.6-turbo","input":"Hello, welcome to our service!","voice":"alloy","response_format":"mp3"}' \
  --output speech.mp3

Speech to text

Transcribe speech to text. Supports multiple models and output formats.

Supported models:

whisper-1: Supports json, text, srt, verbose_json, vtt formats
gpt-4o-transcribe, gpt-4o-mini-transcribe: Only support json and text formats

Note: The timestamp_granularities parameter takes effect only when response_format is set to verbose_json.
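
The model and format rules above can be checked on the client before uploading audio. The following is a minimal sketch, assuming only the constraints stated in this section; `check_transcription_options` and `FORMATS_BY_MODEL` are hypothetical names introduced for illustration.

```python
# Formats each model supports, as listed under "Supported models".
FORMATS_BY_MODEL = {
    "whisper-1": {"json", "text", "srt", "verbose_json", "vtt"},
    "gpt-4o-transcribe": {"json", "text"},
    "gpt-4o-mini-transcribe": {"json", "text"},
}


def check_transcription_options(model, response_format=None, timestamp_granularities=None):
    """Return a list of problems; an empty list means the options are consistent."""
    problems = []

    allowed = FORMATS_BY_MODEL.get(model)
    if allowed is None:
        problems.append(f"unknown model: {model}")
    elif response_format is not None and response_format not in allowed:
        problems.append(f"{model} does not support response_format={response_format}")

    # timestamp_granularities is only honored with verbose_json.
    if timestamp_granularities and response_format != "verbose_json":
        problems.append("timestamp_granularities requires response_format=verbose_json")

    return problems
```

Under these rules, word or segment timestamps are only available with whisper-1, since it is the only listed model that supports verbose_json.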

Request

POST /audio/transcriptions

Request Body

Field	Type	Required	Description
file	string	Yes	Audio file to transcribe
model	string	Yes	Model name
language	string	No	Audio language (ISO-639-1 format)
prompt	string	No	Optional text prompt
response_format	string (json, text, srt, verbose_json, vtt)	No	Output format. Model support varies: whisper-1 supports all formats (json, text, srt, verbose_json, vtt); gpt-4o-transcribe and gpt-4o-mini-transcribe support only json and text.
temperature	number	No	Sampling temperature
timestamp_granularities	Array<string (word, segment)>	No	Timestamp granularity levels to include: word, segment. Takes effect only when response_format is set to verbose_json. Segment timestamps add no latency; word timestamps add latency.

Request Examples

Simple speech transcription

{
  "file": "audio.mp3",
  "model": "whisper-1"
}

Speech transcription with parameters

{
  "file": "audio.mp3",
  "model": "whisper-1",
  "language": "en",
  "response_format": "json",
  "temperature": 0
}

Transcription with timestamps

{
  "file": "meeting.wav",
  "model": "whisper-1",
  "language": "en",
  "response_format": "verbose_json",
  "timestamp_granularities": [
    "word"
  ]
}

Successful response

Response Schema

Field	Type	Required	Description
text	string	Yes	Transcribed text
language	string	No	Detected language
duration	number	No	Audio duration (seconds)
words	Array	No	Word-level timestamps
segments	Array	No	Segment-level timestamps

Response Example

{
  "text": "Hello, this is a test transcription of an audio file.",
  "language": "en",
  "duration": 5.2
}
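
A response shaped like the schema above can be read into a typed object so that optional fields have predictable defaults. The following is a minimal sketch, assuming a JSON response body; the `Transcription` class and `parse_transcription` function are hypothetical names, not part of the API.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transcription:
    """Mirrors the Response Schema table; only text is required."""
    text: str
    language: Optional[str] = None
    duration: Optional[float] = None
    words: list = field(default_factory=list)
    segments: list = field(default_factory=list)


def parse_transcription(body: str) -> Transcription:
    """Parse a JSON response body into a Transcription."""
    data = json.loads(body)
    if "text" not in data:
        raise ValueError("response is missing the required 'text' field")
    return Transcription(
        text=data["text"],
        language=data.get("language"),
        duration=data.get("duration"),
        # Missing or null arrays become empty lists.
        words=data.get("words") or [],
        segments=data.get("segments") or [],
    )
```

With the response example above, `parse_transcription` yields an object whose `words` and `segments` are empty lists, because those fields are absent from the example.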

Code Examples

JavaScript (Fetch)

const response = await fetch('https://api.r9s.ai/v1/audio/transcriptions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
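  // Required fields, taken from the simple transcription request example.
  body: JSON.stringify({ file: 'audio.mp3', model: 'whisper-1' })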
});

const data = await response.json();
console.log(data);

Python (requests)

import requests

url = "https://api.r9s.ai/v1/audio/transcriptions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

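# Required fields, taken from the simple transcription request example.
response = requests.post(url, json={"file": "audio.mp3", "model": "whisper-1"}, headers=headers)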
data = response.json()
print(data)

cURL

curl -X POST "https://api.r9s.ai/v1/audio/transcriptions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"file":"audio.mp3","model":"whisper-1"}'

Speech translation

Translate speech from any supported language to English text.

Important: This endpoint only translates audio into English. The source language is automatically detected by the model.

Supported models: whisper-1 (primary), gpt-4o-transcribe (extended support)

Request

POST /audio/translations

Request Body

Field	Type	Required	Description
file	string	Yes	Audio file to translate to English
model	string	Yes	Model name (whisper-1 is primary, gpt-4o-transcribe has extended support)
prompt	string	No	Optional text prompt to guide the model’s style. The source language can be specified in the prompt if needed, though the model will auto-detect it.
response_format	string (json, text, srt, verbose_json, vtt)	No	Output format for the translated text
temperature	number	No	Sampling temperature between 0 and 1
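
The field constraints in the table above can be enforced before a translation request is sent. The following is a minimal sketch, assuming the 0 to 1 temperature range and the format list stated in the table; `build_translation_fields` is a hypothetical helper name, not part of the API.

```python
# Output formats copied from the Request Body table.
TEXT_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_translation_fields(file, model, prompt=None, response_format=None, temperature=None):
    """Build the request fields for POST /audio/translations (hypothetical helper)."""
    if not file or not model:
        raise ValueError("file and model are required")

    fields = {"file": file, "model": model}

    if prompt is not None:
        fields["prompt"] = prompt
    if response_format is not None:
        if response_format not in TEXT_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        fields["response_format"] = response_format
    if temperature is not None:
        # The table documents temperature as a value between 0 and 1.
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = temperature
    return fields
```

There is no target-language field to validate, since the endpoint always produces English text.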

Request Examples

Simple speech translation

{
  "file": "german_audio.mp3",
  "model": "whisper-1"
}

Speech translation with prompt

{
  "file": "french_audio.mp3",
  "model": "whisper-1",
  "prompt": "This is about technology",
  "response_format": "json"
}

Translate meeting recording to English

{
  "file": "meeting_cn.mp3",
  "model": "gpt-4o-transcribe",
  "prompt": "Business meeting, summarize clearly",
  "response_format": "text"
}

Successful response

Response Schema

Field	Type	Required	Description
text	string	Yes	Translated English text
language	string	No	Source language
duration	number	No	Audio duration (seconds)

Response Example

{
  "text": "This is a translation of the audio file into English.",
  "language": "de",
  "duration": 4.8
}

Code Examples

JavaScript (Fetch)

const response = await fetch('https://api.r9s.ai/v1/audio/translations', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
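  // Required fields, taken from the simple translation request example.
  body: JSON.stringify({ file: 'german_audio.mp3', model: 'whisper-1' })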
});

const data = await response.json();
console.log(data);

Python (requests)

import requests

url = "https://api.r9s.ai/v1/audio/translations"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

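# Required fields, taken from the simple translation request example.
response = requests.post(url, json={"file": "german_audio.mp3", "model": "whisper-1"}, headers=headers)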
data = response.json()
print(data)

cURL

curl -X POST "https://api.r9s.ai/v1/audio/translations" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"file":"german_audio.mp3","model":"whisper-1"}'

Schema Reference

AudioSpeechRequest

Field	Type	Required	Description
model	string	Yes	TTS model name
input	string	Yes	Text to convert to speech
voice	string (alloy, echo, fable, onyx, nova, shimmer)	Yes	Voice type
response_format	string (mp3, opus, aac, flac, wav, pcm)	No	Output audio format
speed	number	No	Speech speed

AudioTranscriptionResponse

Field	Type	Required	Description
text	string	Yes	Transcribed text
language	string	No	Detected language
duration	number	No	Audio duration (seconds)
words	Array	No	Word-level timestamps
segments	Array	No	Segment-level timestamps

AudioTranslationResponse

Field	Type	Required	Description
text	string	Yes	Translated English text
language	string	No	Source language
duration	number	No	Audio duration (seconds)

Related APIs

API Overview - Learn about authentication and basic information
models - View models related APIs
chat - View chat related APIs
responses - View responses related APIs
messages - View messages related APIs
completions - View completions related APIs
edits - View edits related APIs
images - View images related APIs
embeddings - View embeddings related APIs
engine-embeddings - View engine-embeddings related APIs
moderations - View moderations related APIs
search - View search related APIs
proxy - View proxy related APIs
