audio

Convert text to speech

POST /audio/speech
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | TTS model name |
| input | string | Yes | Text to convert to speech |
| voice | string (alloy, echo, fable, onyx, nova, shimmer) | Yes | Voice type |
| response_format | string (mp3, opus, aac, flac, wav, pcm) | No | - |
| speed | number | No | Speech speed |
{
  "model": "speech-2.6-turbo",
  "input": "Hello, welcome to our service!",
  "voice": "alloy"
}

{
  "model": "speech-2.6-hd",
  "input": "The quick brown fox jumps over the lazy dog.",
  "voice": "nova",
  "response_format": "mp3",
  "speed": 1
}

{
  "model": "speech-2.6-turbo",
  "input": "Daily update: traffic is clear, weather is sunny, meetings start at 10 AM.",
  "voice": "echo",
  "response_format": "opus",
  "speed": 1.2
}
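
The field table above can be checked on the client before a request is sent. The following is a minimal Python sketch under stated assumptions: `build_speech_payload` is a hypothetical helper (not part of any SDK), and the allowed values are copied from the table. It only assembles the JSON body and makes no network call.

```python
from typing import Optional

# Allowed values copied from the request field table above.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(model: str, text: str, voice: str,
                         response_format: Optional[str] = None,
                         speed: Optional[float] = None) -> dict:
    """Assemble a JSON body for POST /audio/speech (hypothetical helper)."""
    if not model or not text:
        raise ValueError("model and input are required")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format is not None and response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    payload = {"model": model, "input": text, "voice": voice}
    # Optional fields are only included when the caller sets them.
    if response_format is not None:
        payload["response_format"] = response_format
    if speed is not None:
        payload["speed"] = speed
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the Python sample for this endpoint.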

Successful response

const response = await fetch('https://api.r9s.ai/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    "model": "speech-2.6-turbo",
    "input": "Hello, welcome to our service!",
    "voice": "alloy"
  })
});
const data = await response.json();
console.log(data);

import requests

url = "https://api.r9s.ai/v1/audio/speech"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
response = requests.post(url, json={
    "model": "speech-2.6-turbo",
    "input": "Hello, welcome to our service!",
    "voice": "alloy"
}, headers=headers)
data = response.json()
print(data)

curl -X POST "https://api.r9s.ai/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"speech-2.6-turbo","input":"Hello, welcome to our service!","voice":"alloy"}'

Transcribe speech to text. Supports multiple models and output formats.

Supported models:

  • whisper-1: Supports json, text, srt, verbose_json, vtt formats
  • gpt-4o-transcribe, gpt-4o-mini-transcribe: Only support json and text formats

Note: The timestamp_granularities parameter only takes effect when response_format is set to verbose_json.
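
The compatibility rules above can be expressed as a small pre-flight check. This is a sketch, not an official client: `check_transcription_options` is a hypothetical name, and the model-to-format mapping is copied from the list above. Models not in that list are not checked for format support, since the list may not be exhaustive.

```python
from typing import List, Optional, Sequence

# Model-to-format support, copied from the list above.
SUPPORTED_FORMATS = {
    "whisper-1": {"json", "text", "srt", "verbose_json", "vtt"},
    "gpt-4o-transcribe": {"json", "text"},
    "gpt-4o-mini-transcribe": {"json", "text"},
}
GRANULARITIES = {"word", "segment"}


def check_transcription_options(
    model: str,
    response_format: Optional[str] = None,
    timestamp_granularities: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no rule is violated."""
    problems = []

    # Rule 1: the requested format must be supported by the chosen model.
    formats = SUPPORTED_FORMATS.get(model)
    if formats is not None and response_format is not None:
        if response_format not in formats:
            problems.append(f"{model} does not support {response_format}")

    if timestamp_granularities:
        # Rule 2: granularities only take effect with verbose_json.
        if response_format != "verbose_json":
            problems.append(
                "timestamp_granularities requires response_format=verbose_json")
        # Rule 3: only the documented granularity values are accepted.
        unknown = [g for g in timestamp_granularities if g not in GRANULARITIES]
        if unknown:
            problems.append(f"unknown granularities: {unknown}")

    return problems
```

For example, `check_transcription_options("gpt-4o-transcribe", "srt")` reports one problem, because that model only supports json and text.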

POST /audio/transcriptions
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | string | Yes | Audio file to transcribe |
| model | string | Yes | Model name |
| language | string | No | Audio language (ISO-639-1 format) |
| prompt | string | No | Optional text prompt |
| response_format | string (json, text, srt, verbose_json, vtt) | No | Output format. Model support varies: whisper-1 supports all formats (json, text, srt, verbose_json, vtt); gpt-4o-transcribe and gpt-4o-mini-transcribe support only json and text. |
| temperature | number | No | - |
| timestamp_granularities | Array<string (word, segment)> | No | Timestamp granularity levels to include. Options: word, segment. Important: Only works when response_format is set to verbose_json. Note: segment timestamps add no latency, but word timestamps do. |
{
  "file": "audio.mp3",
  "model": "whisper-1"
}

{
  "file": "audio.mp3",
  "model": "whisper-1",
  "language": "en",
  "response_format": "json",
  "temperature": 0
}

{
  "file": "meeting.wav",
  "model": "whisper-1",
  "language": "en",
  "response_format": "verbose_json",
  "timestamp_granularities": [
    "word"
  ]
}

Successful response

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Transcribed text |
| language | string | No | Detected language |
| duration | number | No | Audio duration (seconds) |
| words | Array | No | - |
| segments | Array | No | - |

{
  "text": "Hello, this is a test transcription of an audio file.",
  "language": "en",
  "duration": 5.2
}
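
Because only `text` is required in the response, client code should treat every other field as optional. The sketch below shows one way to do that in Python. The `Transcription` class and `parse_transcription` function are hypothetical names, and the sketch assumes the response was requested in a JSON format (json or verbose_json). The element shape of `words` and `segments` is not documented here, so they are kept as untyped lists.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Transcription:
    """Mirror of the response fields in the table above (hypothetical type)."""
    text: str
    language: Optional[str] = None
    duration: Optional[float] = None   # seconds
    words: List[Any] = field(default_factory=list)
    segments: List[Any] = field(default_factory=list)


def parse_transcription(data: Dict[str, Any]) -> Transcription:
    """Build a Transcription from a decoded JSON response body."""
    if "text" not in data:
        raise ValueError("response is missing the required 'text' field")
    return Transcription(
        text=data["text"],
        language=data.get("language"),
        duration=data.get("duration"),
        # Absent or null arrays are normalized to empty lists.
        words=data.get("words") or [],
        segments=data.get("segments") or [],
    )


# Usage with the sample response shown above.
sample = {
    "text": "Hello, this is a test transcription of an audio file.",
    "language": "en",
    "duration": 5.2,
}
result = parse_transcription(sample)
```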
const response = await fetch('https://api.r9s.ai/v1/audio/transcriptions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  // file and model are the required fields
  body: JSON.stringify({
    "file": "audio.mp3",
    "model": "whisper-1"
  })
});
const data = await response.json();
console.log(data);

import requests

url = "https://api.r9s.ai/v1/audio/transcriptions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
# file and model are the required fields
response = requests.post(url, json={
    "file": "audio.mp3",
    "model": "whisper-1"
}, headers=headers)
data = response.json()
print(data)

curl -X POST "https://api.r9s.ai/v1/audio/transcriptions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"file":"audio.mp3","model":"whisper-1"}'

Translate speech from any supported language to English text.

Important: This endpoint only translates audio into English. The source language is automatically detected by the model.

Supported models: whisper-1 (primary), gpt-4o-transcribe (extended support)

POST /audio/translations
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | string | Yes | Audio file to translate to English |
| model | string | Yes | Model name (whisper-1 is primary, gpt-4o-transcribe has extended support) |
| prompt | string | No | Optional text prompt to guide the model’s style. The source language can be specified in the prompt if needed, though the model will auto-detect it. |
| response_format | string (json, text, srt, verbose_json, vtt) | No | Output format for the translated text |
| temperature | number | No | Sampling temperature between 0 and 1 |
{
  "file": "german_audio.mp3",
  "model": "whisper-1"
}

{
  "file": "french_audio.mp3",
  "model": "whisper-1",
  "prompt": "This is about technology",
  "response_format": "json"
}

{
  "file": "meeting_cn.mp3",
  "model": "gpt-4o-transcribe",
  "prompt": "Business meeting, summarize clearly",
  "response_format": "text"
}
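
The constraints in the field table (the allowed response_format values and a temperature between 0 and 1) can be validated before sending. The following Python sketch uses a hypothetical helper, `build_translation_payload`. It mirrors the request examples above by passing `file` as a string; how the audio itself is uploaded is not described in this section, so the sketch makes no assumption about it.

```python
from typing import Optional

# Allowed output formats, copied from the field table above.
TRANSLATION_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_translation_payload(file: str, model: str,
                              prompt: Optional[str] = None,
                              response_format: Optional[str] = None,
                              temperature: Optional[float] = None) -> dict:
    """Assemble a request body for POST /audio/translations (hypothetical)."""
    if not file or not model:
        raise ValueError("file and model are required")
    if response_format is not None and response_format not in TRANSLATION_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # The table documents temperature as a value between 0 and 1.
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    payload = {"file": file, "model": model}
    optional = {"prompt": prompt, "response_format": response_format,
                "temperature": temperature}
    # Only include optional fields that were actually provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

There is no language parameter on this endpoint; the source language is auto-detected, so the helper does not accept one.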

Successful response

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Translated English text |
| language | string | No | Source language |
| duration | number | No | Audio duration (seconds) |

{
  "text": "This is a translation of the audio file into English.",
  "language": "de",
  "duration": 4.8
}
const response = await fetch('https://api.r9s.ai/v1/audio/translations', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  // file and model are the required fields
  body: JSON.stringify({
    "file": "german_audio.mp3",
    "model": "whisper-1"
  })
});
const data = await response.json();
console.log(data);

import requests

url = "https://api.r9s.ai/v1/audio/translations"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
# file and model are the required fields
response = requests.post(url, json={
    "file": "german_audio.mp3",
    "model": "whisper-1"
}, headers=headers)
data = response.json()
print(data)

curl -X POST "https://api.r9s.ai/v1/audio/translations" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"file":"german_audio.mp3","model":"whisper-1"}'