audio
Text to speech
Convert text to speech.
Request
POST /audio/speech
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model name |
| input | string | Yes | Text to convert to speech |
| voice | string (alloy, echo, fable, onyx, nova, shimmer) | Yes | Voice type |
| response_format | string (mp3, opus, aac, flac, wav, pcm) | No | Audio output format |
| speed | number | No | Speech speed multiplier (1 = normal) |
Request Examples
Simple speech synthesis
```json
{
  "model": "speech-2.6-turbo",
  "input": "Hello, welcome to our service!",
  "voice": "alloy"
}
```
Speech synthesis with detailed parameters
```json
{
  "model": "speech-2.6-hd",
  "input": "The quick brown fox jumps over the lazy dog.",
  "voice": "nova",
  "response_format": "mp3",
  "speed": 1
}
```
Fast-paced briefing
```json
{
  "model": "speech-2.6-turbo",
  "input": "Daily update: traffic is clear, weather is sunny, meetings start at 10 AM.",
  "voice": "echo",
  "response_format": "opus",
  "speed": 1.2
}
```
Successful response
A successful request returns the generated audio as binary data in the requested response_format, not a JSON body.
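Because the body is the audio itself, longer outputs can be streamed straight to disk. Below is a minimal sketch with Python requests, assuming the OpenAI-compatible binary response; the output file name and chunk size are illustrative choices.
```python
import requests

# Sketch: stream the generated audio to a file instead of buffering it in memory.
# Endpoint, headers, and request fields are taken from this page; the output
# file name and chunk size are illustrative.
url = "https://api.r9s.ai/v1/audio/speech"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}
payload = {
    "model": "speech-2.6-turbo",
    "input": "Hello, welcome to our service!",
    "voice": "alloy",
    "response_format": "mp3",
}

with requests.post(url, json=payload, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    with open("speech.mp3", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
```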
Code Examples
JavaScript (Fetch)
```js
import { writeFile } from 'node:fs/promises';

const response = await fetch('https://api.r9s.ai/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    "model": "speech-2.6-turbo",
    "input": "Hello, welcome to our service!",
    "voice": "alloy"
  })
});

// The response body is the generated audio itself (not JSON); write it to a file.
await writeFile('speech.mp3', Buffer.from(await response.arrayBuffer()));
```
Python (requests)
```python
import requests

url = "https://api.r9s.ai/v1/audio/speech"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

response = requests.post(url, json={
    "model": "speech-2.6-turbo",
    "input": "Hello, welcome to our service!",
    "voice": "alloy"
}, headers=headers)

# The response body is the audio itself, so write the raw bytes to a file
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```
cURL
```bash
curl -X POST "https://api.r9s.ai/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"speech-2.6-turbo","input":"Hello, welcome to our service!","voice":"alloy"}' \
  --output speech.mp3
```
Speech to text
Transcribe speech to text. Supports multiple models and output formats.
Supported models:
- whisper-1: Supports json, text, srt, verbose_json, vtt formats
- gpt-4o-transcribe, gpt-4o-mini-transcribe: Only support json and text formats
Note: the timestamp_granularities parameter only takes effect when response_format is set to verbose_json.
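As a sketch of how these rules combine, the helper below chooses request parameters that respect them; the function name is illustrative and not part of the API.
```python
# Sketch: pick transcription parameters that respect the rules above.
# The model names and format constraints come from this page; the helper
# itself is purely illustrative.
JSON_ONLY_MODELS = {"gpt-4o-transcribe", "gpt-4o-mini-transcribe"}

def transcription_params(model: str, want_word_timestamps: bool = False) -> dict:
    if want_word_timestamps:
        if model in JSON_ONLY_MODELS:
            raise ValueError(f"{model} only supports json/text, so timestamps are unavailable")
        # timestamp_granularities is honored only with verbose_json
        return {
            "model": model,
            "response_format": "verbose_json",
            "timestamp_granularities": ["word"],
        }
    return {"model": model, "response_format": "json"}
```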
Request
POST /audio/transcriptions
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| file | string | Yes | Audio file to transcribe |
| model | string | Yes | Model name |
| language | string | No | Audio language (ISO-639-1 format) |
| prompt | string | No | Optional text prompt |
| response_format | string (json, text, srt, verbose_json, vtt) | No | Output format. whisper-1 supports all formats; gpt-4o-transcribe and gpt-4o-mini-transcribe support only json and text |
| temperature | number | No | Sampling temperature between 0 and 1 |
| timestamp_granularities | Array<string (word, segment)> | No | Timestamp granularity levels to include (word, segment). Only takes effect when response_format is verbose_json. Segment timestamps add no latency; word timestamps add latency |
Request Examples
Simple speech transcription
```json
{
  "file": "audio.mp3",
  "model": "whisper-1"
}
```
Speech transcription with parameters
```json
{
  "file": "audio.mp3",
  "model": "whisper-1",
  "language": "en",
  "response_format": "json",
  "temperature": 0
}
```
Transcription with timestamps
```json
{
  "file": "meeting.wav",
  "model": "whisper-1",
  "language": "en",
  "response_format": "verbose_json",
  "timestamp_granularities": [
    "word"
  ]
}
```
Successful response
Response Schema
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Transcribed text |
| language | string | No | Detected language |
| duration | number | No | Audio duration (seconds) |
| words | Array | No | Word-level timestamps (verbose_json with word granularity only) |
| segments | Array | No | Segment-level timestamps (verbose_json only) |
Response Example
```json
{
  "text": "Hello, this is a test transcription of an audio file.",
  "language": "en",
  "duration": 5.2
}
```
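When verbose_json is requested with word granularity, the response also carries the words array. The sketch below reads it, assuming the OpenAI-compatible item shape (word, start, end keys); adjust if this API returns a different shape.
```python
# Sketch: print word-level timings from a verbose_json transcription result.
# Assumes each item in "words" has "word", "start" and "end" keys
# (the OpenAI-compatible shape); that shape is an assumption, not documented above.
def print_word_timestamps(transcription: dict) -> None:
    for w in transcription.get("words", []):
        print(f'{w["start"]:6.2f}s - {w["end"]:6.2f}s  {w["word"]}')
```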
Code Examples
JavaScript (Fetch)
```js
import { readFile } from 'node:fs/promises';

// The audio file is uploaded as multipart/form-data (the OpenAI-compatible
// convention for audio endpoints); fetch sets the boundary automatically.
const form = new FormData();
form.append('file', new Blob([await readFile('audio.mp3')]), 'audio.mp3');
form.append('model', 'whisper-1');

const response = await fetch('https://api.r9s.ai/v1/audio/transcriptions', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

const data = await response.json();
console.log(data);
```
Python (requests)
```python
import requests

url = "https://api.r9s.ai/v1/audio/transcriptions"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# The audio file is sent as a multipart/form-data upload
# (the OpenAI-compatible convention for audio endpoints).
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        url,
        headers=headers,
        files={"file": audio_file},
        data={"model": "whisper-1"},
    )

data = response.json()
print(data)
```
cURL
```bash
curl -X POST "https://api.r9s.ai/v1/audio/transcriptions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"
```
Speech translation
Translate speech from any supported language to English text.
Important: This endpoint only translates audio into English. The source language is automatically detected by the model.
Supported models: whisper-1 (primary), gpt-4o-transcribe (extended support)
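In short, use /audio/transcriptions when you want text in the spoken language and /audio/translations when you want English. A tiny sketch of that choice follows; the helper name is illustrative, not part of the API.
```python
# Sketch: pick between the two speech-to-text endpoints.
# /audio/transcriptions returns text in the spoken language;
# /audio/translations always returns English. Paths are from this page;
# the helper itself is illustrative.
BASE_URL = "https://api.r9s.ai/v1"

def speech_to_text_url(translate_to_english: bool) -> str:
    path = "/audio/translations" if translate_to_english else "/audio/transcriptions"
    return BASE_URL + path
```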
Request
POST /audio/translations
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| file | string | Yes | Audio file to translate to English |
| model | string | Yes | Model name (whisper-1 is primary, gpt-4o-transcribe has extended support) |
| prompt | string | No | Optional text prompt to guide the model’s style. The source language can be specified in the prompt if needed, though the model will auto-detect it. |
| response_format | string (json, text, srt, verbose_json, vtt) | No | Output format for the translated text |
| temperature | number | No | Sampling temperature between 0 and 1 |
Request Examples
Simple speech translation
```json
{
  "file": "german_audio.mp3",
  "model": "whisper-1"
}
```
Speech translation with prompt
```json
{
  "file": "french_audio.mp3",
  "model": "whisper-1",
  "prompt": "This is about technology",
  "response_format": "json"
}
```
Translate meeting recording to English
```json
{
  "file": "meeting_cn.mp3",
  "model": "gpt-4o-transcribe",
  "prompt": "Business meeting, summarize clearly",
  "response_format": "text"
}
```
Successful response
Response Schema
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Translated English text |
| language | string | No | Source language |
| duration | number | No | Audio duration (seconds) |
Response Example
```json
{
  "text": "This is a translation of the audio file into English.",
  "language": "de",
  "duration": 4.8
}
```
Code Examples
JavaScript (Fetch)
```js
import { readFile } from 'node:fs/promises';

// The audio file is uploaded as multipart/form-data (the OpenAI-compatible
// convention for audio endpoints); fetch sets the boundary automatically.
const form = new FormData();
form.append('file', new Blob([await readFile('german_audio.mp3')]), 'german_audio.mp3');
form.append('model', 'whisper-1');

const response = await fetch('https://api.r9s.ai/v1/audio/translations', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

const data = await response.json();
console.log(data);
```
Python (requests)
```python
import requests

url = "https://api.r9s.ai/v1/audio/translations"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# The audio file is sent as a multipart/form-data upload
# (the OpenAI-compatible convention for audio endpoints).
with open("german_audio.mp3", "rb") as audio_file:
    response = requests.post(
        url,
        headers=headers,
        files={"file": audio_file},
        data={"model": "whisper-1"},
    )

data = response.json()
print(data)
```
cURL
```bash
curl -X POST "https://api.r9s.ai/v1/audio/translations" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@german_audio.mp3" \
  -F "model=whisper-1"
```
Schema Reference
AudioSpeechRequest
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model name |
| input | string | Yes | Text to convert to speech |
| voice | string (alloy, echo, fable, onyx, nova, shimmer) | Yes | Voice type |
| response_format | string (mp3, opus, aac, flac, wav, pcm) | No | Audio output format |
| speed | number | No | Speech speed multiplier (1 = normal) |
AudioTranscriptionResponse
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Transcribed text |
| language | string | No | Detected language |
| duration | number | No | Audio duration (seconds) |
| words | Array | No | Word-level timestamps (verbose_json with word granularity only) |
| segments | Array | No | Segment-level timestamps (verbose_json only) |
AudioTranslationResponse
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Translated English text |
| language | string | No | Source language |
| duration | number | No | Audio duration (seconds) |
Related APIs
- API Overview - Learn about authentication and basic information
- models - View models related APIs
- chat - View chat related APIs
- responses - View responses related APIs
- messages - View messages related APIs
- completions - View completions related APIs
- edits - View edits related APIs
- images - View images related APIs
- embeddings - View embeddings related APIs
- engine-embeddings - View engine-embeddings related APIs
- moderations - View moderations related APIs
- search - View search related APIs
- proxy - View proxy related APIs