# responses
## Create response
Create a response with streaming support. This endpoint corresponds to OpenAI’s Responses API.
### Request

`POST /responses`

#### Request Body
Section titled “Request Body”| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name |
| input | string \| Array<Message> | Yes | Input content. Either a single text string or a message array (structured conversation history). Limitations: messages support only the basic fields role, content, and name; tool-related fields such as tool_calls and tool_call_id are not supported; content is required and cannot be null. To use tools, define them in the top-level tools parameter and the model will call them in its first response. Note: the Responses API deprecates the messages parameter in favor of input. |
| instructions | string | No | System-level instructions to guide model behavior and response style (similar to system message) |
| temperature | number | No | Controls output randomness; higher values produce more random output |
| top_p | number | No | Nucleus sampling parameter, controls output diversity |
| max_output_tokens | integer | No | Maximum number of tokens to generate |
| stream | boolean | No | Whether to enable streaming |
| modalities | Array<string (text, audio)> | No | Response modality types |
| tools | Array<ResponseTool> | No | Available tools list (using flat format) |
| tool_choice | string (none, auto, required) | No | Tool selection strategy |
| parallel_tool_calls | boolean | No | Whether to enable parallel function calling during tool use. When false, at most one tool is called. |
| text | object | No | Text output configuration |
| previous_response_id | string | No | The ID of a previous response to continue the conversation from. This allows you to chain responses together and maintain conversation state. When using previous_response_id, the model will automatically have access to all previously produced reasoning items and conversation history. |
| store | boolean | No | Whether to store the generated model response for later retrieval via API. Defaults to true. Set to false to disable storage (required for ZDR organizations). |
| background | boolean | No | Whether to run the model response in the background asynchronously. Useful for long-running tasks. |
| reasoning | object | No | Configuration for reasoning models (e.g., o1, o3, gpt-5). Controls how the model uses reasoning tokens to “think” through the problem. |
| truncation | string (auto, disabled) | No | Truncation strategy for the model response. auto: if the input exceeds the context window, truncate by dropping items from the beginning; disabled (default): the request fails with a 400 error if the input exceeds the context window |
| stop | string \| Array<string> | No | Up to 4 sequences at which the API will stop generating further tokens |
| metadata | object | No | Additional metadata for tracking and organization purposes |
### Request Examples
#### Simple text input

```json
{
  "model": "gpt-4o-mini",
  "input": "Tell me a joke about programming",
  "instructions": "You are a funny assistant",
  "max_output_tokens": 500,
  "temperature": 0.7
}
```

#### Using message array (recommended)

```json
{
  "model": "gpt-4o-mini",
  "input": [
    { "role": "user", "content": "Hello, how are you?" }
  ],
  "instructions": "You are a helpful assistant",
  "max_output_tokens": 1000
}
```

#### Multi-turn conversation

```json
{
  "model": "qwen-plus",
  "input": [
    { "role": "user", "content": "What is artificial intelligence?" },
    { "role": "assistant", "content": "Artificial intelligence (AI) is..." },
    { "role": "user", "content": "Can you give me some examples?" }
  ],
  "instructions": "You are a knowledgeable AI tutor",
  "max_output_tokens": 2000,
  "stream": true
}
```

#### Request with tool calls

```json
{
  "model": "gpt-4o-mini",
  "input": [
    { "role": "user", "content": "What's the weather like in San Francisco?" }
  ],
  "instructions": "You are a helpful assistant with access to tools",
  "max_output_tokens": 2000,
  "temperature": 0.7,
  "modalities": ["text"],
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather in a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  ]
}
```

#### Multiple tool calls scenario

```json
{
  "model": "gpt-4o-mini",
  "input": [
    { "role": "user", "content": "Book a flight from NYC to London on Dec 25th and check the weather there" }
  ],
  "instructions": "You are a travel assistant. Use tools to help users with travel planning.",
  "tools": [
    {
      "type": "function",
      "name": "search_flights",
      "description": "Search for available flights",
      "parameters": {
        "type": "object",
        "properties": {
          "origin": { "type": "string", "description": "Departure city" },
          "destination": { "type": "string", "description": "Arrival city" },
          "date": { "type": "string", "description": "Travel date in YYYY-MM-DD format" }
        },
        "required": ["origin", "destination", "date"]
      }
    },
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get weather forecast",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" },
          "date": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  ],
  "tool_choice": "auto",
  "max_output_tokens": 3000
}
```

#### Streaming with tool calls

```json
{
  "model": "gpt-4o-mini",
  "input": [
    { "role": "user", "content": "Calculate 15% tip on a $85.50 bill and tell me the total" }
  ],
  "instructions": "You are a helpful calculator assistant",
  "stream": true,
  "tools": [
    {
      "type": "function",
      "name": "calculate",
      "description": "Perform mathematical calculations",
      "parameters": {
        "type": "object",
        "properties": {
          "expression": { "type": "string", "description": "Mathematical expression to evaluate" }
        },
        "required": ["expression"]
      }
    }
  ],
  "max_output_tokens": 1000
}
```

#### Required tool usage

```json
{
  "model": "gpt-4o-mini",
  "input": "Search for recent news about artificial intelligence",
  "instructions": "You must use the search tool to find current information",
  "tools": [
    {
      "type": "function",
      "name": "web_search",
      "description": "Search the web for information",
      "parameters": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "num_results": { "type": "integer" }
        },
        "required": ["query"]
      }
    }
  ],
  "tool_choice": "required",
  "max_output_tokens": 2000
}
```

#### Request with metadata

```json
{
  "model": "gpt-4o-mini",
  "input": "Summarize the key points from our discussion",
  "instructions": "You are a meeting assistant",
  "max_output_tokens": 1500,
  "temperature": 0.5,
  "top_p": 0.9,
  "metadata": {
    "user_id": "user_12345",
    "session_id": "session_abc",
    "conversation_id": "conv_xyz"
  }
}
```

#### Basic streaming response

```json
{
  "model": "gpt-4o-mini",
  "input": "Write a short poem about the ocean",
  "instructions": "You are a creative poet",
  "stream": true,
  "max_output_tokens": 500,
  "temperature": 0.9
}
```

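The streaming wire format is not documented on this page; the sketch below assumes the server-sent-events shape of OpenAI's Responses API (which this endpoint mirrors), where text arrives in `response.output_text.delta` events. Treat the event names and the `[DONE]` sentinel as assumptions.

```python
import json
import requests

# Minimal SSE consumption sketch, assuming OpenAI-style "data: {...}" lines.
resp = requests.post(
    "https://api.r9s.ai/v1/responses",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"model": "gpt-4o-mini", "input": "Write a short poem about the ocean", "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":  # assumption: stream may end with a [DONE] sentinel
        break
    event = json.loads(payload)
    # Assumption: incremental text is delivered as response.output_text.delta events.
    if event.get("type") == "response.output_text.delta":
        print(event.get("delta", ""), end="", flush=True)
```
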
#### JSON mode output

```json
{
  "model": "gpt-4o-mini",
  "input": "Extract person information and return as JSON: John Smith is 35 years old and works as a software engineer in San Francisco",
  "instructions": "Extract structured data and output in JSON format",
  "text": {
    "format": { "type": "json_object" }
  },
  "max_output_tokens": 500
}
```

#### Structured JSON with schema

```json
{
  "model": "gpt-4o-mini",
  "input": "Generate a user profile for software developer Alice Chen in JSON format",
  "instructions": "Create a detailed user profile following the schema",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "user_profile",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "integer" },
          "occupation": { "type": "string" },
          "location": { "type": "string" },
          "skills": { "type": "array", "items": { "type": "string" } }
        },
        "required": ["name", "age", "occupation", "location", "skills"],
        "additionalProperties": false
      },
      "strict": true
    }
  },
  "max_output_tokens": 800
}
```

#### Complete tool call flow with result

```json
{
  "model": "gpt-4o-mini",
  "input": [
    { "role": "user", "content": "What's the weather in Tokyo?" },
    { "role": "assistant", "content": "I'll check the weather for you." },
    { "role": "user", "content": "The weather tool returned: temperature 22°C, condition sunny, humidity 60%" }
  ],
  "instructions": "You are a helpful assistant. Synthesize tool results naturally.",
  "max_output_tokens": 500
}
```

#### Chained conversation with previous_response_id

```json
{
  "model": "gpt-4o-mini",
  "input": "Can you elaborate more on the second point?",
  "instructions": "You are a helpful assistant",
  "previous_response_id": "resp_abc123xyz456",
  "max_output_tokens": 1000
}
```

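A minimal sketch of chaining two turns: make a first request, read its `id` from the response body, and pass that ID as `previous_response_id` in the follow-up so the server carries the conversation state forward. Endpoint and headers follow the code examples later on this page.

```python
import requests

BASE = "https://api.r9s.ai/v1/responses"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

# First turn: the response ID comes back in the "id" field.
first = requests.post(BASE, headers=HEADERS, json={
    "model": "gpt-4o-mini",
    "input": "Give me three tips for writing clean code",
}).json()

# Second turn: chain off the first response instead of resending history.
follow_up = requests.post(BASE, headers=HEADERS, json={
    "model": "gpt-4o-mini",
    "input": "Can you elaborate more on the second point?",
    "previous_response_id": first["id"],
}).json()
print(follow_up["output"])
```
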
#### Background asynchronous task

```json
{
  "model": "gpt-4o-mini",
  "input": "Analyze this large dataset and provide insights: [dataset details...]",
  "instructions": "You are a data analyst",
  "background": true,
  "max_output_tokens": 5000,
  "temperature": 0.3
}
```

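With `background: true` the response initially comes back with `status: "in_progress"` (see the background task example under Response Examples). The polling sketch below assumes a `GET /v1/responses/{id}` retrieval endpoint, as in OpenAI's Responses API; that endpoint is not documented on this page, so treat it as an assumption.

```python
import time
import requests

BASE = "https://api.r9s.ai/v1/responses"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

job = requests.post(BASE, headers=HEADERS, json={
    "model": "gpt-4o-mini",
    "input": "Analyze this large dataset and provide insights: [dataset details...]",
    "background": True,
}).json()

# Assumption: responses can be retrieved by ID via GET /v1/responses/{id}.
while True:
    current = requests.get(f"{BASE}/{job['id']}", headers=HEADERS).json()
    if current["status"] != "in_progress":
        break
    time.sleep(2)
print(current["status"], current.get("output"))
```
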
#### Reasoning mode for complex problems

```json
{
  "model": "gpt-5-codex",
  "input": "A farmer needs to transport a fox, a chicken, and a bag of grain across a river. The boat can only carry the farmer and one item. If left alone, the fox will eat the chicken, and the chicken will eat the grain. How can the farmer get everything across safely?",
  "instructions": "Think through this step by step",
  "reasoning": { "effort": "high" },
  "max_output_tokens": 3000
}
```

### Successful response

#### Response Schema

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the response |
| object | string | Yes | Object type; always "response" |
| created_at | integer | Yes | Unix timestamp when the response was created |
| status | string (in_progress, completed, incomplete, failed, cancelled) | Yes | The status of the response |
| background | boolean | No | Whether the response is running in the background |
| billing | object | No | Billing information |
| completed_at | integer | No | Unix timestamp when the response was completed (null while in progress) |
| error | object | No | Error information if the response failed |
| incomplete_details | object | No | Details about why the response is incomplete |
| instructions | string | No | System-level instructions that guided the model’s behavior |
| max_output_tokens | integer | No | Maximum number of tokens to generate |
| max_tool_calls | object | No | Maximum number of tool calls allowed |
| model | string | Yes | The model used for the response |
| output | Array<ResponseOutputItem> | No | Array of output items produced by the model |
| parallel_tool_calls | boolean | No | Whether parallel tool calls are enabled |
| previous_response_id | string | No | ID of the previous response in a chain (null if none) |
| prompt_cache_key | object | No | Key for prompt caching |
| prompt_cache_retention | object | No | Prompt cache retention policy |
| reasoning | object | No | Reasoning configuration |
| safety_identifier | object | No | Safety identifier for the response |
| service_tier | string (auto, default) | No | Service tier used |
| store | boolean | No | Whether to store the response |
| temperature | number | No | Temperature parameter used |
| text | object | No | Text format configuration |
| tool_choice | string (none, auto, required) | No | Tool choice strategy used |
| tools | Array<ResponseTool> | No | Tools that were available |
| top_logprobs | integer | No | Number of top log probabilities |
| top_p | number | No | Top-p sampling parameter used |
| truncation | string (auto, disabled) | No | Truncation strategy used |
| user | object | No | User identifier |
| metadata | object | No | Additional metadata |
| usage | object | No | Usage statistics (null when response is still in progress) |
### Response Examples
#### Simple text response

```json
{
  "id": "resp_0af56b0fcfec6be7006949f1d2bf7881a1ac983a51aa13d9e9",
  "object": "response",
  "created_at": 1766453714,
  "status": "completed",
  "background": false,
  "billing": { "payer": "developer" },
  "completed_at": 1766453715,
  "error": null,
  "incomplete_details": null,
  "instructions": "You are a funny assistant",
  "max_output_tokens": 500,
  "max_tool_calls": null,
  "model": "gpt-4o-mini-2024-07-18",
  "output": [
    {
      "id": "msg_0af56b0fcfec6be7006949f1d3675881a1a35d943785fb4b8b",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "annotations": [],
          "logprobs": [],
          "text": "Why do programmers prefer dark mode?\n\nBecause light attracts bugs!"
        }
      ],
      "role": "assistant"
    }
  ],
  "parallel_tool_calls": true,
  "previous_response_id": null,
  "prompt_cache_key": null,
  "prompt_cache_retention": null,
  "reasoning": { "effort": null, "summary": null },
  "safety_identifier": null,
  "service_tier": "default",
  "store": true,
  "temperature": 0.7,
  "text": { "format": { "type": "text" }, "verbosity": "medium" },
  "tool_choice": "auto",
  "tools": [],
  "top_logprobs": 0,
  "top_p": 1,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 22,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens": 13,
    "output_tokens_details": { "reasoning_tokens": 0 },
    "total_tokens": 35
  },
  "user": null,
  "metadata": {}
}
```

#### Response with tool calls

```json
{
  "id": "resp_tool_abc123",
  "object": "response",
  "created_at": 1766453800,
  "status": "completed",
  "background": false,
  "billing": { "payer": "developer" },
  "completed_at": 1766453802,
  "error": null,
  "incomplete_details": null,
  "instructions": "You are a helpful assistant with access to tools",
  "max_output_tokens": 2000,
  "model": "gpt-4o-mini-2024-07-18",
  "output": [
    {
      "id": "msg_tool_output_123",
      "type": "message",
      "status": "completed",
      "content": [
        { "type": "output_text", "text": "I'll check the weather for you." }
      ],
      "role": "assistant"
    },
    {
      "id": "call_weather_001",
      "type": "function_call",
      "status": "completed",
      "call_id": "call_abc123",
      "name": "get_weather",
      "arguments": "{\"location\": \"San Francisco, CA\", \"unit\": \"celsius\"}"
    }
  ],
  "parallel_tool_calls": true,
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" },
          "unit": { "type": "string" }
        }
      }
    }
  ],
  "tool_choice": "auto",
  "temperature": 0.7,
  "usage": { "input_tokens": 45, "output_tokens": 28, "total_tokens": 73 },
  "metadata": {}
}
```

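Because `input` messages do not accept tool_calls or tool_call_id fields, a client-side tool loop under this API's conventions looks like the sketch below: scan `output` for `function_call` items, run the matching local function, then send a follow-up request that reports the result as plain user content (mirroring the "Complete tool call flow with result" request example). The `get_weather` implementation is a placeholder.

```python
import json
import requests

BASE = "https://api.r9s.ai/v1/responses"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

def get_weather(location, unit="celsius"):
    # Placeholder for a real weather lookup.
    return {"temperature": 22, "condition": "sunny", "unit": unit}

first = requests.post(BASE, headers=HEADERS, json={
    "model": "gpt-4o-mini",
    "input": [{"role": "user", "content": "What's the weather in San Francisco?"}],
    "tools": [{
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}, "unit": {"type": "string"}},
            "required": ["location"],
        },
    }],
}).json()

# Execute any function_call output items locally, then report results back.
for item in first.get("output", []):
    if item.get("type") == "function_call":
        args = json.loads(item["arguments"])
        result = get_weather(**args)
        # Tool results go back as plain user content, since input messages
        # here do not support tool_call_id or other tool-related fields.
        final = requests.post(BASE, headers=HEADERS, json={
            "model": "gpt-4o-mini",
            "input": [
                {"role": "user", "content": "What's the weather in San Francisco?"},
                {"role": "user", "content": f"The get_weather tool returned: {json.dumps(result)}"},
            ],
            "instructions": "Synthesize tool results naturally.",
        }).json()
        print(final["output"])
```
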
#### JSON formatted response

```json
{
  "id": "resp_json_xyz789",
  "object": "response",
  "created_at": 1766453900,
  "status": "completed",
  "background": false,
  "completed_at": 1766453901,
  "instructions": "Extract structured data",
  "max_output_tokens": 500,
  "model": "gpt-4o-mini-2024-07-18",
  "output": [
    {
      "id": "msg_json_output",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "{\"name\": \"John Smith\", \"age\": 35, \"occupation\": \"software engineer\", \"location\": \"San Francisco\"}"
        }
      ],
      "role": "assistant"
    }
  ],
  "text": { "format": { "type": "json_object" } },
  "usage": { "input_tokens": 32, "output_tokens": 24, "total_tokens": 56 }
}
```

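Note that with `json_object` (or `json_schema`) formats the structured data still arrives as a JSON string inside an `output_text` content part, so it must be parsed client-side. A minimal extraction sketch:

```python
import json

def extract_json(response):
    """Find the first output_text part in a completed response and parse it."""
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    return json.loads(part["text"])
    raise ValueError("no output_text found in response")

# Applied to the example response above, this returns:
# {"name": "John Smith", "age": 35, "occupation": "software engineer", ...}
```
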
#### Incomplete response (token limit)

```json
{
  "id": "resp_incomplete_456",
  "object": "response",
  "created_at": 1766454000,
  "status": "incomplete",
  "background": false,
  "completed_at": 1766454002,
  "error": null,
  "incomplete_details": { "reason": "max_output_tokens" },
  "instructions": "Write a detailed essay",
  "max_output_tokens": 100,
  "model": "gpt-4o-mini-2024-07-18",
  "output": [
    {
      "id": "msg_incomplete",
      "type": "message",
      "status": "incomplete",
      "content": [
        {
          "type": "output_text",
          "text": "Artificial intelligence is a rapidly evolving field that encompasses machine learning, neural networks, and..."
        }
      ],
      "role": "assistant"
    }
  ],
  "usage": { "input_tokens": 15, "output_tokens": 100, "total_tokens": 115 }
}
```

#### Failed response (error state)

```json
{
  "id": "resp_failed_789",
  "object": "response",
  "created_at": 1766454100,
  "status": "failed",
  "background": false,
  "completed_at": null,
  "error": {
    "type": "server_error",
    "message": "Model service temporarily unavailable"
  },
  "instructions": "Answer the question",
  "max_output_tokens": 1000,
  "model": "gpt-4o-mini-2024-07-18",
  "output": [],
  "usage": { "input_tokens": 10, "output_tokens": 0, "total_tokens": 10 }
}
```

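As the two examples above show, truncation and failure surface in the response body via `status`, `incomplete_details`, and `error`, not only through HTTP status codes, so callers should branch on `status`. A small sketch:

```python
def check_response(resp):
    """Branch on the status field; values follow the schema's status enum."""
    status = resp["status"]
    if status == "completed":
        return resp["output"]
    if status == "incomplete":
        reason = (resp.get("incomplete_details") or {}).get("reason")
        raise RuntimeError(f"response truncated: {reason}")  # e.g. max_output_tokens
    if status == "failed":
        err = resp.get("error") or {}
        raise RuntimeError(f"{err.get('type')}: {err.get('message')}")
    return None  # in_progress / cancelled: no usable output yet
```
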
#### Background task response

```json
{
  "id": "resp_background_999",
  "object": "response",
  "created_at": 1766454200,
  "status": "in_progress",
  "background": true,
  "completed_at": null,
  "instructions": "Analyze large dataset",
  "max_output_tokens": 5000,
  "model": "gpt-4o-mini-2024-07-18",
  "output": [],
  "usage": null
}
```

### Code Examples
#### JavaScript (Fetch)

```javascript
const response = await fetch('https://api.r9s.ai/v1/responses', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    "model": "gpt-4o-mini",
    "input": "Tell me a joke about programming",
    "instructions": "You are a funny assistant",
    "max_output_tokens": 500,
    "temperature": 0.7
  })
});

const data = await response.json();
console.log(data);
```

#### Python (requests)

```python
import requests

url = "https://api.r9s.ai/v1/responses"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

response = requests.post(url, json={
    "model": "gpt-4o-mini",
    "input": "Tell me a joke about programming",
    "instructions": "You are a funny assistant",
    "max_output_tokens": 500,
    "temperature": 0.7
}, headers=headers)
data = response.json()
print(data)
```

#### cURL

```bash
curl -X POST "https://api.r9s.ai/v1/responses" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","input":"Tell me a joke about programming","instructions":"You are a funny assistant","max_output_tokens":500,"temperature":0.7}'
```

## Schema Reference

### ResponseRequest

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name |
| input | string \| Array<Message> | Yes | Input content. Either a single text string or a message array (structured conversation history). Limitations: messages support only the basic fields role, content, and name; tool-related fields such as tool_calls and tool_call_id are not supported; content is required and cannot be null. To use tools, define them in the top-level tools parameter and the model will call them in its first response. Note: the Responses API deprecates the messages parameter in favor of input. |
| instructions | string | No | System-level instructions to guide model behavior and response style (similar to system message) |
| temperature | number | No | Controls output randomness; higher values produce more random output |
| top_p | number | No | Nucleus sampling parameter, controls output diversity |
| max_output_tokens | integer | No | Maximum number of tokens to generate |
| stream | boolean | No | Whether to enable streaming |
| modalities | Array<string (text, audio)> | No | Response modality types |
| tools | Array<ResponseTool> | No | Available tools list (using flat format) |
| tool_choice | string (none, auto, required) | No | Tool selection strategy |
| parallel_tool_calls | boolean | No | Whether to enable parallel function calling during tool use. When false, at most one tool is called. |
| text | object | No | Text output configuration |
| previous_response_id | string | No | The ID of a previous response to continue the conversation from. This allows you to chain responses together and maintain conversation state. When using previous_response_id, the model will automatically have access to all previously produced reasoning items and conversation history. |
| store | boolean | No | Whether to store the generated model response for later retrieval via API. Defaults to true. Set to false to disable storage (required for ZDR organizations). |
| background | boolean | No | Whether to run the model response in the background asynchronously. Useful for long-running tasks. |
| reasoning | object | No | Configuration for reasoning models (e.g., o1, o3, gpt-5). Controls how the model uses reasoning tokens to “think” through the problem. |
| truncation | string (auto, disabled) | No | Truncation strategy for the model response. auto: if the input exceeds the context window, truncate by dropping items from the beginning; disabled (default): the request fails with a 400 error if the input exceeds the context window |
| stop | string \| Array<string> | No | Up to 4 sequences at which the API will stop generating further tokens |
| metadata | object | No | Additional metadata for tracking and organization purposes |
### Message

| Field | Type | Required | Description |
|---|---|---|---|
| role | string (system, user, assistant, tool) | Yes | Message role |
| content | object | No | Message content; can be null when an assistant message contains tool_calls. For user/system messages it is required and contains text or multimodal content; for assistant messages it is optional when tool_calls is present (may be null or omitted); for tool messages it is required and contains the tool's return value (usually a JSON string). Important: in the /v1/responses API the content field must exist and cannot be null; in /v1/chat/completions it can be null when tool_calls is present. |
| name | string | No | Sender name |
| reasoning_content | string | No | Reasoning content |
| tool_calls | Array<ToolCall> | No | Tool calls list |
| tool_call_id | string | No | Tool call ID |
### MessageContent

| Field | Type | Required | Description |
|---|---|---|---|
| type | string (text, image_url) | Yes | Content type |
| text | string | No | Text content (when type is text) |
| image_url | object | No | Image reference (when type is image_url) |
### ImageURL

| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Image URL |
| detail | string (auto, low, high) | No | Image detail level |
### ToolCall

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Tool call ID |
| type | string | Yes | Tool type (e.g., function) |
| function | object | Yes | Function call details |
### FunctionCall

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Function name |
| arguments | string | No | Function arguments (JSON string) |
### ResponseTool

| Field | Type | Required | Description |
|---|---|---|---|
| type | string (function) | Yes | Tool type, currently only supports function |
| name | string | Yes | Function name |
| description | string | No | Function description, helps model understand when to call this function |
| parameters | object | No | Function parameter definition in JSON Schema format |
### ToolChoice

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Tool type |
| function | object | Yes | The function to call |
### ResponseObject

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the response |
| object | string | Yes | Object type; always "response" |
| created_at | integer | Yes | Unix timestamp when the response was created |
| status | string (in_progress, completed, incomplete, failed, cancelled) | Yes | The status of the response |
| background | boolean | No | Whether the response is running in the background |
| billing | object | No | Billing information |
| completed_at | integer | No | Unix timestamp when the response was completed (null while in progress) |
| error | object | No | Error information if the response failed |
| incomplete_details | object | No | Details about why the response is incomplete |
| instructions | string | No | System-level instructions that guided the model’s behavior |
| max_output_tokens | integer | No | Maximum number of tokens to generate |
| max_tool_calls | object | No | Maximum number of tool calls allowed |
| model | string | Yes | The model used for the response |
| output | Array<ResponseOutputItem> | No | Array of output items produced by the model |
| parallel_tool_calls | boolean | No | Whether parallel tool calls are enabled |
| previous_response_id | string | No | ID of the previous response in a chain (null if none) |
| prompt_cache_key | object | No | Key for prompt caching |
| prompt_cache_retention | object | No | Prompt cache retention policy |
| reasoning | object | No | Reasoning configuration |
| safety_identifier | object | No | Safety identifier for the response |
| service_tier | string (auto, default) | No | Service tier used |
| store | boolean | No | Whether to store the response |
| temperature | number | No | Temperature parameter used |
| text | object | No | Text format configuration |
| tool_choice | string (none, auto, required) | No | Tool choice strategy used |
| tools | Array<ResponseTool> | No | Tools that were available |
| top_logprobs | integer | No | Number of top log probabilities |
| top_p | number | No | Top-p sampling parameter used |
| truncation | string (auto, disabled) | No | Truncation strategy used |
| user | object | No | User identifier |
| metadata | object | No | Additional metadata |
| usage | object | No | Usage statistics (null when response is still in progress) |
### ResponseOutputItem

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for output item |
| type | string (message, function_call, reasoning) | Yes | Output type (message for final response, function_call for tool calls, reasoning for reasoning trace) |
| status | string (completed, in_progress, incomplete) | No | Output status |
| role | string (user, assistant) | No | Message role |
| content | Array | No | Content array |
| call_id | string | No | Function call ID |
| name | string | No | Function name |
| arguments | string | No | Function arguments (JSON string) |
| output | string | No | Function output |
| summary | object | No | Natural-language summary of reasoning (for reasoning type), can be string or array |
| encrypted_content | object | No | Encrypted reasoning tokens for stateless workflows (for reasoning type) |
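Since `output` mixes message, function_call, and reasoning items, consumers typically dispatch on `type`. A minimal sketch of walking the array:

```python
def walk_output(response):
    """Dispatch on ResponseOutputItem.type for each item in output."""
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    print("text:", part["text"])
        elif kind == "function_call":
            print("tool call:", item["name"], item["arguments"])
        elif kind == "reasoning":
            print("reasoning summary:", item.get("summary"))
```
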
### ResponseUsage

| Field | Type | Required | Description |
|---|---|---|---|
| input_tokens | integer | Yes | Number of tokens in the input |
| input_tokens_details | object | No | Details about input tokens |
| output_tokens | integer | Yes | Number of tokens in the output |
| output_tokens_details | object | No | Details about output tokens |
| total_tokens | integer | Yes | Total number of tokens (input + output) |
## Related APIs

- API Overview - Learn about authentication and basic information
- models - View models related APIs
- chat - View chat related APIs
- messages - View messages related APIs
- completions - View completions related APIs
- edits - View edits related APIs
- images - View images related APIs
- embeddings - View embeddings related APIs
- engine-embeddings - View engine-embeddings related APIs
- moderations - View moderations related APIs
- audio - View audio related APIs
- search - View search related APIs
- proxy - View proxy related APIs