- Securely Store Keys: Never hard-code API keys in client-side code. Store them in environment variables or in a key management service provided by your cloud provider.
- Implement Graceful Retries: For transient errors such as `429` and `5xx`, implement client-side retry logic with exponential backoff.
- Make Good Use of Streaming Responses: For interactive chat applications, use streaming responses (`stream: true`), which can significantly reduce the time to the first token and improve the user experience.
- Configure Cost Budgets and Alerts: Set your cost budget and alert thresholds in the R9S console. When usage approaches the budget, the system notifies you automatically so that you can avoid accidental overspending.
- Utilize Caching: For frequently repeated queries whose results rarely change, consider adding a caching layer on the application side to reduce API call costs and latency.
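As an illustration of the key-storage advice above, a minimal sketch of reading the key from an environment variable on the server side might look like this. The variable name `R9S_API_KEY` and the helper `load_api_key` are assumptions made for the example, not names defined by R9S.

```python
import os


def load_api_key(var_name="R9S_API_KEY", environ=os.environ):
    """Read the API key from the environment instead of hard-coding it.

    `R9S_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    key = environ.get(var_name, "").strip()
    if not key:
        # Fail fast at startup rather than sending unauthenticated requests.
        raise RuntimeError(f"Set the {var_name} environment variable before starting the app")
    return key
```

Failing at startup when the variable is missing tends to surface configuration mistakes earlier than a rejected request would.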
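The retry recommendation above can be sketched as follows. This is one possible shape, assuming the HTTP layer raises an exception that carries the status code; `TransientError`, `call_with_retries`, and the default delays are hypothetical names and values chosen for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    # 429 (rate limited) and any 5xx (server-side failure) are treated as transient.
    return status == 429 or 500 <= status <= 599


def call_with_retries(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() and retry transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except TransientError as err:
            last_attempt = attempt == max_attempts - 1
            if not is_retryable(err.status) or last_attempt:
                raise
            # The upper bound doubles on each attempt and is capped at max_delay;
            # a random delay below that bound (jitter) spreads out retries.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

Here `send` is any zero-argument callable that performs one request. Errors that are not transient, such as a `400`, are re-raised immediately instead of being retried.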
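For the caching suggestion above, a small in-memory cache with a time-to-live is one way to start. The sketch below is an assumption about how an application might structure this, not part of any R9S SDK; `TTLCache` and `get_or_call` are hypothetical names, and the request payload is assumed to be JSON-serializable.

```python
import hashlib
import json
import time


class TTLCache:
    """In-memory cache keyed by request payload, with per-entry expiry."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (stored_at, value)

    @staticmethod
    def make_key(payload):
        # Sorting keys makes logically identical payloads hash to the same key.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(self, payload, fetch):
        key = self.make_key(payload)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: no API call is made
        value = fetch(payload)  # miss or expired entry: call the API
        self._store[key] = (now, value)
        return value
```

A cache like this suits deterministic or slowly changing answers. For multi-process deployments, a shared store would likely be a better fit than a per-process dictionary.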