
Rate Limit Best Practices

Overview

When using Authlete’s API endpoints, take Authlete’s Rate Limit Policy into account and design your applications with resilient strategies that prevent service degradation, avoid self-inflicted Distributed Denial of Service (DDoS) patterns, and ensure continuous operation during peak load. We recommend that customers consider the following technical best practices.

Implement Intelligent Caching for Query Endpoints

Caching is the primary defense against unnecessary API calls and rate-limit violations. Some Authlete endpoints are called frequently but return information that rarely changes. Caching their responses improves performance and reduces the risk of being rate limited because of external factors, such as many SDKs or Wallets querying your discovery or JWKs endpoints. The following table is a non-exhaustive list of endpoints to consider when planning caching in your application.

Endpoint	Usage	Comments
Get Service Configuration `/api/{serviceId}/service/configuration`	This API is intended to be called from within the implementation of the configuration endpoint of a service that supports OpenID Connect and OpenID Connect Discovery 1.0.	Many OpenID SDKs and libraries use the discovery endpoint to simplify their configuration and usage. When these libraries are deployed as part of a web application or site, your OpenID Provider might receive a large number of requests. The configuration of a service does not change often, so this endpoint can be safely cached.
Get JWK Set `/api/{serviceId}/service/jwks/get`	This API is intended to be called from within the implementation of the JWK Set endpoint of a service that supports OpenID Connect and must expose its JWK Set.	Many OpenID SDKs and libraries query the JWKs endpoint to get the public keys of the OpenID Provider and simplify their configuration. When these libraries run in mobile or web apps, the JWKs endpoint of the OpenID Provider can receive a large number of requests. Keys are typically rotated over long periods (weeks or months), so it is safe to cache the result of this endpoint.
Get Client `/api/{serviceId}/client/get/{clientId}`	This API retrieves the details and configuration of a client.	Unlike the previous endpoints, this endpoint does not back a specific endpoint of your service, but it is commonly used in some processes. If you need the client metadata as part of a regular flow, we suggest caching the response of this endpoint.
Get Verifiable Credential Issuer Metadata `/api/{serviceId}/vci/metadata`	This API is intended to be called from within the implementation of the metadata endpoint of a service that supports OpenID for Verifiable Credentials.	The metadata endpoint is designed to be consumed by Wallets, so its traffic can grow as Wallet usage grows. Like other service configuration endpoints, the content of the response does not change often, so it can be cached.
Get JSON Web Key Set `/api/{serviceId}/vci/jwks`	This API is intended to be called from within the implementation of the JWKs endpoint of a service where SD-JWT is implemented.	As with the OpenID Connect JWK Set, these public keys are requested regularly by Wallets, which might lead to a large number of requests. Keys are usually rotated over long periods (weeks or months), so it is safe to cache the result of this endpoint.
Process Introspection Request `/api/{serviceId}/auth/introspection`	This API is intended to be called from within the implementations of protected resource endpoints of the authorization server implementation, in order to get information about the access token presented by the client application.	The information associated with a token might be requested multiple times, depending on how the applications and resource servers communicate with each other. In some use cases, caching responses from Authlete’s introspection endpoint improves the response time of resource server APIs and reduces the load on the Authlete service. For more details, read this article.
Process OAuth 2.0 Introspection Request `/api/{serviceId}/auth/introspection/standard`	This API is intended to be called from within the implementation of the introspection endpoint of your service.	This endpoint is similar in intent to the previous one, except that it supports a standard way for third-party Resource Servers to verify the status of a token. Because you have no control over the Resource Servers using the standard introspection endpoint, they might try to verify the same token multiple times within very short periods. In these scenarios, caching might be possible.
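For the introspection endpoints, two details can make cached results safer. First, key the cache by a hash of the access token so that raw tokens are not stored as cache keys. Second, never keep an entry past the token's own expiry.

The Python sketch below shows both. It assumes your code already extracts the token's expiration time, as epoch seconds, from the introspection response. The helper names and the 60-second default are hypothetical.

Keep in mind that a cached result will not reflect a revocation until the entry expires. A short TTL is the usual trade-off.

```python
import hashlib
import time
from typing import Optional


def introspection_cache_key(service_id: str, access_token: str) -> str:
    """Hypothetical helper: cache key that does not contain the raw token."""
    digest = hashlib.sha256(access_token.encode("utf-8")).hexdigest()
    return f"introspection:{service_id}:{digest}"


def introspection_cache_ttl(
    token_expires_at: float,
    default_ttl: float = 60.0,
    now: Optional[float] = None,
) -> float:
    """Seconds an introspection result may be cached (never past token expiry).

    ASSUMPTION: token_expires_at is epoch seconds taken from the response.
    """
    current = time.time() if now is None else now
    remaining = token_expires_at - current
    return max(0.0, min(default_ttl, remaining))


# Usage (illustrative values): the token expires in 30 s, so TTL is 30 s.
key = introspection_cache_key("my-service", "example-access-token")
ttl = introspection_cache_ttl(token_expires_at=1_000_030.0, now=1_000_000.0)
print(key.startswith("introspection:my-service:"), ttl)  # True 30.0
```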

When implementing caching in your application, keep in mind the following:

Cache Query Responses: Store primarily successful responses from idempotent (safe and repeatable) endpoints for a defined period. Use an in-memory cache or an internal cache system (e.g., Redis) to serve repeated queries, significantly reducing the load on the Authlete API.
Configuration Review: Ensure caching infrastructure is robustly configured to handle traffic patterns, including setting adequate limits for maximum connections to prevent resource exhaustion and cascading failures during high-traffic events.
Time-to-Live (TTL) Management: Define appropriate TTLs based on the data’s volatility. Non-critical or static data can tolerate longer TTLs (e.g., 5-10 minutes).
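The points above can be sketched in Python as an in-memory cache with a fixed TTL that stores a value only when the fetch succeeds.

- `fetch_service_configuration` is a hypothetical stand-in for your HTTP call to `/api/{serviceId}/service/configuration`.
- The fetch function is assumed to raise an exception on non-2xx responses, so error responses are never cached.
- With several application instances, a shared store such as Redis would replace the dictionary.

```python
import threading
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Minimal thread-safe in-memory cache with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, T]] = {}  # key -> (expires_at, value)
        self._lock = threading.Lock()

    def get_or_fetch(self, key: str, fetch: Callable[[], T]) -> T:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]  # fresh hit: no call to Authlete
        # Miss or expired entry: call the backend outside the lock.
        # If fetch() raises, nothing is stored, so errors are never cached.
        value = fetch()
        with self._lock:
            self._entries[key] = (self._clock() + self._ttl, value)
        return value


calls = 0


def fetch_service_configuration() -> dict:
    # Hypothetical stand-in for GET /api/{serviceId}/service/configuration.
    global calls
    calls += 1
    return {"issuer": "https://example.com"}


cache = TTLCache(ttl_seconds=300)  # 5 minutes for slowly changing data
cache.get_or_fetch("service-configuration", fetch_service_configuration)
cache.get_or_fetch("service-configuration", fetch_service_configuration)
print(calls)  # 1
```

Concurrent misses for the same key can each call `fetch`. A production cache would typically coalesce them, so that one expired entry does not trigger a burst of identical requests.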

Employ Conditional Retry Logic

Applications using Authlete’s API must use retry logic that targets only transient failures while still allowing the system to process all requests. Applications that immediately retry every failed request compound errors and can overwhelm system resources, leading to connection exhaustion, timeouts, and even a self-inflicted DDoS.

When implementing retries of Authlete API calls in your application, keep in mind the following:

Retry Only on Transient Errors: Limit automatic retries to specific, transient HTTP status codes indicating temporary server issues:
- 429 Too Many Requests: Indicates the rate limit has been exceeded. A retry should only occur after the duration specified in the Ratelimit-Reset header.
- 503 Service Unavailable: Indicates the Authlete API is temporarily overloaded or down for maintenance.
- 502 Bad Gateway: Indicates that the Authlete API is down for maintenance.
- Other 5xx errors (e.g., 500): Retry only if the error is suspected to be non-permanent, and make that decision for the specific endpoint that returned the error.
Avoid Retrying Permanent Errors: Do not implement retries for permanent 4xx errors (other than 429), such as 400 Bad Request, 401 Unauthorized, or 403 Forbidden. These will never succeed unless the request is modified.
Avoid Layering Retries: Track retries across the entire context of a request, so that an operation already retried at one layer is not retried again at another. Otherwise, nested retries multiply the number of calls sent to the Authlete API.
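The status-code rules and the "avoid layering" rule above can be combined into a single retry decision. This Python sketch is illustrative, not part of any Authlete SDK, and all names and the default budget of 3 are assumptions.

- `is_retryable` encodes the status-code rules.
- A request-scoped `RetryBudget` is stored in a `contextvars.ContextVar`.
- Every layer handling the same incoming request draws from that one budget instead of multiplying retries.

```python
import contextvars
from typing import FrozenSet

# Transient status codes from the list above. Other 5xx codes are retried
# only when you opt in per endpoint (illustrative, not an Authlete API).
ALWAYS_RETRYABLE = frozenset({429, 502, 503})


def is_retryable(status: int, endpoint_transient_5xx: FrozenSet[int] = frozenset()) -> bool:
    if status in ALWAYS_RETRYABLE:
        return True
    if 500 <= status <= 599:
        return status in endpoint_transient_5xx
    return False  # 400, 401, 403 and other permanent errors: never retry


class RetryBudget:
    """Retries left for one incoming request, shared by every layer."""

    def __init__(self, max_retries: int) -> None:
        self.remaining = max_retries

    def try_consume(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


_current_budget = contextvars.ContextVar("retry_budget")


def start_request(max_retries: int = 3) -> RetryBudget:
    """Call once, where the request enters your application."""
    budget = RetryBudget(max_retries)
    _current_budget.set(budget)
    return budget


def may_retry(status: int, endpoint_transient_5xx: FrozenSet[int] = frozenset()) -> bool:
    """Retry only transient errors, and only while the shared budget lasts."""
    if not is_retryable(status, endpoint_transient_5xx):
        return False
    budget = _current_budget.get(None)  # no request context -> no retries
    return budget is not None and budget.try_consume()


start_request(max_retries=2)
print(may_retry(400))  # False: permanent error, budget untouched
print(may_retry(503), may_retry(429), may_retry(503))  # True True False
```

Because the context variable holds a reference to one `RetryBudget` object, asyncio tasks spawned within the same request (which receive a copy of the context) still decrement the same counter.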

The _Ratelimit-Reset_ header is active and present in responses from our Shared Cloud environments. Dedicated Cloud environments running older versions might not include the header until strict rate limits are rolled out.
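Because the header may be missing, a client can honor `Ratelimit-Reset` when it is present and otherwise fall back to its backoff delay.

This Python sketch assumes the header value is delta-seconds (for example `12`), as in the IETF RateLimit header fields draft. That format is an assumption; confirm it against Authlete’s Rate Limit Policy. The function names are hypothetical.

```python
import math
from typing import Mapping, Optional


def parse_ratelimit_reset(headers: Mapping[str, str]) -> Optional[float]:
    """Seconds to wait according to Ratelimit-Reset, or None if unusable.

    ASSUMPTION: the header value is delta-seconds, e.g. "12".
    """
    for name, value in headers.items():
        if name.lower() == "ratelimit-reset":  # header names are case-insensitive
            try:
                seconds = float(value.strip())
            except ValueError:
                return None
            if math.isfinite(seconds) and seconds >= 0:
                return seconds
            return None
    return None  # e.g. older Dedicated Cloud versions without the header


def delay_before_retry(status: int, headers: Mapping[str, str],
                       fallback_delay: float) -> float:
    """On 429, wait at least as long as the header asks; else use backoff."""
    if status == 429:
        reset = parse_ratelimit_reset(headers)
        if reset is not None:
            return max(reset, fallback_delay)
    return fallback_delay


print(delay_before_retry(429, {"RateLimit-Reset": "12"}, fallback_delay=0.5))  # 12.0
print(delay_before_retry(429, {}, fallback_delay=0.5))  # 0.5
```

Taking the maximum of the header value and the backoff delay keeps retries from firing earlier than the reset time, while preserving the growing delay on repeated 429 responses.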

Introduce Delays with Exponential Backoff

To prevent a sudden surge of retried requests (a “retry storm”) that would only worsen a rate-limit violation, introduce carefully controlled delays between retry attempts.

Use Exponential Backoff: Implement an exponential backoff algorithm in which the delay between retries increases exponentially with each consecutive failure. This mechanism gradually reduces the request rate, giving the Authlete API time to process other requests. We recommend waiting at least 500 milliseconds before the first retry, 1 second before the second, 2 seconds before the third, and so on. A small random jitter can be added to prevent synchronized retries.
Define Maximum Retries and Jitter: Set a maximum number of retry attempts (or a maximum time spent retrying), and incorporate a randomized “jitter” factor into the delay calculation. Jitter prevents all failed clients from retrying at the exact same moment, which mitigates the risk of a renewed traffic spike.
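The schedule above can be sketched in Python as follows.

- The delay doubles from 0.5 seconds up to an assumed cap of 8 seconds.
- Up to 10% random jitter is added on top, so the delay never drops below the recommended minimum.
- Retries stop after a configurable maximum.

`TransientError` is a hypothetical exception that your Authlete client would raise for retryable responses (such as 429, 502, or 503). All parameter defaults are illustrative.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error your client raises for 429/502/503 responses."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0,
                  jitter: float = 0.1, rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before retry `attempt` (1-based): 0.5, 1, 2, 4, ... capped.

    Up to `jitter` (a fraction of the delay) is added at random, so the
    delay never drops below the recommended minimum.
    """
    delay = min(cap, base * 2 ** (attempt - 1))
    uniform = rng.uniform if rng is not None else random.uniform
    return delay + uniform(0.0, jitter * delay)


def call_with_retries(operation: Callable[[], T], max_retries: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying only TransientError with exponential backoff."""
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError:
            attempt += 1
            if attempt > max_retries:
                raise  # give up; let the caller or a circuit breaker handle it
            sleep(backoff_delay(attempt))


# Usage: two transient failures, then success (sleeps are recorded, not real).
delays = []
outcomes = iter([TransientError(), TransientError(), "ok"])


def flaky() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(call_with_retries(flaky, sleep=delays.append))  # ok
print(len(delays))  # 2
```

Any exception other than `TransientError` propagates immediately, which keeps permanent errors out of the retry loop.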

Implement a Circuit Breaker Pattern

Even with exponential backoff and conditional retries, a persistent or high-volume series of transient errors can cause a major service disruption in your application. Continuously attempting failed operations in this scenario can deplete your application’s resources and increase the load on the Authlete API, leading to self-inflicted outages. When implementing the Circuit Breaker pattern in your application:

Define a Failure Threshold: Configure the circuit breaker to “open” (stop routing requests) when the rate or number of transient errors (e.g., 429 Too Many Requests, 503 Service Unavailable) exceeds a predefined threshold within a short time window.
Implement the Three States (Open, Half-Open, Closed):
- Closed: Normal operation. Monitor the error count.
- Open: Immediately fail requests without calling the backend. Use a short, fixed time-out (e.g., 60 seconds) before transitioning to Half-Open.
- Half-Open: Allow a limited number of test requests to pass through to determine whether the backend has recovered. If the test requests succeed, return to Closed; otherwise, return to Open.
Use Fast-Fail Fallbacks: When the breaker is Open, provide an immediate and non-retriable fallback response (e.g., serving stale data for endpoints that can use cache) to maintain a degraded but functional user experience without burdening the application or the failing service.
Isolate Resources: Apply circuit breakers at a granular level (e.g., per endpoint or per service dependency) to isolate failures. This allows you to prioritize flows that are critical to your application (e.g., introspection or refresh token exchange) while limiting access to others (e.g., client management).
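The three states and the fast-fail fallback described above can be sketched in Python as a small state machine, with one instance per endpoint.

- The breaker opens when the number of transient failures inside a sliding window reaches a threshold.
- It stays Open for a fixed time-out, then admits a limited number of Half-Open probes.
- A successful probe closes it; a failed probe reopens it.
- A non-transient error (for example a 400) counts as "the backend answered" rather than as a failure.

The class, parameter names, and defaults are hypothetical.

```python
import threading
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the backend."""


class CircuitBreaker:
    """Closed / Open / Half-Open breaker; use one instance per endpoint."""

    def __init__(self, failure_threshold: int = 5, window_seconds: float = 10.0,
                 open_seconds: float = 60.0, half_open_max_calls: int = 1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.open_seconds = open_seconds
        self.half_open_max_calls = half_open_max_calls
        self.state = "closed"
        self._clock = clock
        self._lock = threading.Lock()
        self._failures: List[float] = []  # times of recent transient failures
        self._opened_at = 0.0
        self._probes = 0

    def _admit(self) -> None:
        with self._lock:
            if self.state == "open":
                if self._clock() - self._opened_at < self.open_seconds:
                    raise CircuitOpenError("circuit is open")
                self.state, self._probes = "half_open", 0  # time-out elapsed
            if self.state == "half_open":
                if self._probes >= self.half_open_max_calls:
                    raise CircuitOpenError("half-open: probe limit reached")
                self._probes += 1

    def _record_transient_failure(self) -> None:
        with self._lock:
            now = self._clock()
            self._failures = [t for t in self._failures if now - t < self.window_seconds]
            self._failures.append(now)
            if self.state == "half_open" or len(self._failures) >= self.failure_threshold:
                self.state, self._opened_at = "open", now
                self._failures.clear()

    def _record_healthy(self) -> None:
        with self._lock:
            if self.state == "half_open":  # probe succeeded: resume normal operation
                self.state = "closed"
                self._failures.clear()

    def call(self, operation: Callable[[], T], is_transient: Callable[[Exception], bool],
             fallback: Optional[Callable[[], T]] = None) -> T:
        try:
            self._admit()
        except CircuitOpenError:
            if fallback is not None:
                return fallback()  # fast-fail, e.g. serve stale cached data
            raise
        try:
            result = operation()
        except Exception as exc:
            if is_transient(exc):
                self._record_transient_failure()
            else:
                self._record_healthy()  # backend answered; the request was at fault
            raise
        self._record_healthy()
        return result


# Usage with a fake clock and one breaker per endpoint (names are hypothetical).
now = [0.0]
breakers = {"introspection": CircuitBreaker(failure_threshold=3, clock=lambda: now[0])}
breaker = breakers["introspection"]


def unavailable() -> str:
    raise TimeoutError("stand-in for a 503 from Authlete")


for _ in range(3):
    try:
        breaker.call(unavailable, is_transient=lambda e: isinstance(e, TimeoutError))
    except TimeoutError:
        pass
print(breaker.state)  # open
print(breaker.call(unavailable, lambda e: True, fallback=lambda: "stale"))  # stale
now[0] = 61.0
print(breaker.call(lambda: "fresh", lambda e: True), breaker.state)  # fresh closed
```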

This approach provides a final layer of defense by quickly halting operations against an unhealthy service, preserving your application’s stability and protecting the overall system from cascading failures.