Rate Limiting for Sentinel LDK CL Service (Thales-Hosted)

Thales offers a solution called Sentinel LDK CL Service. This is a hosted cloud licensing service for vendors who use Sentinel LDK and who subscribe to Sentinel EMS. When you subscribe to Sentinel LDK CL Service, Thales hosts a high-availability license manager to serve cloud licenses to your customers. You create and manage cloud licenses and client identities using this hosted service.

Your license agreement with Thales includes limitations ("rate limiting") on the number of Licensing API calls that will be served. For details, see Knowledge Base article KB0018630.

Complying With Rate Limits

To ensure that API calls from your licensed users do not exceed the rate limits specified in the Knowledge Base article referenced above, you can configure Sentinel Licensing API or Sentinel Licensing REST API to limit how often API calls are sent to Sentinel LDK CL Service. (Rate limiting only applies to API calls that use client identities.)

Best practices for complying with the rate limits include:

>Avoid unnecessary, repetitive API calls, such as calling encrypt/decrypt every few seconds to keep a session alive.

>Set the interval for periodic background checks for a valid license (using Sentinel Licensing API or Sentinel LDK Envelope) to a minimum of 10 minutes.

>By default, Sentinel EMS is configured to allow license detaching for accounts. To reduce the load on the license server, accept this default.
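The 10-minute minimum for background checks can be enforced in application code. The following is a minimal sketch, not part of any Sentinel API: `LicenseCheckThrottle` and the `check_license` callable are hypothetical names, and `check_license` stands in for whatever Licensing API call your application uses to verify the license.

```python
import time

MIN_CHECK_INTERVAL_SECONDS = 10 * 60  # the 10-minute minimum recommended above


class LicenseCheckThrottle:
    """Runs the license check at most once per interval and reuses the last result."""

    def __init__(self, check_license, interval=MIN_CHECK_INTERVAL_SECONDS,
                 clock=time.monotonic):
        if interval < MIN_CHECK_INTERVAL_SECONDS:
            raise ValueError("interval must be at least 10 minutes")
        # Hypothetical callable: performs the real license check, returns True/False.
        self._check_license = check_license
        self._interval = interval
        self._clock = clock
        self._last_checked = None
        self._last_result = None

    def is_licensed(self):
        now = self._clock()
        due = (self._last_checked is None
               or now - self._last_checked >= self._interval)
        if due:
            # Only this branch reaches the license server.
            self._last_result = self._check_license()
            self._last_checked = now
        return self._last_result
```

The application calls `is_licensed()` as often as it likes; only one call per interval is forwarded to the license server, and the rest return the cached result.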

Your license agreement with Thales is enforced by an identity-based rate limiting policy using a token bucket algorithm (https://en.wikipedia.org/wiki/Token_bucket), as follows:

>Each active instance that runs protected applications using cloud licensing is assigned a bucket. The bucket is assigned a starting number of tokens. The number assigned is also the maximum number of tokens that the bucket can contain.

>New tokens are added periodically to the bucket.

>Each of the following types of Licensing API calls consumes one or more tokens from the bucket and is affected by exceeding the rate limit: hasp_login, hasp_logout, hasp_encrypt, hasp_decrypt, hasp_read, hasp_write, hasp_get_rtc, and hasp_update_session. These calls can fail with the error HASP_IDENTITY_RATE_EXCEEDED.

Each Licensing REST API call consumes one token from the bucket.

>When the bucket is empty, the rate limit is considered to be exceeded.
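The bucket behavior described above can be illustrated with a generic token bucket. This is a sketch of the general algorithm only, not the implementation used by the license server. The class name and the values for capacity, refill amount, and refill period are assumptions chosen for illustration; the actual limits are given in the Knowledge Base article.

```python
class TokenBucket:
    """Generic token bucket: starts full, refills periodically, never exceeds capacity."""

    def __init__(self, capacity, refill_tokens, refill_period):
        self.capacity = capacity            # starting number and maximum number of tokens
        self.tokens = capacity
        self.refill_tokens = refill_tokens  # tokens added per period (assumed value)
        self.refill_period = refill_period  # seconds between refills (assumed value)
        self._last_refill = 0.0

    def _refill(self, now):
        # Add tokens for every full period that has elapsed since the last refill.
        periods = int((now - self._last_refill) // self.refill_period)
        if periods > 0:
            self.tokens = min(self.capacity,
                              self.tokens + periods * self.refill_tokens)
            self._last_refill += periods * self.refill_period

    def try_consume(self, now, cost=1):
        """Return True if the call fits in the bucket, False if the rate is exceeded."""
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Illustrative values only: 3 tokens maximum, 1 token added every 10 seconds.
bucket = TokenBucket(capacity=3, refill_tokens=1, refill_period=10.0)
burst = [bucket.try_consume(now=0.0) for _ in range(4)]
```

In this example, the first three calls at time 0 succeed and the fourth is rejected because the bucket is empty. A further call succeeds once 10 seconds have passed and one token has been added.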

The rate limit forces you to implement protection of your application using a limited number of API calls.

NOTE To reduce token consumption, Thales highly recommends that you use the latest version of Sentinel Licensing API.

For more information, see Sentinel Licensing API C Reference.

Rate Limiting Mechanism

The following mechanisms exist for implementation of rate limiting:

>With Sentinel Licensing API 9.12 or later:

The bucket of tokens is stored in the identity session. This means that all the applications that were started for a given identity share the same bucket. The license server always fulfills the API calls, but if the rate limit is exceeded, the license server notifies the Licensing API how long to wait before making a new call. Until the stated time has elapsed, the Licensing API causes API calls to fail on the client side, without contacting the license server.

>With Sentinel Licensing API version 8.51 or earlier:

It is not possible to make the API call fail on the client side. Therefore, the license server makes the API call fail and returns the error HASP_IDENTITY_RATE_EXCEEDED to the client.

NOTE This method of failing the API call is inefficient. It saves only a fraction of the server work, as the server still has to process the API call. The older Licensing API also consumes additional API calls.

>With Sentinel Licensing REST API:

Once rate limiting is triggered, the license server makes the API call fail and returns the error HASP_IDENTITY_RATE_EXCEEDED at the LDK level, and returns the error 429 at the HTTP level. A Retry-After header is included in this response, indicating how long to wait before making a new API call.
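A client of the Licensing REST API can honor the Retry-After header before retrying. The following is a minimal sketch under stated assumptions: `send_request` is a hypothetical callable that performs one REST call and returns the status code, the response headers, and the body; the header is assumed to carry a delay in whole seconds, with a 1-second fallback if it cannot be parsed.

```python
import time

HTTP_TOO_MANY_REQUESTS = 429


def parse_retry_after(headers, default_seconds=1):
    """Read Retry-After as whole seconds; fall back to a default if absent or unparsable."""
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0, int(lowered["retry-after"]))
    except (KeyError, ValueError):
        return default_seconds


def call_with_retry_after(send_request, max_attempts=3, sleep=time.sleep):
    """Call send_request(); on HTTP 429, wait as instructed and try again.

    send_request is a hypothetical callable returning (status_code, headers, body).
    Returns the (status_code, body) of the last attempt.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send_request()
        if status != HTTP_TOO_MANY_REQUESTS:
            return status, body
        if attempt < max_attempts:
            # Wait for the period the license server asked for before retrying.
            sleep(parse_retry_after(headers))
    return status, body
```

Capping the number of attempts keeps a client that is persistently over the limit from retrying indefinitely; the caller receives the final 429 and can report the condition to the user.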