Relevant APIs:
- All ElevenLabs generation endpoints:
- https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
- https://api.elevenlabs.io/v1/speech-to-speech/{voice_id}
- https://api.elevenlabs.io/v1/speech-to-speech/{voice_id}/stream
- https://api.elevenlabs.io/v1/voice-generation/generate-voice
Prerequisites:
- The Lunar proxy is installed and configured in your environment (see the tutorial).
- The Lunar interceptor is installed in the service that consumes the ElevenLabs API.
- [REQUEST] Concurrent Limit - This processor limits the number of concurrent requests it passes through and queues any pending requests. It has several parameters you can customize:
- Group: the group that requests are counted under. Set this if you want to limit the total number of concurrent requests across different flows.
- Timeout: the maximum time a request can wait in the queue before timing out entirely. Remember that upstream services may have their own timeouts, so make sure your timeout here isn’t too long compared to those.
- Limit: the maximum number of concurrent requests that can be made to the downstream API (ElevenLabs). Customize this based on your plan.
- Type: the type of queue you want to use. FIFO (first in, first out) is the most logical choice in most cases.
- [RESPONSE] Send Response - If a request waits too long in the queue, we send back an error message to the upstream service. You can also customize the flow to pass the request to a different API instead, or to handle it in any other way.
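To make the processor's parameters concrete, here is a minimal Python sketch of the behavior described above: a cap on in-flight requests, a queue for the rest, and a timeout for requests that wait too long. It is not Lunar's implementation. `ConcurrentLimiter`, `QueueTimeout`, and `demo` are hypothetical names, the Group parameter is not modeled, and the sketch assumes `asyncio.Semaphore` serves waiters in arrival order to approximate a FIFO queue.

```python
import asyncio


class QueueTimeout(Exception):
    """Raised when a request waits in the queue longer than the timeout."""


class ConcurrentLimiter:
    """Hypothetical stand-in for the Concurrent Limit processor."""

    def __init__(self, limit: int, timeout: float):
        # Limit: maximum number of requests allowed in flight at once.
        self._slots = asyncio.Semaphore(limit)
        # Timeout: maximum time (seconds) a request may wait for a slot.
        self._timeout = timeout

    async def run(self, make_request):
        try:
            # Wait in the queue until a slot frees up or the timeout expires.
            await asyncio.wait_for(self._slots.acquire(), self._timeout)
        except asyncio.TimeoutError:
            # This is where a Send Response step would return an error.
            raise QueueTimeout("request timed out while queued") from None
        try:
            return await make_request()
        finally:
            self._slots.release()


async def demo():
    limiter = ConcurrentLimiter(limit=2, timeout=0.3)
    active = 0
    peak = 0

    async def fake_call():
        # Simulated ElevenLabs call that takes 0.2 seconds.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.2)
        active -= 1
        return "ok"

    # Five requests arrive at the same time.
    results = await asyncio.gather(
        *(limiter.run(fake_call) for _ in range(5)),
        return_exceptions=True,
    )
    return peak, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In `demo`, five simultaneous calls hit a limiter with a limit of 2. Two run immediately, two more run after waiting about 0.2 seconds, and the fifth exceeds the 0.3 second timeout and raises `QueueTimeout`.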
When to Use This Flow:
ElevenLabs' pricing structure caps the number of concurrent requests you can make with a single API key, so this flow makes sense whenever your app can send multiple simultaneous requests to the ElevenLabs API. That covers two main scenarios: applications where end users trigger requests to the API, so several users can make a request at the same time, and systems where different services consume the API under the same subscription.
The flow is especially useful if you anticipate peaks in your application's usage of the ElevenLabs API. For instance, if your average number of concurrent requests is low (1-2) but occasional surges push it higher (5 or more), this flow queues the requests that exceed the permitted concurrent limit and “smooths out” the curve, a technique known as spike arresting. The result is a more stable and predictable usage pattern and a more resilient interaction with the ElevenLabs API.
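As a rough illustration of how queuing reshapes a burst, the sketch below estimates how long each request would wait for a free slot. The function name `queue_waits` and the assumption of a fixed processing time per request are ours, not part of the flow; real ElevenLabs response times vary with input length and model.

```python
import heapq


def queue_waits(arrivals, duration, limit):
    """Estimate the queue wait (seconds) for each request in FIFO order.

    arrivals: arrival times in seconds, sorted ascending.
    duration: assumed fixed processing time per request, in seconds.
    limit: maximum number of concurrent requests.
    """
    finish_times = []  # min-heap of finish times for in-flight requests
    waits = []
    for t in arrivals:
        if len(finish_times) < limit:
            # A slot is free, so the request starts immediately.
            start = t
        else:
            # All slots are taken: reuse the slot that frees up first.
            start = max(t, heapq.heappop(finish_times))
        heapq.heappush(finish_times, start + duration)
        waits.append(start - t)
    return waits


# Five requests arrive together, each takes 2 seconds, and the limit is 2.
print(queue_waits([0.0, 0.0, 0.0, 0.0, 0.0], duration=2.0, limit=2))
# -> [0.0, 0.0, 2.0, 2.0, 4.0]
```

Under these assumptions the longest wait is 4 seconds, so a queue timeout shorter than that would reject the last request of the burst instead of queuing it.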
About the ElevenLabs API:
ElevenLabs stands at the forefront of voice synthesis technology, offering cutting-edge solutions through its API to businesses and developers seeking to harness the power of artificial intelligence for voice generation. The ElevenLabs API provides a comprehensive suite of tools designed for creating highly realistic and customizable voice outputs. With an emphasis on naturalness and flexibility, the API allows for the transformation of text into speech in a wide array of languages and accents, catering to a diverse global audience. Its capabilities extend beyond mere voice generation to include voice cloning, enabling users to create bespoke voice models that closely mimic specific voices, thereby opening new avenues for personalized audio content creation.
The platform is engineered to support a variety of applications, ranging from audiobook production and podcast creation to the enhancement of virtual assistant functionalities and the development of accessible technologies for those with visual impairments or reading difficulties. By leveraging advanced machine learning algorithms, ElevenLabs ensures that its voice synthesis is not only of high quality but also adaptable to different contexts and use cases. This makes the ElevenLabs API a powerful tool for creators and businesses alike, looking to innovate and improve user experiences with synthetic voice technology.
ElevenLabs Concurrency Limits:
The ElevenLabs API limits how many requests you can have in flight at the same time. The specific limit depends on your subscription tier - see the following table for limits:
You can read more about the ElevenLabs API limits here:
https://help.elevenlabs.io/hc/en-us/articles/14312733311761-How-many-requests-can-I-make-and-can-I-increase-it
About Request Throttling and Spike Arresting:
Request throttling and spike arresting are crucial mechanisms in managing API traffic, ensuring the stability and reliability of services by regulating the rate at which requests are processed. Throttling is a proactive measure that limits the number of requests an API will accept over a given timeframe, thus preventing any single user or service from consuming disproportionate resources. This is especially important in a shared environment where system capacity must be equitably allocated among multiple users. Throttling helps maintain optimal performance and avoids overloading the system, which can lead to degraded service for all users.
Spike arresting, on the other hand, is a reactive measure designed to handle unexpected surges in traffic, effectively "smoothing out" spikes in demand. During periods of unusually high activity, spike arresting temporarily queues requests above a certain threshold, processing them at a controlled rate to prevent system overload. This ensures that the API remains responsive and available, even under strain. Both request throttling and spike arresting are essential for maintaining the quality of service, preventing system failures, and ensuring a fair and efficient distribution of resources among users and applications. Together, they provide a robust framework for managing API traffic, enhancing user experience, and safeguarding the integrity of digital services.
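To contrast the two mechanisms in code: throttling decides whether a request may proceed based on a rate over time, while spike arresting holds excess requests in a queue. The sketch below shows one common throttling technique, a token bucket. It is an illustration only, not a description of how Lunar or ElevenLabs enforce limits, and the class name `TokenBucket` and its parameters are hypothetical.

```python
class TokenBucket:
    """Allows up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum number of stored tokens
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # timestamp of the previous check

    def allow(self, now: float) -> bool:
        # Refill according to the time elapsed since the last check.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        # No token available: the request is rejected rather than queued.
        return False


bucket = TokenBucket(rate=1.0, capacity=2.0)
print([bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0, 1.0)])
# -> [True, True, False, True, False]
```

A throttle like this rejects the third request outright, whereas a spike-arresting queue would hold it until capacity frees up.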
About Lunar.dev:
Lunar.dev is your go-to solution for egress API controls and API consumption management at scale.
With Lunar.dev, engineering teams of any size gain instant, unified controls to manage, orchestrate, and scale API egress traffic across environments, all without the need for code changes.
Lunar.dev is agnostic to any API provider and enables full egress traffic observability, real-time controls for cost spikes or issues in production, all through an egress proxy, an SDK installation, and a user-friendly UI management layer.
Lunar.dev offers solutions for quota management across environments, prioritizing API calls, centralizing API credentials management, and mitigating rate limit issues.