How We Benchmark at Lunar

At Lunar, we pride ourselves on really understanding latency: our offering solves hard API consumption problems, so we strive to keep a minuscule footprint on our users’ existing latency. In this post we break down how we do it.

Eliav Lavi, Software Engineer

May 20, 2023



Time is one of our most valuable resources. In the era of the API economy, providing and consuming APIs with satisfactory response times is essential and expected. As API providers are often measured across response time percentiles, API consumers are becoming increasingly aware of this and might define SLAs around these metrics.

At Lunar, we take latency seriously; our offering solves hard API consumption problems, and in order to do that it sits right in the middle, between the API consumer and the API provider. Obviously, we strive to have a minuscule footprint on our users’ existing latency. As developers, we knew from the get-go that this is what we ourselves would have wanted as users!

To achieve that minimal latency footprint, we did a lot of research into picking the right stack and architecture. However, any assumption must always be tested in order to be proven valid, and this is where latency benchmarking comes in.

In this blog post, we will discuss the process of latency-benchmarking Lunar’s offering, taking into account the architecture of our system; we will define clear goals for our little research; we will dive deeper into how exactly our latency benchmark is set up and run; and, of course, we will present and analyze the results we’ve arrived at, reflecting on the initially defined goals.

Let’s kick it off with a brief review of the system under investigation - Lunar Proxy.

Lunar Solution’s Architecture

Lunar’s main offering is Lunar Proxy, a compact component that runs right next to our users’ applications, with the goal of handling all outgoing traffic targeted at 3rd-party providers. While out-of-the-box it simply passes requests to the outside world and seamlessly returns their responses, as forward proxies do, it really shines when configured with remedy and diagnosis plugins.

Remedy plugins are aimed at solving actual API consumption problems: avoiding providers’ rate limits, managing a cache layer, applying retry mechanisms, and more.

Diagnosis plugins allow users to dig into their HTTP transactions (i.e. request & response) and extract data and insights from them so they can be analyzed later on. For example, we support an Obfuscated HAR Extractor as one such plugin, which will allow Lunar & users to tailor remedy plugins in a data-driven manner.

While remedy plugins can affect & change requests and responses (e.g. when a response is returned from cache, there is no need to issue an actual request to the provider, as the response is already available at hand), diagnosis plugins cannot.


Goals

Now that we know the system we’d like to benchmark, let’s define what we’d like to know at the end of the process.

First, we’d like to find out what latency footprint Lunar Proxy has on response time percentiles compared to directly calling the same API provider, without using Lunar. We’d measure everything from the client application’s point of view. We’d like to examine three scenarios here:

  • Simply passing traffic via Lunar’s offering, with no remediation or diagnosis in place
  • Using a simple remedy plugin on the passed traffic - one that doesn’t short circuit requests
  • Using a simple diagnosis plugin on the passed traffic - namely, our Obfuscated HAR Extractor

For this set of experiments, we decided to use a provider with a constant response time of 150ms, which is considered quite fast for a web-based API.

Furthermore, we’d like to examine if there’s any correlation between the provider’s response time and Lunar Proxy’s latency footprint, or whether (hopefully!) the footprint is constant.

Figuring these out is vital for gaining assurance in our offering. We’d like Lunar Proxy to have a minimal latency footprint, regardless of the API provider’s response time.

Setting Up

Benchmarking is always an emulation of real-world scenarios. As such, great care should be given to emulating scenarios with a setup that is as realistic as possible.

To do so, we figured it would be best to run our experiments in the same manner in which our users would use our product. This section is dedicated to describing how we tackled this - we will detail where our experiments run, which components take part in them, and how all this is triggered in an organized, trackable manner.

Of course, we ran our experiments in the cloud and not on our local machines; the latter are busy doing many other things, and hence less reliable for this purpose.


We allocated two different AWS EC2 instances of type c5a.large for this purpose - one dedicated to the provider only, and another dedicated to the client application and Lunar Proxy. This is key: there will always be real network time when calling API providers, hence the separate EC2 instances are crucial here. In contrast, Lunar’s product is designed to be located as close as possible to the client application which integrates with it, so it makes sense to place these two on the same EC2 instance.


For the client side, which represents Lunar users’ point of view, we used Apache AB: a simple yet powerful command line tool capable of issuing a finite number of parallel HTTP calls to a given URL and gathering the results in a CSV file. Out-of-the-box, it reports the response time per percentile, which is exactly what we need!

We routed Apache AB to call Lunar Proxy, which would, in its turn, forward those calls to the Provider. This emulates the way client applications would work with Lunar Proxy. Since updating the configuration of our product is a breeze, it was easy for us to toggle its state between employing a remedy, a diagnosis, or neither.

To emulate the baseline scenario, which doesn’t use Lunar at all, and against which we would compare our results, we simply made the calls from Apache AB directly to the Provider.

We ran Apache AB directly from the command line and Lunar Proxy as a docker container.
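Concretely, the two kinds of invocation looked roughly like the sketch below. Host names and ports here are illustrative, not our actual setup; ab’s -X flag routes all requests through a given proxy, and -e writes the per-percentile results to a CSV file:

```shell
# Baseline: call the provider directly; write per-percentile results to CSV.
ab -n 100000 -c 100 -e direct.csv http://provider.internal:8080/delay/0.15

# Via Lunar Proxy: same load, but routed through the proxy on this machine.
ab -n 100000 -c 100 -e with-lunar.csv -X localhost:8000 \
  http://provider.internal:8080/delay/0.15
```

The -e output is exactly the response-time-per-percentile CSV mentioned above, which makes the later analysis step straightforward.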


For the provider side, we used go-httpbin, a Go port of the well-known httpbin, packed as a docker image. The nice thing about it is that by calling the /delay/{seconds} endpoint, we could emulate different provider runtimes with no effort: /delay/0.15 would take about 150ms to return; any value up to 10 seconds (i.e. 10000ms) is supported. See the official documentation here.
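Getting such a provider up is a one-liner; this sketch uses the image name as published on Docker Hub (mccutchen/go-httpbin), with the port mapping being our own choice:

```shell
# Run go-httpbin as the stand-in provider, listening on port 8080.
docker run -d --name provider -p 8080:8080 mccutchen/go-httpbin

# Sanity check: total time should be roughly 0.15 seconds.
curl -s -o /dev/null -w '%{time_total}\n' http://localhost:8080/delay/0.15
```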


To run all this and gather the results so we can later analyze them, we crafted a modest bash script that would set up the client side in the appropriate fashion - whether it should call the provider directly, or via Lunar Proxy, either with remediation/diagnosis in place, or not. This script works as follows:

  • Clean up the client environment by restarting Lunar Proxy’s container with the appropriate configuration in place
  • Run a warm-up round of the experiment with a small number of requests - we set this number to 1K
  • Run the full experiment - we set it to issue 100K requests with a concurrency factor of 100
  • Copy the results, a CSV file containing the response time percentiles, into a dedicated folder on a local machine
  • Append a metadata file so it’s clear what exactly was under test in this specific experiment. This would help us during the investigation phase, which we’ll dive into in the next section.
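A stripped-down sketch of such a script follows; the image name, config paths, and the run_experiment entry point are all placeholders of our own invention, since the actual script is internal:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the benchmark runner; image and config paths are placeholders.

# usage: run_experiment <mode> <target-url>
#   mode: direct | with-lunar-installed | with-lunar-remedy | with-lunar-diagnosis
run_experiment() {
  local mode="$1" target="$2"
  local results_dir="results/$(date +%s)-${mode}"
  mkdir -p "$results_dir"

  local proxy_args=()
  if [ "$mode" != "direct" ]; then
    # 1. Clean up: restart Lunar Proxy with this mode's configuration in place.
    docker rm -f lunar-proxy >/dev/null 2>&1 || true
    docker run -d --name lunar-proxy -p 8000:8000 \
      -v "$PWD/configs/${mode}.yaml:/etc/lunar/policies.yaml" lunar-proxy-image
    proxy_args=(-X localhost:8000)
  fi

  # 2. Warm-up round: 1K requests whose results we discard.
  ab -n 1000 -c 100 "${proxy_args[@]}" "$target" >/dev/null

  # 3. Full experiment: 100K requests, concurrency factor 100, percentiles to CSV.
  ab -n 100000 -c 100 -e "$results_dir/percentiles.csv" "${proxy_args[@]}" "$target"

  # 4.-5. Record exactly what was under test alongside the results.
  printf 'mode=%s\ntarget=%s\ndate=%s\n' "$mode" "$target" "$(date -u)" \
    > "$results_dir/metadata.txt"
}
```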

Experiments & Results

As mentioned in the Goals section above, we wanted to figure out Lunar’s latency footprint from client applications’ point of view. We wanted to compare different modes of using Lunar, and also to investigate whether or not there is a correlation between an API provider’s response time and Lunar’s latency footprint.

To drill down into the data we’ve accumulated in these experiments, we created a simple Python-based Jupyter Notebook in which we were able to research whatever we wanted. If you haven’t worked with it in the past, Jupyter Notebook is a pleasure to use: it lets you write simple Python, using any library you might want, and get to visualizations and graphs in no time! We used pandas, numpy and seaborn in our explorations - all very popular tools with good documentation, intuitive and fun to use. So, without further ado, let’s dive into the results.

Different Modes of Using Lunar

For our first research goal, we compared four experiments:

  • Making direct calls to the provider (i.e. direct)
  • Making calls to the provider via Lunar Proxy, without remediation or diagnosis (i.e. with-lunar-installed)
  • Making calls to the provider via Lunar Proxy, with remediation (i.e. with-lunar-remedy)
  • Making calls to the provider via Lunar Proxy, with diagnosis (i.e. with-lunar-diagnosis)

For reasons mentioned above, these four experiments were all made against a constant provider response time of 150ms, by calling the /delay/0.15 path.

As you can see in the visualization below, which charts percentiles on the X axis and runtime in milliseconds on the Y axis, the differences between each experiment compared to the baseline direct experiment are rather small, across percentiles:

To make this notion clearer, let’s have a look at the same graph, but with the Y axis representing the delta from the baseline, per experiment per percentile. Of course, for the direct experiment, that line would be 0 all along, as its delta from itself is 0 at every percentile:

In simple words, at the 99th percentile, Lunar Proxy adds between 22ms (with-lunar-installed) and 37ms (with-lunar-diagnosis) to the overall response time of HTTP calls, depending on the usage. At the 95th percentile, this range runs between 4ms and 13ms, respectively.

As you might have noticed, with-lunar-diagnosis seems to have a slightly higher latency footprint in the upper percentiles, compared to with-lunar-remedy. In fact, this was surprising for us to find out, and while the increase still amounts to only a few milliseconds - even at the 99th percentile - we are currently investigating this further, hoping to lower this number a little.

Correlation to Provider’s Response Time

Moving on to our second goal, in order to find out if there is a correlation between the response time of the provider and Lunar’s latency footprint, we ran 4 pairs of experiments. Each pair had its own provider response time - 200ms, 300ms, 500ms or 1000ms - run both in the baseline direct mode and in with-lunar-remedy.

Once we’d gathered all 8 experiment results, we calculated the delta from the baseline per pair. As before, the numbers we got represent Lunar’s latency footprint in absolute milliseconds per percentile. When plotted on a single chart, it looks like this:
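Since ab’s -e output is a small CSV of "Percentage served, Time in ms" with one row per percentile, computing such a delta for a pair needs nothing fancier than paste and awk; the file names here are illustrative:

```shell
# Pair up the two CSVs line by line (both share the same 0..100 percentile rows),
# then emit the with-lunar minus direct delta per percentile.
paste -d, <(tail -n +2 direct.csv) <(tail -n +2 with-lunar-remedy.csv) \
  | awk -F, '{ printf "p%s,%d\n", $1, $4 - $2 }'
```

Each output line is then one point on the delta chart - e.g. a line like p99,22 would mean Lunar added 22ms at the 99th percentile for that pair.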

If there were a correlation between the provider’s response time and Lunar’s latency footprint, we would have seen the lines spread apart significantly as provider time goes up; in fact, the lines are very close to each other, with no meaningful order among them. The delta stays pretty low all along the way, across percentiles.

What’s Next?

Benchmarking is an endless journey: as systems are continuously changing and evolving, the need to run latency benchmarks periodically and analyze them is always a real one. At Lunar, we identified this as a crucial quality factor of our offering, so we facilitated the benchmark process we’ve just described at an early stage. If ensuring satisfactory latency is a top concern for you as well, it might be beneficial to have your own process around it; ideally, it should be easy to run and manage!

Also, there’s more than just latency when it comes to benchmarking. Capacity benchmarks - how many requests a component can handle per second - are also a relevant quality factor that should be taken into account. We might detail this in a separate write-up in the future.

Whenever something comes up during a benchmark session that we conclude is sub-optimal, we kick off internal tasks to research and modify our offering accordingly; we then re-run a new experiment which takes the changes we’ve made into account. This process might take several iterations and some time until it converges on satisfying results. We didn’t just stumble upon the above results - which we’re proud of - straight away: it took several rounds of fixes of various magnitudes to achieve these numbers!

I hope this blog post was insightful and useful for you.

Feel free to comment below or hit me up with anything you’ve got on your mind!

Happy benchmarking! 🚀
