How Does Golang’s Garbage Collection Affect Latency? An Informal Analysis

Wenbo Zong
4 min read · Jun 21, 2020


In this post, I will share an informal analysis of how Golang’s garbage collection (GC) affects latency. The analysis is informal in that it is not a rigorous mathematical formulation; it only serves as a ballpark estimate of the latency’s lower bound.

Before we start, it is important to emphasise that the analysis in this post applies only to high-throughput applications. If an application has low throughput, GC rarely plays a significant role in its latency.

Garbage Collection Metrics

If you are not familiar with how Golang’s garbage collection works or how to interpret GC traces, here’s an excellent article on the topic.
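For reference, GC traces are produced by running a program with the GODEBUG=gctrace=1 environment variable set; the runtime then prints one summary line per GC cycle to stderr. A trace line looks roughly like the following (the numbers here are illustrative):

```
gc 14 @2.104s 1%: 0.031+1.2+0.040 ms clock, 0.12+0.53/1.1/0.32+0.16 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
```

The `1%` field is the cumulative fraction of CPU time spent in GC since the program started, and the clock and cpu fields break one cycle down by phase.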

I’ve written a simple Jupyter notebook to process the GC traces (https://github.com/zongwb/golang-gc-analysis). For this analysis, we will be looking at three metrics (a small code sketch for collecting similar numbers in-process follows the list):

  • Average GC cycle: the average duration of a GC cycle, measured from the start of one GC to the start of the next GC. This is the wall clock time.
  • Average GC CPU time (normalized): the average CPU time spent on GC in one cycle, normalized/converted to wall clock time.
  • % of GC CPU time: percentage of CPU time spent on GC.
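As a side note, the first metric (and the stop-the-world portion of the second) can also be approximated in-process. Here is a minimal sketch, my own and not part of the linked notebook, using runtime/debug.ReadGCStats. Note that GCStats exposes stop-the-world pause times rather than total GC CPU time, so it cannot reproduce the gctrace-based CPU metrics exactly:

```go
// A minimal sketch that approximates the average GC cycle duration
// in-process. GCStats reports STW pause time, not total GC CPU time.
package main

import (
	"fmt"
	"runtime/debug"
	"time"
)

func main() {
	var before, after debug.GCStats
	debug.ReadGCStats(&before)
	start := time.Now()

	time.Sleep(30 * time.Second) // stand-in for the real workload

	debug.ReadGCStats(&after)
	elapsed := time.Since(start)

	cycles := after.NumGC - before.NumGC
	if cycles == 0 {
		fmt.Println("no GC cycles observed")
		return
	}
	pause := after.PauseTotal - before.PauseTotal

	fmt.Printf("average GC cycle: %v\n", elapsed/time.Duration(cycles))
	fmt.Printf("average STW pause per cycle: %v\n", pause/time.Duration(cycles))
}
```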

To illustrate what these metrics mean, let’s consider a contrived example. Assume the throughput is 10,000 QPS (note: the throughput has to be high for our analysis to make sense) and, more importantly, assume the requests arrive at evenly spaced intervals, i.e. at a constant arrival rate. This means 10 requests arrive every millisecond, as visualised in the following diagram.

Visualising request arrivals at evenly spaced time intervals

Let’s further assume the average GC cycle is 1 second and the average GC CPU time is 10 milliseconds. It follows that 1% of CPU time (10 ms / 1000 ms = 1%) is spent on GC. While GC is taking place, the requests that arrive during this window will not be processed until the GC completes, so GC introduces extra latency for these requests. With the uniform-arrival assumption, the requests that arrive during the active GC window will suffer varying amounts of extra latency, as illustrated in the following diagram:

Latency of requests arriving at different time instants (relative to GC start)

Therefore, we can reason that half of the requests arriving in this 10-millisecond GC window (those arriving in its first 5 ms, or 0.5% of all requests) will suffer at least an extra 5 ms of latency. In other words, the 99.5-percentile latency will be at least 5 ms.
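To sanity-check this arithmetic, here is a toy simulation of the scenario above. It hard-codes the stated assumptions (10,000 evenly spaced requests per second, a 1-second GC cycle, and a 10 ms window per cycle during which nothing is processed) and reports the extra latency at the 99.5th percentile:

```go
// A toy simulation of the argument above, under the stated assumptions:
// 10,000 evenly spaced requests per second, a 1-second GC cycle, and a
// 10 ms window per cycle during which no request is processed.
package main

import (
	"fmt"
	"sort"
)

func main() {
	const (
		cycleMs  = 1000.0 // wall-clock duration of one GC cycle
		gcMs     = 10.0   // GC window within the cycle
		requests = 10000  // requests arriving in one cycle
	)

	// Extra latency per request: a request arriving at time t inside the
	// GC window waits (gcMs - t); all other requests are unaffected.
	extra := make([]float64, 0, requests)
	for i := 0; i < requests; i++ {
		t := float64(i) * cycleMs / requests
		if t < gcMs {
			extra = append(extra, gcMs-t)
		} else {
			extra = append(extra, 0)
		}
	}
	sort.Float64s(extra)

	p995 := extra[len(extra)*995/1000]
	fmt.Printf("extra latency at p99.5: %.1f ms\n", p995) // prints ~5 ms
}
```

Running it prints an extra latency of roughly 5 ms at the 99.5th percentile, matching the back-of-the-envelope reasoning.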

Analysis of an IO-Bound Application

I ran profiling on a simple application designed to benchmark Redis GET throughput, with latency measured by the Prometheus client library (a sketch of the benchmark loop is shown after the observations below). The results with one and two CPUs are summarised as follows:

We can draw two observations from this experiment:

  • There is no significant difference in performance between the one- and two-CPU cases. This roughly shows that most of the time is spent on network IO rather than on computation.
  • Averaging the two runs, we’d expect about 0.27% of requests (half of the mean GC CPU percentage, (0.53% + 0.55%) / 2) to suffer at least ~4 ms of extra latency (half of the mean GC CPU time per cycle, (8.36 ms + 7.79 ms) / 2). The actual latency is of course higher, with the 99.5-percentile latency at 25 ms.
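For reference, the benchmark loop was conceptually similar to the following sketch. This is a hypothetical reconstruction, not the actual benchmark code; it assumes the go-redis v8 client and the Prometheus Go client:

```go
// A hypothetical sketch of a Redis GET benchmark that records per-call
// latency in a Prometheus histogram (exposed on :2112/metrics).
package main

import (
	"context"
	"log"
	"net/http"
	"time"

	"github.com/go-redis/redis/v8"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var latency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "redis_get_latency_seconds",
	Help:    "Latency of Redis GET calls.",
	Buckets: prometheus.ExponentialBuckets(0.0005, 2, 12), // 0.5 ms .. ~1 s
})

func main() {
	prometheus.MustRegister(latency)
	go http.ListenAndServe(":2112", promhttp.Handler())

	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	ctx := context.Background()

	for {
		start := time.Now()
		if err := rdb.Get(ctx, "benchmark-key").Err(); err != nil && err != redis.Nil {
			log.Fatal(err)
		}
		latency.Observe(time.Since(start).Seconds())
	}
}
```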

CPU-Bound Applications

The discussion would be incomplete if we didn’t consider CPU-bound applications. Generally speaking, the latency of CPU-bound applications is less affected by GC, because a unit of computation typically completes much faster than a network RPC, and the GC may not be able to interrupt an in-flight computation (I’m not too sure about this, though).

I ran profiling on a sample application that computes signatures using the ecdsa package. To avoid contention between goroutines, I set the number of computing goroutines equal to the number of CPU cores. The results of running with 1, 2, and 4 CPUs are shown below (note that latency is expressed in microseconds):

Had we reasoned about the latency using the same argument as before, we’d expect the 99th-percentile latency to be around 1.32 ms for the 1-CPU case. However, the actual figure is only 127 microseconds, an order of magnitude lower. My explanation is that the computations are not interrupted by GC (at least most of the time), hence the GC does not affect the latency in this application.
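For reference, the core of such a signing workload might look like the sketch below. This is a hypothetical reconstruction using the standard crypto/ecdsa package, with one signing goroutine per CPU as described above:

```go
// A hypothetical sketch of the CPU-bound workload: repeated ECDSA signing,
// with one goroutine per available CPU to avoid goroutine contention.
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

func main() {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	hash := sha256.Sum256([]byte("some message"))

	var ops int64
	for i := 0; i < runtime.GOMAXPROCS(0); i++ {
		go func() {
			for {
				// Each signature is a short, pure computation.
				if _, _, err := ecdsa.Sign(rand.Reader, priv, hash[:]); err != nil {
					panic(err)
				}
				atomic.AddInt64(&ops, 1)
			}
		}()
	}

	time.Sleep(10 * time.Second)
	fmt.Printf("signatures completed in 10s: %d\n", atomic.LoadInt64(&ops))
}
```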

Wrap up

Latency is probably related to garbage collection in complicated ways. While it’s difficult to establish a direct relationship between the two, we can estimate a lower bound on an application’s latency under certain assumptions: (1) high throughput, and (2) requests arriving at a constant rate.

Keep in mind that this analysis is more applicable to IO-bound applications than to CPU-bound ones. Even for IO-bound applications, the estimate should only be taken as a lower bound, which is usually well below the actual latency.
