Task Queue and Worker Pool: How to Configure Them?

Wenbo Zong
4 min read · Oct 2, 2019

This essay discusses some practical tips for configuring the task queue and worker pool.

Background

A task queue is a common tool for traffic shaping. Essentially, a task queue acts as a buffer between incoming requests (tasks) and their processors, smoothing out intermittent heavy (bursty) loads. Such traffic shaping has a few important benefits:

  • Maximise system throughput by reducing unnecessary overhead (e.g. context switching).
  • Protect the backend resources: burst loads can cause (1) processes/services to fail or even crash (e.g. OOM), or (2) requests to time out due to slow responses. This aspect is often overlooked!
  • Make system behaviour more predictable and easier to reason about, which in turn allows optimal configuration (instead of guesswork).

Typically, a task queue is used together with a worker pool, in a producer-consumer pattern, as illustrated below.

Typical structure of a task queue with a worker pool
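A minimal sketch of this producer-consumer structure in Python, using the standard library's bounded `queue.Queue` as the task queue and a pool of threads as workers (the doubling "processing" step and all names are illustrative):

```python
import queue
import threading

task_queue = queue.Queue(maxsize=100)  # bounded queue does the traffic shaping
results = []
results_lock = threading.Lock()

def worker():
    while True:
        task = task_queue.get()
        if task is None:               # sentinel value: shut this worker down
            task_queue.task_done()
            break
        with results_lock:
            results.append(task * 2)   # stand-in for real processing
        task_queue.task_done()

NUM_WORKERS = 4
workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for i in range(10):                    # producer enqueues tasks
    task_queue.put(i)

task_queue.join()                      # block until every task is processed
for _ in workers:
    task_queue.put(None)               # one sentinel per worker
for w in workers:
    w.join()
```

Note that the queue is bounded (`maxsize=100`): an unbounded queue would defeat the purpose of traffic shaping.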

With a task queue, it’s easy to apply the “fail fast, fail early” principle.

  • Enqueue logic: If the queue is full, drop the request and return an error to the client immediately. Fail early!
  • Dequeue logic: Check whether there is enough time left to process the request; if not, discard it and return an error to the client. Fail fast!
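These two checks can be sketched as follows (the tiny queue size and wait threshold are illustrative, chosen only to make the "fail early" path easy to trigger):

```python
import time
import queue

task_queue = queue.Queue(maxsize=2)   # deliberately tiny, to demonstrate "fail early"
MAX_WAIT_IN_QUEUE = 1.0               # seconds; illustrative threshold

def enqueue(task):
    """Fail early: reject immediately when the queue is full."""
    try:
        # store the enqueue timestamp alongside the task
        task_queue.put_nowait((task, time.monotonic()))
        return True
    except queue.Full:
        return False                  # caller returns an error to the client

def dequeue():
    """Fail fast: discard a task that has already waited too long."""
    task, enqueued_at = task_queue.get_nowait()
    if time.monotonic() - enqueued_at > MAX_WAIT_IN_QUEUE:
        return None                   # too stale; not worth processing
    return task
```

With `maxsize=2`, the third `enqueue` call returns `False` straight away instead of blocking, which is exactly the fail-early behaviour described above.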

Queue Occupancy

Let’s look at the various states the queue may be in, depending on the traffic volume.

  • Normal traffic: Queue should be empty, as every incoming request is immediately picked up by a worker. The client only experiences processing delay.
  • Heavy traffic: Queue is partially occupied and all workers are busy. The client experiences both waiting and processing delay.
  • Bursty traffic: Queue fills up quickly, then clears slowly; all workers are busy at first, and some later become idle. The client experiences both waiting and processing delay.
  • Overwhelming traffic: Queue is fully occupied and all workers are busy. The client experiences the maximum waiting delay plus processing delay.

Design Parameters

Now, we turn to the main topic of this essay: How to configure the task queue and worker pool? The following parameters need to be considered carefully:

  • Number of workers
  • Length of the queue
  • Client timeout (only for synchronous request-response type)

Is it a case of “the more, the merrier”? That is, should we always aim for longer queues and more workers?

Number of workers

It depends on the nature of the application. For computation-bound applications, the number of workers should scale with the number of CPU cores; I’ve seen suggestions to set it to 2x the number of cores. For example, on a 16-core machine, you could use 32 workers.

For IO-bound applications, it depends on the hard disk’s bandwidth. In my experience, 10–20 concurrent IO threads are enough to saturate the disk’s bandwidth.

For network-bound applications, it depends on the external service’s capacity. Normally, the workers should not saturate the external service’s capacity (especially if other applications share that same service).
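The rules of thumb above can be collected into a small helper. This is only a sketch of starting points taken from the text (the function name, workload labels, and the mid-point choice of 15 for disk IO are my own assumptions); real worker counts should be validated by measurement:

```python
import os

def suggested_workers(workload: str) -> int:
    """Rule-of-thumb starting points for worker pool size; tune with measurement."""
    cores = os.cpu_count() or 1
    if workload == "cpu":
        # the "2x the number of CPU cores" suggestion for computation-bound work
        return 2 * cores
    if workload == "disk_io":
        # mid-point of the 10-20 concurrent IO threads range
        return 15
    # network-bound: no local formula; size against the external service's capacity
    raise ValueError(f"no local heuristic for workload {workload!r}")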

Queue length and client timeout

These two parameters are closely related and need to satisfy some constraints.

  • avg_proc_delay (aka 1/throughput_per_worker): Average time needed to process a task (after dequeuing from the queue)
  • max_wait_in_queue: When dequeuing a task, discard the task if waiting time > max_wait_in_queue
  • client_timeout: ~= max_wait_in_queue + avg_proc_delay + network_delay
  • queue_length: Satisfies queue_length / (num_workers * throughput_per_worker) < client_timeout

Let’s use an example to illustrate how to arrive at the right values. Imagine a computation-bound application running on a 24-core server. Assume that

  • throughput_per_worker = 100 QPS (aka avg_proc_delay = 10ms)
  • num_workers = 24 (i.e. same as the number of cores)
  • client_timeout = 5sec (arbitrary, but sensible)

Then, we would have

max_wait_in_queue ~= 5 sec − 10 ms = 4.99 sec (not accounting for network_delay!), and

queue_length ~= 4.99 * (100 * 24) = 11,976.
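The same arithmetic, written out (values taken directly from the example above):

```python
# Inputs from the worked example
throughput_per_worker = 100                   # QPS
avg_proc_delay = 1 / throughput_per_worker    # 0.01 s = 10 ms
num_workers = 24
client_timeout = 5.0                          # seconds

# client_timeout ~= max_wait_in_queue + avg_proc_delay (network delay ignored)
max_wait_in_queue = client_timeout - avg_proc_delay   # ~4.99 s

# queue_length / (num_workers * throughput_per_worker) < client_timeout
queue_length = round(max_wait_in_queue * num_workers * throughput_per_worker)

print(max_wait_in_queue, queue_length)
```

Running this reproduces the numbers above: roughly 4.99 seconds of allowed queueing and a queue length of 11,976.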

Wrap up

It must be emphasised that the above descriptions assume the tasks are uniform in nature, meaning they take roughly the same time to process. If the tasks are diverse, estimating the optimal settings becomes almost impossible, as a worker’s processing delay changes dynamically. In that case, it is advisable to use a separate worker pool for each type of task.
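One way to sketch that per-type setup: a separate bounded queue (each fronting its own worker pool) per task type, so that slow "heavy" tasks cannot inflate the waiting delay of fast "light" ones. The task-type names, queue lengths, and worker counts below are all illustrative:

```python
import queue

# One bounded queue + worker pool per task type (sizes are illustrative)
pools = {
    "light": {"queue": queue.Queue(maxsize=1000), "num_workers": 16},
    "heavy": {"queue": queue.Queue(maxsize=100), "num_workers": 4},
}

def submit(task_type: str, task) -> bool:
    """Route a task to its type-specific queue; fail early if that queue is full."""
    try:
        pools[task_type]["queue"].put_nowait(task)
        return True
    except queue.Full:
        return False
```

Each pool can then be sized independently using the same reasoning as the single-pool case, since tasks within one pool are once again roughly uniform.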

That’s it. Thanks for reading!
