Part 2: Making the Most of Async in Node.js

In Part 1 we discussed how to use p-limit and how to avoid some common pitfalls. Today we are going to go over capacity planning, and cover one last common pitfall: retry logic.

How to Size Your Queues

Now that we have addressed many of the gotchas with single-threaded async code, we should discuss how to size our queues so that we stay a nice neighbor to the services we call.

If the job is not brand new, we should already have an idea of what too slow and too fast look like. But from my limited sample size, I have found that letting your process consume around 1/10, possibly 1/8th, of the bandwidth of any one service is generally sustainable. To figure that out, we're going to need one more concept: Little's Law.

Little’s Law

With apologies to anyone who doesn't need Little's Law explained to them: it's an observation from queueing theory that is extremely useful for capacity planning, which is essentially what this question is. It is stated in terms of the average number of people/tasks waiting in a queue (L), the average number of people/tasks that arrive per unit time (λ), and how long each person/task needs to be waited upon (W). The observation is that the arrival rate times the average latency tells you how many requests will be in flight at any given moment, which in turn gives you a rough indication of how many processes you need to fulfill those requests. 1000 requests per second and an average response time of 0.1 s means you are working on 100 requests at any time, on average.

L = λW

So if we want to use 1/10 of the capacity of the existing system, we can either figure out how many CPUs it has available and divide by ten, or we can pull the 7-day averages from the telemetry dashboards (or, for a cron job, the averages during the window when it will run) and go from there.

 limit = request rate x p50 response time / 10

So let’s say it’s currently handling 6000 req/s and takes 50 ms on average to respond. That’s 300 requests at a time, so we want a limit of 30.

  parallelism = 6000 * 0.05 / 10
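
To make the arithmetic concrete, here is a minimal sketch of that sizing calculation. The function name sizeLimit and the 1/10 share are just illustrations; the inputs are whatever your dashboards report.

  // Rough sizing helper: Little's Law (L = λW) scaled by the share of the
  // service we are willing to consume (1/10 here, per the rule of thumb above).
  function sizeLimit(requestsPerSecond, p50Seconds, share = 1 / 10) {
    const inFlight = requestsPerSecond * p50Seconds; // average requests in flight
    return Math.max(1, Math.round(inFlight * share));
  }

  sizeLimit(6000, 0.05); //=> 30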

You will want to stay in touch with the people running that service so you can adapt if they manage to optimize it, or go on a cost-savings initiative and downsize their cluster, but 30 will likely hold for quite some time without causing chatter in the devops chat channels.

You will want to experiment with a range of roughly 50% to 150% of this number, because you will rarely see exactly 30 requests in flight at once; the rest of your computation adds a bit of latency before the next request starts. You may find the run time is exactly the same at parallelism = 24. To be a better neighbor, err on the side that reduces your peak load on the service while still accomplishing a similar run time for your job.
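
If you want to run that experiment systematically, here is a rough sketch. runJob is a hypothetical stand-in for however you kick off your batch with a given limit; the point is just to time a few values around the computed number.

  // Try a handful of limits around the computed value and time each run.
  // runJob(parallelism) is assumed to run the whole batch and resolve when done.
  async function sweepParallelism(baseLimit) {
    for (const factor of [0.5, 0.75, 1, 1.25, 1.5]) {
      const parallelism = Math.max(1, Math.round(baseLimit * factor));
      const start = Date.now();
      await runJob(parallelism);
      console.log(`parallelism=${parallelism} finished in ${Date.now() - start} ms`);
    }
  }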

Retry Logic

One of the toughest forms of coordination is between humans, especially in very large organizations where the left hand does not know what the right hand is doing. Just because your service is playing nice in the data center doesn't mean everyone else's is, and you will eventually end up with multiple jobs hitting the same cluster at the same time. When that happens, you're more likely to encounter timeouts because requests are being fired faster than the service can fulfill them, even with our careful tuning from the previous section.

This gets very tricky with something like p-limit because we aren't rate limiting, only limiting concurrency. So if calls start returning 502 or 429 responses very quickly, we may end up slamming through all of our requests in a second instead of pacing them out over two minutes.

We can solve this by running the retry logic within the limit queue rather than outside of it. If one request is timing out, any other request is likely to time out as well, so it's better for the failing task to hold its slot and back off than to keep powering on down the work queue. For this there is p-retry, by the same author, which does an exponential backoff that should typically suffice.

  import pLimit from "p-limit";
  import pRetry from "p-retry";
  
  //...

  const limit = pLimit(10);

  const responses = entries.map((entry) => limit(() => pRetry(async () => {
      // Retries (and their backoff) run inside the limit, so a failing request
      // holds its slot instead of letting the rest of the queue race ahead.
      const response = await getData(entry);

      const data = parseResponse(response);
      // Do some work...
      return data;
    },
    {
      retries: 5,
      minTimeout: 500,
      onFailedAttempt: ({error, attemptNumber, retriesLeft, retriesConsumed}) => {
        console.log(`Attempt ${attemptNumber} failed. ${retriesLeft} retries left. ${retriesConsumed} retries consumed.`);
      },
    })));

  return Promise.all(responses);
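
One thing to keep in mind with this shape: once p-retry exhausts its retries it rejects with the last error, so the Promise.all will reject even if most entries succeeded. If you would rather collect partial results and sort out the failures afterwards, Promise.allSettled is a drop-in alternative here.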

Next Steps

As promised, Part 3 will move on to heterogeneous workloads and how to smooth them out to achieve better throughput.
