In the previous article, I very briefly described a bunch of concepts in distributed computing, but before we get to the meat of this series, there’s an elephant in the room that I have so far ignored.
With very few exceptions, languages like Ruby or Node.js can really only do one thing at a time. Node.js uses libuv to run blocking system calls and parts of network requests in the background, but all of your JavaScript executes in a single thread. You cannot make one fetch while processing the result of another, so there are certain sorts of latency you have to watch out for. All you have available is cooperative multitasking, and if you want anything more performant than some basic async code, it is quite challenging to get there without making your code look completely alien to all of your current and future coworkers, if not also yourself.
This is where libraries like p-limit or throat come in.
p-limit takes a function (typically an async arrow function) and returns a promise. It runs up to N functions at the same time, and any additional functions go into a queue. When any one of those N functions resolves, it starts the next function that was added to the queue. So for the cost of an additional arrow function, you get some of the benefits of cooperative multitasking without having to further pollute your code base.
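To make that concrete, here is a minimal sketch of the behavior using a hand-rolled limiter with the same call shape as p-limit's default export (the real library handles more edge cases; the sleep durations here are invented for illustration):

```javascript
// Stand-in with p-limit's call shape: limit(fn) runs fn once a slot
// is free and returns a promise for fn's result.
function makeLimit(concurrency) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= concurrency || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(fn)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // a finished task frees its slot for the next queued one
      });
  };
  return (fn) =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      next();
    });
}

// Usage mirrors p-limit: only two of the four tasks run at any one time,
// but Promise.all still yields results in input order.
const limit = makeLimit(2);
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const jobs = [1, 2, 3, 4].map((n) =>
  limit(async () => { await sleep(10); return n; })
);
Promise.all(jobs).then((results) => console.log(results)); // [1, 2, 3, 4]
```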
There are, however, a number of limitations and gotchas which the following code patterns will help you reduce or eliminate.
Anything in your arrow function holds one of the N slots and stops any new requests from starting. That is simple enough to understand. But any complex actions performed within the limit() call can also lead to deadlock situations. One common batch-processing use case that generates lots of network requests is loading a tree of values. When writing naive Node.js code, we often lazily evaluate these calls, so that one resolving promise issues several more requests, which resolve and generate k² additional requests in turn. If you do this inside of a queue, eventually - or more likely, very quickly - the queue will fill up, every slot will be held by a parent waiting on children stuck behind it, and your earlier async functions can never resolve because their own children are gumming up the queue.
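Here is a sketch of that anti-pattern, again using a hand-rolled limiter with p-limit's call shape (the tree shape, branching factor, and depth are made up for illustration):

```javascript
// Minimal limiter with p-limit's call shape, for demonstration only.
function makeLimit(concurrency) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= concurrency || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    Promise.resolve().then(fn).then(resolve, reject)
      .finally(() => { active--; next(); });
  };
  return (fn) => new Promise((resolve, reject) => {
    queue.push({ fn, resolve, reject });
    next();
  });
}

const limit = makeLimit(2);

// ANTI-PATTERN: awaiting child requests *inside* the limit() call.
// Each parent holds a slot while its children wait in the queue; once
// every slot is held by a waiting parent, nothing can ever resolve.
async function loadTree(depth) {
  return limit(async () => {
    if (depth === 0) return 1;
    const children = await Promise.all([
      loadTree(depth - 1),
      loadTree(depth - 1),
    ]);
    return 1 + children[0] + children[1];
  });
}

// With 2 slots and a branching factor of 2, depth 2 already deadlocks:
// the root and one child occupy both slots, each waiting on queued work.
Promise.race([
  loadTree(2).then(() => 'finished'),
  new Promise((r) => setTimeout(() => r('stuck'), 200)),
]).then((outcome) => console.log(outcome)); // 'stuck'
```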
Instead, you should make the limit() function very short, and handle the consequences of that resolution on the promise that is returned from limit(). That way your initial call leaves the queue, and can be replaced by any recursive calls that are made, whether to the same endpoint or to others.
Promise.all / for...of:
const limit = pLimit(10);
const responses = entries.map((entry) => limit(() => getData(entry)));
for (const promise of responses) {
  const data = parseResponse(await promise);
  // Now the response has been removed from the queue, and the next getData call can fire.
  const children = data.children.map((child) => limit(() => getData(child)));
  // Do some work...
}
This will prevent deadlocks; however, it will now also cause the entire job to run as a breadth-first search: all but N of the top-level requests will resolve, and then the first child request will run. Composing several instances of p-limit can be useful here, and is easier to see when working with dependent requests.
I played a mean trick in the previous example: my code only processes the promises in first-in, first-out order. By processing the results strictly in the order that the requests were made, you create a kind of head-of-line blocking problem. If there is a lot of variability in the response times (e.g., server lag, or some queries being many times more expensive than others), you will end up with a stack of fulfilled responses behind one slow one. When that one finishes, you will immediately lock up the CPU processing all of the resolved responses behind it, and you may end up not firing any new requests until that logjam clears. While the promise resolution for the completed requests is monopolizing the event loop, no new requests are going out, so the queue experiences another cold start (an empty queue).
Every cold start issues N new requests at practically the same time, causing many of those requests to resolve at practically the same time. This stutter will continue until variability in the response times causes the promises to smooth out so that only a couple new requests go out for every couple of responses that are received.
So you really want to do out-of-order processing whenever your workflow allows it - just avoid doing it within the limit() call. It's not particularly difficult to do; it's just a problem you should watch out for.
const limit = pLimit(10);
const responses = entries.map(async (entry) => {
const response = await limit(() => getData(entry));
const data = parseResponse(response);
// Do some work...
});
return Promise.all(responses);
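To see what in-order processing costs, here is a small self-contained simulation; the delay durations and labels are invented for illustration:

```javascript
const delay = (ms, value) => new Promise((r) => setTimeout(() => r(value), ms));

// In-order: results are handled strictly in request order, so the fast
// response sits fully resolved behind the slow one ahead of it.
async function inOrder(promises) {
  const handled = [];
  for (const p of promises) handled.push(await p);
  return handled;
}

// Out-of-order: each result is handled the moment it resolves.
async function asCompleted(promises) {
  const handled = [];
  await Promise.all(promises.map((p) => p.then((v) => handled.push(v))));
  return handled;
}

(async () => {
  console.log(await inOrder([delay(60, 'slow'), delay(10, 'fast')]));
  // ['slow', 'fast'] - 'fast' resolved at 10ms but waited behind 'slow'
  console.log(await asCompleted([delay(60, 'slow'), delay(10, 'fast')]));
  // ['fast', 'slow'] - handled as they complete
})();
```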
Rarely will you have two services with the same tolerance for traffic spikes. But even if you do, you should use a separate instance of p-limit per service, because: one, things may not always stay that way; two, your task will likely grow more complex over time (which could lead to the deadlock mentioned above); and three, two services will generally experience separate slowdowns, and you want to respond to back pressure from each one separately.
When you are fulfilling an online request by making one call each to two services, all that matters is the p50 response time at the exact moment that you make those requests. When you're doing batch processing, it's the average response time over the entire duration that matters, and that will jump up and down from moment to moment. You want to "make hay while the sun is shining" - fulfill as many requests as you can during every momentary lull in the average traffic on that service by keeping the pipeline full. And if that service hiccups, make progress on the other service.
So you retire the p-limit call to the first endpoint ASAP, process the data off of the queue (as we discussed above), then issue the second request on a separate queue.
const serviceALimit = pLimit(10);
const serviceBLimit = pLimit(14);
const responses = entries.map(async (entry) => {
  const response = await serviceALimit(() => getData(entry));
  const data = parseResponse(response);
  const children = data.children.map(async (child) => {
    const childResponse = await serviceBLimit(() => getInfo(child));
    // Do some work...
  });
  await Promise.all(children);
  // Do some more work...
});
return Promise.all(responses);
This is more or less a good length for a post, so I will end here and next time we can go over tuning and error recovery. After that, we can move on to more challenging scenarios, such as heterogeneous workloads.