Part 4: Making the Most of Async in Node.js

In Part 3 we discussed heterogeneity and some other tuning tricks. Now it’s time for the rest of the story.

In the time since I started this series, I’ve landed a PR in p-limit that reduces its bookkeeping overhead by around 25%. This might be as tight as that code can ever hope to get without either help from v8 or resorting to arcane code.

The Mopping Up

None of the concepts I’ve discussed are a magic wand for the limitations of the single-threaded model in Node.js. Task switching is still administrative overhead, as it is in every other programming language. You only get more work done when tasks alternate between being I/O-bound and CPU-bound, and the gain is often relatively small, on the order of 2x. So take some care with your chunk sizes, in both directions.
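
As a rough sketch of what that looks like in practice, assume a hypothetical synchronous processOne() over a large items array. The chunk size is the knob: too small and you pay the task-switching tax on every item, too large and you starve the event loop.

    import { setImmediate } from 'node:timers/promises';

    // Walk a large array in chunks, yielding to the event loop between chunks
    // so pending I/O callbacks get a turn.
    async function processInChunks(items, chunkSize = 500) {
      for (let i = 0; i < items.length; i += chunkSize) {
        for (const item of items.slice(i, i + chunkSize)) {
          processOne(item); // hypothetical CPU-bound work on one item
        }
        await setImmediate(); // yield before starting the next chunk
      }
    }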

Arrange

In Node.js 24, calling and awaiting an async function is still about 90x slower than a plain synchronous call and return, even when the awaited function does no asynchronous work. That is down from more than 100x in earlier versions of v8. You’re going to pay a tax for mixing local cache lookups with remote calls, but you can shave it down by starting the call early and not awaiting it until the result is actually needed.

Don’t await any promises until first use.

Routinely Confirm Your Tuning Parameters

If a job you are running finishes in the same time with 10 parallel requests as with 20, it is a better neighbor configured for 10. That’s less jitter for all of your peers, and it costs you very little.

However, every time that service gets upgraded or the cluster size changed, that optimal number can go up or it can go down. So it is important to document your findings for how changing the settings by ±50% should affect runtime, and occasionally run that experiment again to see if the numbers still hold. Maybe increasing it used to only speed up your job by 5% and trigger more alerts. Maybe now it has no effect, or doesn’t trigger any alerts.
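
Here is a minimal sketch of that recurring experiment, assuming an ESM setup and a hypothetical runJob() that performs one unit of the real workload: time the same batch at the current limit and at roughly ±50%, then compare.

    import pLimit from 'p-limit';

    // Time one batch of work at a given concurrency limit.
    async function timeAtConcurrency(concurrency, taskCount = 200) {
      const limit = pLimit(concurrency);
      const start = performance.now();
      const tasks = Array.from({ length: taskCount }, () => limit(() => runJob())); // runJob() is hypothetical
      await Promise.all(tasks);
      return performance.now() - start;
    }

    for (const concurrency of [5, 10, 15]) { // the current setting and roughly ±50%
      console.log(`${concurrency} parallel:`, await timeAtConcurrency(concurrency), 'ms');
    }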

Be Aware of AsyncLocalStorage

AsyncLocalStorage is better than Domains, which unfortunately I have more experience with than I ever care to think about. Promise caches should be resolved down to results rather than retained in perpetuity: a retained promise keeps its context alive, and that costs extra memory, usually a little, but once in a while great heaping gobs of it. A giant context can end up retaining an entire previous request for as long as the cache entry isn’t evicted. Once in a while v8 will hold onto a context even though a code review tells you that none of that data should be captured by the closure. It doesn’t happen often, but I had one bit of code where I had to extract a functor because nothing else convinced v8 to drop a very large piece of context that showed up in the heap dump but not in the debugger.

But VM quirks aside, once a response crosses a request boundary, any context from the originator is incorrect as far as subsequent stats or logs are concerned. You want to wipe that context and substitute your own, even if keeping the old one around were free, which it is not. The pattern below resolves a cache entry down to its value as soon as the call settles, which drops the promise and whatever it was retaining:

    async function refreshCache(key, ...args) {
      const promise = someExpensiveAsyncCall(...args); // stand-in for the real remote call
      CACHE[key] = promise;

      const value = await promise;
      if (CACHE[key] === promise) { // Don't clobber an entry that is newer than ours
        CACHE[key] = value;
      }

      return value;
    }
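
To make the context swap concrete, here is a minimal sketch assuming a hypothetical requestContext store and a fetchDashboard() lookup that may be answered out of a cache populated by an earlier request. Every request enters its own fresh store, so anything logged while serving the cached value is attributed to the request asking now.

    import { AsyncLocalStorage } from 'node:async_hooks';
    import { createServer } from 'node:http';
    import { randomUUID } from 'node:crypto';

    const requestContext = new AsyncLocalStorage();

    // Hypothetical lookup that may be served from a promise cache.
    async function fetchDashboard() {
      return { widgets: [] };
    }

    createServer((req, res) => {
      // Enter a fresh store for every request; nothing from the request that
      // originally populated the cache comes along for the ride.
      requestContext.run({ requestId: randomUUID() }, async () => {
        const dashboard = await fetchDashboard();
        console.log(requestContext.getStore().requestId, 'served', req.url);
        res.end(JSON.stringify(dashboard));
      });
    }).listen(3000);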

Acknowledgements

Special thanks to Sindre Sorhus, an open source author who demonstrates the Single Responsibility Principle in his Node.js modules with a consistency I have rarely seen in the wild. Please check out his work, particularly p-limit and p-retry, which feature prominently in this series.
