Loving streams in node.js

Node.js is a great platform for writing scripts. Besides being Javascript and besides having access to npm it lends it very well to data processing as it’s completely async unless you specifically tell it not to be. One of the best aspects in my opinion about node.js as a data processing language is the concepts of streams and using streams to process data. Using streams can drastically lower the memory consumption by processing data as it comes down the stream instead of keeping everything in memory at any one time. Think SAX instead of DOM parsing.

In node.js using streams is easy. Basically data flows from Readable streams to Writable streams. Think producers of data and consumers of data. Buffering is handled automatically (at least in the built in streams) and if a down stream consumer stops processing the upstream producer will stop producing. Elegant and easy. Readable streams can be stuff like files or network sockets and Writeable streams stuff like files or network sockets… Or stdout which in node.js also implement the Writable stream API. Working with streams is like being a plumber so piping (using the pipe method) is how you connect streams.

An example always helps – the below example reads from alphabet.txt and pipes the data to stdout.

const fs = require('fs')
const path = require('path')

fs.createReadStream(path.join(__dirname, 'alphabet.txt'))
> a
> b
> c

Simple example but works with small and big files without too much of a difference in memory consumption.

Sometimes processing is required and for this we use Transform streams (these are basically streams that can read and write). Say that we want to uppercase all characters. It’s easy by piping through a Transform stream and then on to the Writable stream (stdout):

const {Transform} = require('stream')
const fs = require('fs')
const path = require('path')

fs.createReadStream(path.join(__dirname, 'alphabet.txt'))
  .pipe(new Transform({
    transform(chunk, encoding, callback) {
      // chunk is a Buffer 
      let data = chunk.toString().toUpperCase() 
      callback(null, data) 
> A
> B
> C

It’s easy to see how streams are very composeable and adding processing steps are easy. the pipe could even be determined at runtime. The above examples use strings but streams can also work on objects if required.

Streams are beautiful but can take some time to master. I highly recommend reading up on streams and start getting to know them. The “Node.js Streams: Everything you need to know” post is very nice and provides a good overview.

Happy coding!