How to download files using Node.js

There are three approaches to writing a file downloader using Node:

  1. Using http.get()
  2. Using curl
  3. Using wget

I have created functions for all of them. To get the examples working, make sure you have the dependencies and the app variables set up and defined, and that curl and wget are installed for the last two examples. Read the comments in the code thoroughly: you will not only learn how to download files, but also learn more about Node's child_process, fs, Buffer, and Stream modules.

Let's start with http.get().

Downloading using http.get

http.get() is Node's built-in method for making HTTP GET requests, and it can also be used to download files over HTTP. The advantage of using http.get() is that you don't rely on any external programs to download the files. It does not follow redirects on its own, and https:// URLs need the https module's equivalent, https.get().

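// Dependencies
var fs = require('fs');
var url = require('url');
var http = require('http');
var exec = require('child_process').exec;
var spawn = require('child_process').spawn;

// App variables
var file_url = 'http://upload.wikimedia.org/wikipedia/commons/4/4f/Big%26Small_edit_1.jpg';
var DOWNLOAD_DIR = './downloads/';

// We will be downloading the files to a directory, so make sure it's there
// This step is not required if you have manually created the directory
// (`mkdir -p` is a Unix command; fs.mkdir(DOWNLOAD_DIR, { recursive: true }, cb) is a portable alternative)
var mkdir = 'mkdir -p ' + DOWNLOAD_DIR;
var child = exec(mkdir, function(err, stdout, stderr) {
  if (err) throw err;
  else download_file_httpget(file_url);
});

// Function for downloading file using http.get
var download_file_httpget = function(file_url) {
  // the last path segment becomes the local file name
  var file_name = url.parse(file_url).pathname.split('/').pop();

  // http.get accepts the URL string directly; host, port and path are taken from it
  http.get(file_url, function(res) {
    // http.get does not follow redirects, so treat any non-200 status as a failure
    if (res.statusCode !== 200) {
      console.log('Failed: HTTP ' + res.statusCode);
      res.resume(); // discard the response body
      return;
    }
    // create the file only once we know the server is actually sending it
    var file = fs.createWriteStream(DOWNLOAD_DIR + file_name);
    res.on('data', function(data) {
      file.write(data);
    }).on('end', function() {
      // end() flushes any buffered data; its callback runs once the file is written
      file.end(function() {
        console.log(file_name + ' downloaded to ' + DOWNLOAD_DIR);
      });
    });
  }).on('error', function(err) {
    // network-level failures (DNS lookup, refused connection, ...) end up here
    console.log('Failed: ' + err.message);
  });
};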

This is what the function above does: it makes an http.get() request and writes the response into a writable stream created with fs.createWriteStream(). The response is itself a readable stream, so it emits a data event for each chunk the server sends; on each data event, the chunk is written to the writable stream. Once the server has finished sending data (the end event), calling end() on the write stream flushes and closes it.
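One caveat with the data-event loop described above: file.write() returns false when the write stream's internal buffer is full, and the loop keeps writing anyway, so a fast server and a slow disk can pile data up in memory. The sketch below pauses the response until the stream emits drain. The helper name downloadWithBackpressure is hypothetical, it assumes a plain http:// URL, and it wraps the result in a Promise rather than logging.

```javascript
const http = require('http');
const fs = require('fs');

// Hypothetical helper: download `fileUrl` (plain http://) to `destPath`,
// pausing the response whenever the file stream's buffer is full.
// Resolves with `destPath` once all data has been flushed to the file.
function downloadWithBackpressure(fileUrl, destPath) {
  return new Promise((resolve, reject) => {
    http.get(fileUrl, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket can be reused
        reject(new Error('HTTP ' + res.statusCode));
        return;
      }
      const file = fs.createWriteStream(destPath);
      file.on('error', reject);
      res.on('error', reject);
      res.on('data', (chunk) => {
        // write() returns false once the buffer passes its highWaterMark
        if (!file.write(chunk)) {
          res.pause(); // stop 'data' events until the buffer drains
          file.once('drain', () => res.resume());
        }
      });
      res.on('end', () => {
        // the end() callback runs after the stream has finished writing
        file.end(() => resolve(destPath));
      });
    }).on('error', reject);
  });
}
```

Called as downloadWithBackpressure(someUrl, './downloads/file.bin'), it resolves once the file is flushed and rejects on a network error or a non-200 status.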

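Avoid fs.writeFile() and fs.write() for this. fs.writeFile() needs the entire file in memory before it writes anything, which becomes a problem for large files, and the Node documentation warns that calling fs.write() repeatedly on the same file without waiting for each callback is unsafe. fs.createWriteStream() queues the chunks in order and writes them as they arrive, so it is the reliable choice.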
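In current Node versions, stream.pipeline() (available since Node 10) takes care of the chunk-by-chunk writing, the backpressure handling, and the cleanup of both streams on error. A minimal sketch, using the hypothetical name downloadWithPipeline and a Node-style callback to match the rest of this article:

```javascript
const http = require('http');
const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical helper: stream an HTTP response straight into a file.
// pipeline() handles backpressure and destroys both streams on error.
function downloadWithPipeline(fileUrl, destPath, callback) {
  http.get(fileUrl, (res) => {
    if (res.statusCode !== 200) {
      res.resume(); // drain and discard the unwanted body
      callback(new Error('HTTP ' + res.statusCode));
      return;
    }
    pipeline(res, fs.createWriteStream(destPath), (err) => {
      // err is undefined on success; the file has been fully written here
      callback(err || null, destPath);
    });
  }).on('error', callback);
}
```

The design choice here is to let Node wire the two streams together instead of forwarding data events by hand.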

Downloading using curl

To download files using curl in Node.js we will need to use Node's child_process module. We will be calling curl using child_process's spawn() method.

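We use spawn() instead of exec() for convenience: spawn() returns a child process whose stdout is a readable stream that emits data events as curl produces output, whereas exec() buffers the entire output in memory and kills the process with an error if that output grows beyond its maxBuffer limit. That doesn't mean exec() is inferior to spawn(); in fact, we will use exec() to download files using wget.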
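To see the buffering difference for yourself, the sketch below uses the running Node binary (process.execPath) as a stand-in child process that prints n bytes, so it works without curl or wget installed. The helper names countBytesWithSpawn and runWithExec are made up for this illustration.

```javascript
const { spawn, exec } = require('child_process');

// Code for a child process that writes `n` bytes ('x') to stdout.
const script = (n) => "process.stdout.write('x'.repeat(" + n + "))";

// spawn(): stdout is a readable stream; chunks arrive as 'data' events
// and nothing is kept in memory beyond what the listener keeps.
function countBytesWithSpawn(n) {
  return new Promise((resolve, reject) => {
    let total = 0;
    const child = spawn(process.execPath, ['-e', script(n)]);
    child.stdout.on('data', (chunk) => { total += chunk.length; });
    child.on('error', reject);
    child.on('close', () => resolve(total));
  });
}

// exec(): the whole output is buffered and passed to the callback at the end;
// past maxBuffer bytes the child is killed and an error is reported.
function runWithExec(n, maxBuffer) {
  return new Promise((resolve) => {
    const cmd = '"' + process.execPath + '" -e "' + script(n) + '"';
    exec(cmd, { maxBuffer: maxBuffer }, (err, stdout) => {
      resolve({ err: err, length: stdout.length });
    });
  });
}
```

With n = 100000, countBytesWithSpawn counts every byte, while runWithExec(100000, 1024) reports an error because the output exceeds the 1024-byte maxBuffer.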

// Function for downloading file using curl
var download_file_curl = function(file_url) {
  // extract the file name
  var file_name = url.parse(file_url).pathname.split('/').pop();
  // create an instance of writable stream
  var file = fs.createWriteStream(DOWNLOAD_DIR + file_name);
  // execute curl using child_process' spawn function
  // (-L follows redirects, -f makes curl exit with an error on HTTP 4xx/5xx)
  var curl = spawn('curl', ['-L', '-f', file_url]);
  // add a 'data' event listener for the spawn instance
  curl.stdout.on('data', function(data) { file.write(data); });
  // add an 'end' event listener to close the writable stream
  curl.stdout.on('end', function() {
    file.end();
  });
  // 'error' is emitted if curl could not be started (e.g. it is not installed)
  curl.on('error', function(err) {
    console.log('Failed: ' + err.message);
  });
  // when the child process exits, check the exit code for errors
  curl.on('exit', function(code) {
    if (code !== 0) {
      console.log('Failed: curl exited with code ' + code);
    } else {
      console.log(file_name + ' downloaded to ' + DOWNLOAD_DIR);
    }
  });
};

Data is written to the fs.createWriteStream() instance the same way as in the http.get() example. The only difference is that the data and end events are listened for on the stdout stream of the process returned by spawn(). We also listen for the child process's exit event to report any errors.
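Reporting success as soon as curl's output ends or the process exits does not guarantee that the write stream has flushed everything to disk yet. A sketch of an alternative, using the hypothetical helper spawnToFile: it pipes the child's stdout into the file (pipe() manages backpressure and ends the file when stdout ends) and resolves only after both the child and the file stream have closed.

```javascript
const { spawn } = require('child_process');
const fs = require('fs');

// Hypothetical helper: run `command` with `args`, stream its stdout into
// `destPath`, and resolve with the exit code only after the child has
// exited AND the file has been closed, so the file is complete on resolve.
function spawnToFile(command, args, destPath) {
  return new Promise((resolve, reject) => {
    const file = fs.createWriteStream(destPath);
    const child = spawn(command, args);
    let exited = false;
    let exitCode = null;
    let fileClosed = false;
    const maybeDone = () => {
      if (exited && fileClosed) resolve(exitCode);
    };
    child.on('error', reject); // e.g. ENOENT when the program is not installed
    file.on('error', reject);
    // pipe() handles backpressure and calls file.end() when stdout ends
    child.stdout.pipe(file);
    file.on('close', () => { fileClosed = true; maybeDone(); });
    // 'close' fires after the child's stdio streams have closed;
    // code is null if the child was killed by a signal
    child.on('close', (code) => { exited = true; exitCode = code; maybeDone(); });
  });
}
```

For the curl case it would be called as spawnToFile('curl', ['-L', file_url], DOWNLOAD_DIR + file_name); a resolved exit code other than 0 means curl reported a failure.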

Downloading using wget

Although this section is about wget, the same approach works for curl with the -O option. From a coding point of view, this method is the simplest.

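// Function for downloading file using wget
var download_file_wget = function(file_url) {
  // extract the file name
  var file_name = url.parse(file_url).pathname.split('/').pop();
  // compose the wget command: -nv keeps wget's log short (exec() buffers it),
  // -P sets the target directory, and the URL is single-quoted so the shell
  // does not interpret characters such as '&' (never pass untrusted URLs here)
  var wget = 'wget -nv -P ' + DOWNLOAD_DIR + " '" + file_url + "'";
  // execute wget using child_process' exec function
  var child = exec(wget, function(err, stdout, stderr) {
    if (err) console.log('Failed: ' + err.message);
    else console.log(file_name + ' downloaded to ' + DOWNLOAD_DIR);
  });
};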

In the method above, we used child_process's exec() method to run wget. Why exec() and not spawn()? Because we only want wget to tell us whether the job succeeded; we are not interested in buffers and streams. wget does all the dirty work of making the request, handling the data, and saving the file for us. As you might have guessed, this is the quickest of the three methods to write.
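Building a shell command string, as the wget example does, means the URL passes through the shell: an & in a query string, for instance, would be treated as a shell operator. A variation worth considering, sketched here with the hypothetical helpers wgetArgs and downloadWithWget, uses child_process.execFile(), which runs the program without a shell while keeping exec()'s "just tell me when it's done" callback. The -nv flag (wget's --no-verbose) is an addition in this sketch to keep the buffered log short.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: wget's argument list. -nv (--no-verbose) keeps the
// buffered log short; -P (--directory-prefix) sets the target directory.
function wgetArgs(downloadDir, fileUrl) {
  return ['-nv', '-P', downloadDir, fileUrl];
}

// execFile() starts wget directly, without a shell, so characters such as
// '&', ';' or spaces in the URL reach wget as part of a single argument.
// Like exec(), it buffers the output and calls back once wget has finished.
function downloadWithWget(fileUrl, downloadDir, callback) {
  execFile('wget', wgetArgs(downloadDir, fileUrl), (err) => {
    // err is set if wget could not start (err.code === 'ENOENT')
    // or exited non-zero (err.code is then the exit code)
    callback(err || null);
  });
}
```

Usage mirrors the original function: downloadWithWget(file_url, DOWNLOAD_DIR, function(err) { ... }).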

So which method is the best? The answer: whichever suits your needs. The wget method is probably the best if you want to save the files to the local disk, but not if you want to send those files as the response to a client request; for that you need a stream. All three methods have many options, and your choice will ultimately depend on what your needs are.
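To illustrate the stream case, here is a minimal sketch of a server that relays a remote file to whoever requests it, piping the upstream http.get() response straight into the client's response without touching the disk. createRelayServer and its fixed upstreamUrl parameter are assumptions for this example; a real server would validate what it relays.

```javascript
const http = require('http');

// Hypothetical helper: every request to this server fetches `upstreamUrl`
// and streams it back to the client chunk by chunk, never saving it to disk.
function createRelayServer(upstreamUrl) {
  return http.createServer((req, res) => {
    http.get(upstreamUrl, (upstream) => {
      if (upstream.statusCode !== 200) {
        upstream.resume(); // discard the upstream body
        res.statusCode = 502;
        res.end('Upstream responded with ' + upstream.statusCode);
        return;
      }
      // forward the content type so the client knows what it is receiving
      const type = upstream.headers['content-type'];
      if (type) res.setHeader('Content-Type', type);
      upstream.pipe(res); // pipe() respects backpressure and ends res at the end
    }).on('error', (err) => {
      res.statusCode = 502;
      res.end('Upstream error: ' + err.message);
    });
  });
}
```

Because the data flows through as it arrives, memory use stays roughly constant regardless of the file's size.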
