Four Kitchens
Insights

Migrating HTML files into Drupal with jsdom


Not only do we build big websites at Four Kitchens, but we also migrate them. We’ve seen all kinds of migrations and have overcome plenty of challenges along the way. In a previous blog installment, my colleague Mark Theunissen discussed Migrating Old HTML Files Into Drupal. In that post he touted the wonderful QueryPath library and what it enables you to do for HTML migrations. I too am a fan of QueryPath, but my biggest complaint is that it’s not quite jQuery.

Most seasoned web developers know a good deal of jQuery and are comfortable with its callbacks and the way it behaves. Its selectors and return values are known quantities that web developers have embraced over the years. QueryPath gets close, but because of PHP’s (and, more specifically, Drupal’s) procedural programming flow, you often find yourself doing a dance to get at data from QueryPath that would be simple to access in jQuery.

Thankfully, there’s a JavaScript solution to this problem: jsdom. jsdom is an implementation of the DOM written entirely in JavaScript, for use in Node.js applications. That’s cool and all, but what does it really give you? Well, you can load arbitrary JavaScript files along with an HTML file in jsdom, and in our specific use case that means jQuery.

Got your attention yet? Good, let’s look at an example. Let’s say you had N HTML documents like this:

Foo
Hoodie non sem mustache pellentesque eros specs tellus sagittis
8-bit ut molestie cred malesuada tellus viral leo. Porta 8-bit eu
ultricies vegan urna sit vinyl bibendum pharetra bicycle sagittis
malesuada artisan risus gravida fixie. Pellentesque eget San
Francisco lorem ut bahn mi quam ultricies Brooklyn morbi ipsum
DIY bibendum ligula keytar sed tempus VHS. Lorem vivamus
indie sapien lorem brunch cursus leo cred gravida congue keytar
non risus VHS in odio Austin eu.

Using jsdom, you could cut up the documents with jQuery and prepare them for a Drupal migration import that relies solely on database queries. Observe:

// Note: jsdom.env() is jsdom's legacy API; current jsdom releases use the JSDOM class instead.
var jsdom = require('jsdom');
var files = [ 'path/to/file1', 'path/to/file2' ];
files.forEach(function(file) {
  jsdom.env(
    file,
    [ 'http://code.jquery.com/jquery-1.8.0.min.js' ],
    function (errors, window) {
      if (errors) {
        // Bail, yo! Report which file failed and skip it.
        console.error('Could not load ' + file + ':', errors);
        return;
      }

      (function($) {
        // Get the content.
        var title = $('.title').html();
        var body = $('.content').html();

        // Prep the values.
        var values = {
          guid: file,
          title: title,
          body: body
        };

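        // Write to a migration DB that's shared with Drupal.
        // abstractedDatabaseWrite() stands in for your own database layer.
        abstractedDatabaseWrite(values);
      }(window.jQuery));

      // Release the jsdom window (and its memory) before the next file.
      window.close();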
    }
  );
});
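The loop above uses the raw `file` string as the `guid`, so the same document could end up with different IDs depending on how its path was written (`./foo.html` versus `foo.html`, or backslashes on Windows). If stable IDs matter for your migration, a small helper can normalize them. This is a minimal sketch, and `toGuid()` and its `sourceRoot` parameter are hypothetical names that assume every source file lives under one root directory:

```javascript
const path = require('path');

// Hypothetical helper: build a stable migration guid from a source file path.
// Assumes every source file lives somewhere under sourceRoot.
function toGuid(sourceRoot, file) {
  // Resolving both paths makes "./a.html", "a.html", and "x/../a.html" agree.
  const relative = path.relative(path.resolve(sourceRoot), path.resolve(file));

  // Refuse the root itself and files outside it (the guid would start with "..").
  if (
    relative === '' ||
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative)
  ) {
    throw new Error(file + ' is not inside ' + sourceRoot);
  }

  // Always use forward slashes so guids match across Windows and Unix.
  return relative.split(path.sep).join('/');
}
```

In the loop, `guid: file` would then become something like `guid: toGuid('path/to', file)`.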

Now that you’ve loaded up your migration table with content from the source files, your Drupal migration code only needs to query a database to create content.
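What `abstractedDatabaseWrite()` actually does is up to you. One option, sketched here under the assumption of a MySQL table with a unique `guid` column (the `buildUpsert()` function and the `migrate_html_source` table name are hypothetical), is a parameterized “upsert”: re-running the script updates existing rows instead of duplicating them, and the HTML bodies travel as parameters rather than being pasted into the SQL string.

```javascript
// Hypothetical sketch of the SQL an abstractedDatabaseWrite() could run.
// Assumes MySQL and a table whose guid column has a UNIQUE or PRIMARY KEY index.
function buildUpsert(table, values) {
  const columns = Object.keys(values);

  // Table and column names cannot be bound as parameters, so only accept
  // plain identifiers to keep them safe to interpolate.
  for (const name of [table, ...columns]) {
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
      throw new Error('Unsafe SQL identifier: ' + name);
    }
  }

  const updatable = columns.filter((column) => column !== 'guid');
  if (!columns.includes('guid') || updatable.length === 0) {
    throw new Error('values needs a guid plus at least one other column');
  }

  const placeholders = columns.map(() => '?').join(', ');
  // On a duplicate guid, overwrite the other columns with the new values.
  const updates = updatable
    .map((column) => column + ' = VALUES(' + column + ')')
    .join(', ');

  return {
    sql:
      'INSERT INTO ' + table + ' (' + columns.join(', ') + ') VALUES (' +
      placeholders + ') ON DUPLICATE KEY UPDATE ' + updates,
    // Parameter order matches the column order in the SQL.
    params: columns.map((column) => values[column]),
  };
}

// Example: a values object like the one built in the jsdom loop.
const upsert = buildUpsert('migrate_html_source', {
  guid: 'path/to/file1',
  title: 'Foo',
  body: '<p>Hoodie non sem mustache</p>',
});
console.log(upsert.sql, upsert.params);
```

The result could then be handed to a driver that uses `?` placeholders, for example the `mysql` package’s `connection.query(sql, params, callback)`. On MySQL 8.0.20 and later, `VALUES()` inside `ON DUPLICATE KEY UPDATE` still works but is deprecated in favor of row aliases.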

If you’re not quite sold yet, consider the case where your only migration source list is an HTML file containing a menu. Without jsdom, you’d need to extract the contents of that menu into a source map, read that map into a database table, and then have your migration classes use the table to parse the individual files with QueryPath (yuck). With jsdom, you can instead parse the menu, open each individual file, and create database records for the content you need to migrate.
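Here is a minimal sketch of the “parse the menu, open each file” step. It assumes you have already pulled the menu’s `href` values out with jQuery, for example `$('#menu a').map(function () { return $(this).attr('href'); }).get()`, where `#menu` is a hypothetical selector. The hypothetical `menuLinksToFiles()` helper turns those hrefs into a de-duplicated list of local files that the `files.forEach(...)` loop above could iterate over:

```javascript
const path = require('path');

// Hypothetical helper: turn href values scraped from the menu page into a
// de-duplicated list of local files to feed to jsdom.
// menuFile: path of the menu page; siteRoot: directory that "/..." links point into.
function menuLinksToFiles(menuFile, hrefs, siteRoot) {
  const baseDir = path.dirname(menuFile);
  const root = siteRoot === undefined ? baseDir : siteRoot;
  const seen = new Set();
  const files = [];

  for (const raw of hrefs) {
    const href = String(raw || '').trim();

    // Skip empty links, in-page anchors, protocol-relative links, and anything
    // with a scheme such as http:, https:, or mailto:.
    if (
      href === '' ||
      href.startsWith('#') ||
      href.startsWith('//') ||
      /^[a-z][a-z0-9+.-]*:/i.test(href)
    ) {
      continue;
    }

    // Drop any ?query or #fragment; skip links that were nothing but a query.
    let cleaned = href.split(/[?#]/)[0];
    if (cleaned === '') {
      continue;
    }
    // Undo percent-encoding (e.g. %20); keep the raw value if it is malformed.
    try {
      cleaned = decodeURIComponent(cleaned);
    } catch (err) {
      // Malformed escape sequence: fall through with the undecoded path.
    }

    // "/about.html" is relative to the site root; "about.html" to the menu's folder.
    const file = cleaned.startsWith('/')
      ? path.join(root, cleaned)
      : path.join(baseDir, cleaned);

    if (!seen.has(file)) {
      seen.add(file);
      files.push(file);
    }
  }
  return files;
}
```

Treating root-relative links as pointing into `siteRoot` is an assumption about how the old site was laid out on disk; adjust it if your export differs.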

Oh, and did I mention that it’s fast? jsdom + jQuery isn’t nearly as fast as jQuery running against a browser’s native DOM, but in our tests it has been faster than QueryPath, and the code is generally easier to follow for anyone familiar with jQuery. I can see it now: the back-end development team convinces the front-end guys to write a script that cuts up all the content they need to migrate… with an offering of beer, of course. Prost!