Drop that cron; use Hudson instead

For years, I used cron (sometimes anacron) without asking questions. The method was simple enough, and every project requiring cron-related capabilities documented the setup.

There is a much better way, and it involves Hudson. I introduced “Hudson for cron” as a sidebar at the Drupal Scalability and Performance Workshop a few weeks ago. To my surprise, several of the attendees remarked on their feedback questionnaires that it was one of the most valuable things they picked up that day. So, I’ve decided to write this up for everyone.

Feast your eyes (and votes) on our web chefs' tasty DrupalCon San Francisco session proposals


We are still 62 days away from DrupalCon San Francisco 2010, but in order to get truly excited about the event, you’ll want to start thinking about the amazing sessions you’ll attend. Beginning today, attendees are allowed to vote on those sessions they most want to see on the final schedule in April.

This year, Four Kitchens is both sponsoring DrupalCon San Francisco and offering to share our experience and knowledge with the community. If you’d like to see any of the sessions listed below, please vote! (And tell your co-workers, friends, and pets* to vote, too.)

Making Drupal and Pressflow more mundane

Drupal and Pressflow have too much magic in them, and not the good kind. On the recent Facebook webcast introducing HipHop PHP, their PHP-to-C++ converter, they broke down PHP language features into two categories: magic and mundane. The distinction is how well each capability of PHP, a dynamic language, translates to a static language like C++. “Mundane” features translate well to C++ and get a big performance boost in HipHop PHP. “Magic” features are either unsupported, like eval(), or run about as fast as today’s PHP+APC, like call_user_func_array().

Intelligent memcached and APC interaction across a cluster

Anyone experienced with high-performance, scalable PHP development is familiar with APC and memcached. But used alone, they each have serious limitations:


  • Advantages
    • Low latency
    • No need to serialize/unserialize items
    • Scales perfectly with more web servers
  • Disadvantages
    • No enforced consistency across multiple web servers
    • Cache is not shared; each web server must generate each item


  • Advantages
    • Consistent across multiple web servers
    • Cache is shared across all web servers; items only need to be generated once
  • Disadvantages
    • High latency
    • Requires serializing/unserializing items
    • Easily shards data across multiple web servers, but is still a big, shared cache

Combining the two

Traditionally, application developers simply think about consistency needs. If consistency is unnecessary (or the scope of the application is one web server), APC is great. Otherwise, memcached is the choice. There is, however, a third, hybrid option: use memcached as a coordination system for invalidation with APC as the main item cache. This functions as a loose L1/L2 cache structure. To borrow terminology from multimaster replication systems, memcached stores “tombstone” records.

Update from the Drupal 7 Contributed Modules Sprint

chx and I gathered last week in Vancouver’s West End for a two-person performance sprint during the final code slush days, allowing us to finish several key improvements to Drupal’s database layer. Right afterward, many more people joined us for another sprint to port key modules to Drupal 7. People worked in-person, voluntarily over IRC, and involuntarily over IRC (lost passport).

I can say — without reservation — that our work was successful. We kicked off the weekend with Drupal 6 versions of Coder and Views. (Though there had been a touch of prior work on the Views port to Drupal 7’s new database layer.)

Benchmarks, hot off the Pressflow

Independent benchmarks by Josh Koenig from Chapter Three show a 3000x throughput increase and a 40x load reduction moving from plain Drupal to Pressflow + Varnish. His testing was performed on a small Amazon EC2 instance, demonstrating how Pressflow can deliver internet-scale performance on modest, inexpensive hardware. Pressflow is able to deliver this class of performance because it’s optimized to support Varnish and other enterprise-grade web infrastructure tools in ways that standard Drupal cannot.

With Pressflow’s API compatibility with Drupal, Josh’s move from Drupal to Pressflow on his project didn’t require any coding or extensive testing. He just replaced Drupal core with Pressflow. (It’s no harder than a minor Drupal update.)

For single-server setups in the Amazon EC2 cloud, Josh’s Project Mercury AMI provides a click-and-run, configured setup with the Pressflow + Varnish stack. For more complex setups, Four Kitchens provides infrastructure consulting services on the Pressflow system.

Need to scale Drupal on EC2? Check out Chapter Three's Mercury project

Josh Koenig from Chapter Three has made pre-release EC2 AMIs (pre-packaged virtual machine images) for Mercury, a project to combine Four Kitchens’ Drupal-derived, high-performance Pressflow with Varnish, Cache Router, and memcached. Initial results show it easily saturating the EC2’s pipe. Mercury instances directly update their Pressflow releases from the Four Kitchens Bazaar server.

Mercury is an exciting project for anyone who needs to run a high-traffic, Drupal-based site without having to configure a bunch of caching systems.