Here at Four Kitchens we make BIG websites — a lot of them. In the past, keeping up with the development environments for all of these sites added a lot of overhead, which ultimately means more time managing code than writing it. So like good developers we asked ourselves: “how can we make this better, lower maintenance, and reusable?” What we came up with isn’t necessarily anything new, but has completely changed how our internal development process works.
Problems with traditional pagination
- Because pagination uses row offsets into the results, browsing listings where newly published items get added to the beginning of the results creates “page drift.” Page drift is where a user already browsing through paginated results sees, for example, items E, D, and C on page one, waits awhile, clicks to the next page, and sees items C, B, and A. Going back to page one again shows F (newly published), E, and D. Item C “drifted” to page two while the user was reading page one. If new items are published frequently enough, pagination can become unusable due to this drifting effect.
- Even if content and ordering are fully indexed, jumping n rows into the results remains inefficient; it scales linearly with depth into pagination.
- Paginating sets where the content and ordering are not fully indexed is even worse, often to the point of being unusable.
- The design is optimized around visiting arbitrary page offsets, which does not reflect user needs. Users only need to make relative jumps in pagination of up to 10 pages (or so) in either direction or to start from the end of the results. (If users are navigating results by hopping to arbitrary pages to drill down to what they need, there are other flaws in the system.)
Anyone experienced with high-performance, scalable PHP development is familiar with APC and memcached. But used alone, they each have serious limitations:
- Low latency
- No need to serialize/unserialize items
- Scales perfectly with more web servers
- No enforced consistency across multiple web servers
- Cache is not shared; each web server must generate each item
- Consistent across multiple web servers
- Cache is shared across all web servers; items only need to be generated once
- High latency
- Requires serializing/unserializing items
- Easily shards data across multiple web servers, but is still a big, shared cache
Combining the two
Traditionally, application developers simply think about consistency needs. If consistency is unnecessary (or the scope of the application is one web server), APC is great. Otherwise, memcached is the choice. There is, however, a third, hybrid option: use memcached as a coordination system for invalidation with APC as the main item cache. This functions as a loose L1/L2 cache structure. To borrow terminology from multimaster replication systems, memcached stores “tombstone” records.
I downloaded and installed OpenSolaris to experiment with its LAMP stack implementation. I eventually want to set up a headless, SSH-accessible box I can use to experiment with unique OpenSolaris features like dtrace on PHP and MySQL.
DrupalCamp Stockholm 2009
- Is Drupal secure: the Drupal project’s responses to the web’s most common software vulnerabilities
- Scalable Drupal infrastructure: a guide to planning, deploying, and scaling big websites
Drupalcon DC 2009
On December 2, 2008, customers of SonicWALL woke up to broken firewalls. This wasn’t the result of a real problem in the firewalls; it was a result of SonicWALL’s DRM server malfunctioning and deactivating all customer firewalls.
The relationship between customers and vendors of proprietary software is fundamentally adversarial: proprietary vendors have business models where customer activity (like installing Windows on a desktop) requires payment to the vendor. Because the activity happens entirely on the customer side and paying the vendor conflicts with the customer’s desire to save money, proprietary software vendors don’t trust their customers to pay them.
Nodes have evolved remarkably over Drupal’s history. In Drupal 4.7, node types were typically created by modules that “owned” their node types. There was no way to create a node type without a module behind it. Modules creating node types would implement hook_node_info() and directly handle the the main loading, saving, and editing of the node type. Drupal core handled the loading and saving of the title and body. Modules doing this were effectively subclassing a pseudo-abstract node class (a class containing title and body only) in core and adding their own fields.