Unit testing hints that you might have missed the first time

After spending all too many hours toiling away on Drupal 5 sites without the benefit of unit testing, I’m finally getting to spend some quality time with Drupal 6’s SimpleTest module. Things have certainly gotten a lot better in the Drupal testing world over the last year or two.

As I work to develop my testing chops it’s hard to avoid pining for Ruby, where the community has been obsessed with testing for years, where lots of experiments have been tried, and where the language syntax actually works with you to make test code more elegant. Even something basic, like asserting the presence of an exception in PHP:

$threw = FALSE;
try {
    something_broken();
}
catch (MyKindOfException $e) {
    $threw = TRUE;
}
$this->assertTrue($threw);

…becomes so much nicer in Ruby test/unit (which is hardly the last word in Ruby testing utilities):

assert_raise MyKindOfError do
    something_broken()
end

But, though we can’t hope to mimic the elegance of Ruby’s block syntax in PHP, we can at least try to take advantage of the Ruby community’s hard-won experience with testing methodologies and design patterns. One of my favorite Ruby bloggers was Jay Fields — he’s been working more in Java lately, but his old Rails links live on:

  • First, the disclaimer: Even the “industry experts” on testing are still trying to figure it out, and there is no silver bullet: the right kind of testing to do depends on what you’re trying to write.

  • Having said that: Setup and teardown methods make your tests less readable and maintainable, and the guy who designed the NUnit library wishes he hadn’t supported them at all.

  • The Rails community has experimented with many ways of representing the data that your tests will operate on. Rails 1.0 provided fixtures; after much trial and error, the consensus seems to be that they suck and that you will eventually figure that out. The folks I follow have dabbled with Object Mothers before settling (for the moment) on the Test Data Builders pattern; the fairly popular Factory Girl library appears to be a Rails implementation of the Test Data Builder concept.

  • In production software, duplicated code is generally considered harmful. In test code, duplicated code is often your friend. It is more important for a test routine to be easily understood in isolation than for it to avoid duplicating other test routines. Test code is not production code: The tests run in isolation, are generally read in isolation, and benefit from being written in isolation.

  • While we’re on the subject of isolation: write one assertion per test. I like this idea, but — alas! — it is an even more radical idea in Drupal than it is in Rails or Java, because SimpleTest (a) is wicked slow at the moment and (b) takes time in proportion to the number of test methods. (Or such is my impression. Insert a caveat about the importance of formal benchmarking here.) I believe SimpleTest does a lot of tedious DB setup and teardown for every test method, and it doesn’t have transactional tests. One of us needs to be chained to a desk next to boombatower and forced to implement some speed boosts.

But, even if we are compelled to compromise in practice, we can still hold the ideal in our hearts: one assertion per test is great, multiple assertions per test is a compromise, and tests which interdepend on each other in a complicated way are… not good.

  • Here’s one I need to think about: writing tests without names. Not that we will be doing this in PHP soon; even in Ruby it looks a little weird. But if zero names might be right, then two names is almost certainly one too many. I’ve been thinking about the Drupal SimpleTest convention of equipping every assertion with a gloriously descriptive, HTML-formatted comment:

    $this->assertTrue($bool, t('The bool should be true'));

and noticing that the presence of the comment makes the test method’s name entirely redundant. Maybe I’ll experiment with using tiny, inconspicuous method names.

The other half of the word "Wikipedia"

In Tom Geller’s recent grumble about the official Drupal Handbooks on Drupal.org, he notes a problem (the tree of docs is huge and is difficult to reorganize or prune) but — like most of the rest of us — he isn’t quite certain how to solve it. So he offers a prayer to the gods of Wikipedia, as so many have done before:

Converting Drupal.org’s documentation into a wiki(-like) format might help “crowdsource” the task.

Funny thing about this recommendation, which all of us have been tempted to issue at one time or another: The docs are in a “wiki-like format”. Any registered user on Drupal.org can edit a page. All of its revisions are accessible. You can show diffs between revisions.

And yet the problem is not solved. The Drupal docs are still nowhere near as good as a wiki. Or — important distinction — they are nowhere near as good as Wikipedia, which is what most people think of when they think of a wiki.

You can’t spell Wikipedia without “Pedia”

But Wikipedia is not the typical wiki. (My impression is that the average wiki on the Internet kind of sucks — the Rails wiki, for example, was legendarily disorganized and out of date until recently.)

Wikipedia’s secret is that it isn’t just a wiki. It’s an encyclopedia, an old and familiar genre with well-understood semantics. When you see a Wikipedia link to a term like “bluegrass music”, you see more than just a link: You already know what is on the other end.

  • The topic of the article is more-or-less completely defined by the title of the link.

  • The entry will be written for a general audience and will not contain excessive detail or be too long.

  • The entry may relate to other entries, but is designed to stand alone. No prerequisites will be assumed.

It is this implicit context that makes Wikipedia so usable. That’s why you can read a Wikipedia page filled with hundreds of internal Wikipedia links and not get lost. You know what each of those links is. And that’s why each external link gets special formatting on Wikipedia: It’s a warning. “This is not like the other links; you don’t know what’s out there. Prepare to be rickrolled.”

The encyclopedia is flat

One of Wikipedia’s other friendly features is that the Wikipedia architecture is a big flat list. True, there are “superarticles”, like disambiguation pages. And there are pages with subtopics. But there are very few special structures for creating such pages, and no real provision at all for creating trees with formally defined “parent” pages and “child” pages. So, nobody does that.

This is great because it frees people from worrying about organization. You can start a new Wikipedia article by searching for the intended title and clicking “create the article”. No need to worry about where it fits in some grand Wikipedia schema. No need to figure out how to edit the Music category to indicate that progressive bluegrass is a kind of bluegrass. No need to figure out what kind of music the Flecktones play. (Is “blu-bop” a real word? It is now!)

There is also no danger of accidentally creating two entries for the term “bluegrass music”.

A Wikipedia editor only needs to edit the content. A Drupal handbook editor must not only edit the content but also arrange it in a structure. That’s a whole new level of difficulty and intimidation.

What does this mean for Drupal?

It would actually be really nice to have a Drupal Encyclopedia — a collection of pages, designed to be read independently, which each cover a single topic in encyclopedia fashion for a general audience.

But, as a general-purpose organizational scheme for the Handbooks, the encyclopedia model doesn’t work. Sometimes the Handbooks need to contain pages with obsessively detailed instructions. Sometimes we need explanations of complex topics that go on for more than one page and need to be read in order. Often it’s impossible to capture a page’s entire context in its title, so we need to organize pages into topical sections and subsections and chapters and books.

In other words, Wikipedia’s design is not a silver bullet. We’re not trying to write an encyclopedia. We’re trying to write something different, and more complicated. No wonder it’s so much harder to get right.

In praise of ancient paper technology

I’m gearing up to comment on Tom Geller’s recent grumbles about the Drupal free documentation, but first I feel that I should point out the obvious: Drupal has some great paid documentation. Some highlights that I know about:

  • People who want to know what all the fuss is about can pick up The Lullabots’ Using Drupal, which I’m using as light bedtime reading.

  • Hard-core coders who contemplate deploying Drupal should definitely read Pro Drupal Development, a truly indispensable guide to Drupal’s innards.

  • JavaScript is a Drupal component, too. An increasingly important one, in fact. Which gives me an excuse to plug one of my favorite software books ever: Crockford’s JavaScript: The Good Parts. A blessedly short but dense book. Just the sort of thing that Tom Geller might like, I suspect.

The open-source revolution has proceeded so swiftly that some people seem to have forgotten that all documentation used to come in big paper books that cost money. Yet that technology is still surprisingly useful! And it really isn’t that much money, compared to the medical expense of continually banging your head on your desk.


A Very Subtle Git-CVSImport Bug, and Its Workaround

I’ve been trying to follow Mikkel Hoegh’s handy instructions to mirror some of the Drupal CVS repositories onto github, so that git users like myself can make better use of our favorite tool. But I hit a snag last month when I found a bug in my Drupal git mirror. A commit from last December had failed to go through, breaking all the Drupal 7 commits since then.

I’ve spent some time off and on trying to figure out what went wrong, since a mirror with occasional errors and omissions isn’t very useful. Just today I found a very handy post on Stack Overflow that appears to have solved my problem.

I had tried using Mikkel’s command to mirror a CVS repository with git:

git-cvsimport -v -d:pserver:anonymous@cvs.drupal.org:/cvs/drupal -o upstream drupal

Which (oddly) seems to work for him — his own Drupal mirror on github works just fine. It didn’t work for me, though. Neither did this:

git cvsimport -p x -v -k -o cvshead -s _ -d :pserver:anonymous@cvs.drupal.org:/cvs/drupal -C drupal drupal

But it turns out that -p x, which was meant to pass the -x flag through to cvsps, isn’t correct. The -p option hands its value to cvsps as-is, so the dash has to be part of that value. You have to say -p -x, like this:

git cvsimport -p -x -v -k -o cvshead -s _ -d :pserver:anonymous@cvs.drupal.org:/cvs/drupal -C drupal drupal

One tiny dash makes the difference. My new mirror appears to be correct. I’m glad someone else found this workaround, since I would probably have never found it myself.
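Why does one dash matter so much? An option that takes a value consumes the next word as that value, whether or not the word starts with a dash. The sketch below imitates this with a hypothetical parse function built on POSIX getopts. It is not git cvsimport’s actual code (that tool is a Perl script); it only shows how a "p:" style option treats x and -x.

```shell
#!/bin/sh
# Hypothetical parser, NOT git cvsimport's source. "p:" declares that -p
# takes a value, which a real tool would forward verbatim to cvsps.
parse() {
  OPTIND=1                     # reset so the function can be called repeatedly
  passthru=""
  while getopts "p:vk" opt "$@"; do
    case $opt in
      p) passthru=$OPTARG ;;   # whatever word follows -p, dash or not
      v|k) ;;                  # accepted and ignored in this sketch
      *) return 1 ;;
    esac
  done
  printf '%s\n' "$passthru"
}

parse -p x -v     # the value is the bare word x, not a flag
parse -p -x -v    # the value is -x, the flag cvsps was meant to receive
```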

Having said that, I’m actually going to take down my own Drupal mirror on github and recommend that everyone use Mikkel’s instead. It’s correct and, so long as he is maintaining it, it makes no sense to waste resources maintaining it twice. What I’m going to do is put up some mirrors of common Drupal modules to go along with it.


Getting Git Together with Drupal

Any programming project — including Drupal projects — should use a version control system. My favorite such system is git. If you haven’t tried it I recommend that you learn all about it at the tutorial section of github, or from Peepcode’s git screencast.

Assuming that you understand the basics of git, let’s apply it to a Drupal project. The simplest strategy is to create a single git repository that holds everything in your project. You download Drupal core and modules (using FTP, the Update Status module, or drush) and you check them into git as you install them. Your custom changes get checked into the same git repository.

Here are a couple of hints:

Use branches to separate your code from contrib

You want to make it easy to distinguish changes that you make from those made by others. Creating branches is a good way to accomplish this. When you first set up your project, create:

  • A branch called core. You should check the Drupal core code into this branch.

  • A branch called modules, based on core. When you install a third party module, you should do so in this branch. (Actually, it would be ideal to create a separate branch for each module, but that’s a bit of work to manage so I’ll hold off on recommending that. I’d hate to scare you away on the first day.)

  • Branches for your site’s code, based on modules. You’ll probably want a development branch (I tend to name this devel) and a production branch (for which I often use master).

When you need to update core, you do so in the core branch, then merge core into modules and any other branch that depends on it. When you need to update a module, you do so by switching to the modules branch, performing and committing the update, and then merging modules into your development branches.

Tell git to ignore certain files

There are certain files in your Drupal installation that you should probably not have under version control at all:

  • The files directory, which contains uploaded files.

  • Any settings.php files.

  • Utility files used by your editor or IDE: .project files, TAGS files, etc.

There are two ways to make git ignore certain files. One is to put the names of those files in a file called .gitignore in the base directory of the repository, right next to the .git directory. The other is inside the .git directory itself: If you add the name of a file to .git/info/exclude it will be excluded from the repository.

No matter which method you use, git won’t delete the ignored files — it will just pretend that they aren’t there. You can ignore whole directories, and you can use wildcards to make git ignore entire sets of files with similar names.

How do you choose which ignoring method to use? The idea is that .gitignore is part of your project: You check it in to git, and it gets copied around wherever your project goes (e.g. to your development server). So if you want to ignore a file across all servers (like the files directory, which will exist everywhere you install the code), you should put that file’s name in .gitignore. Whereas .git/info/exclude is for files that only occur in your local repository and aren’t expected to be anywhere else, like editor settings files, or the directory beneath sites which corresponds to your local machine’s test domain.
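To see the two mechanisms side by side, here’s a minimal sketch in a throwaway repository. The ignored names (files, sites/*/settings.php, TAGS, .project) are only examples. git check-ignore, available in reasonably recent versions of git, reports which file and pattern is responsible for ignoring each path.

```shell
#!/bin/sh
set -e
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Shared rules: .gitignore gets committed, so every copy of the project obeys it.
printf '%s\n' 'files' 'sites/*/settings.php' > .gitignore

# Local rules: .git/info/exclude is never committed, so only this clone obeys it.
mkdir -p .git/info
printf '%s\n' 'TAGS' '.project' >> .git/info/exclude

# Some paths to test against; index.php stands in for ordinary project code.
mkdir -p files sites/default
touch files/photo.jpg sites/default/settings.php TAGS .project index.php

# -v prints "source:line:pattern<TAB>path" for each ignored path.
git check-ignore -v files/photo.jpg sites/default/settings.php TAGS .project
```

Running git status afterwards lists only .gitignore and index.php as untracked; everything else is hidden by one rule or the other.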

Example

Here’s a set of example commands for building a new project. We’ll assume you’ve already downloaded the necessary Drupal tar files to ~/Downloads.

mkdir myproject
cd myproject
git init

# make .gitignore right away. 
# This will be handy later; it also
# gives us something to check in immediately. 
# You can't start creating branches without
# checking something in first

echo "files" > .gitignore
git add .gitignore
git commit -m "New Drupal project: myproject"

# create the core branch and install drupal core
git branch core
git checkout core
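# the tarball unpacks into a drupal-6.9 directory;
# --strip-components=1 puts core at the top level
tar --strip-components=1 -xzvf ~/Downloads/drupal-6.9.tar.gz
git add .
git commit -m "Installed Drupal 6.9 core"
git tag DRUPAL-6-9

# create the modules branch and install a
# couple of modules.
# This time let's make the branch and check it out
# in one command.
git checkout -b modules core
mkdir -p sites/all/modules
cd sites/all/modules
tar xvfz ~/Downloads/views-6.x-2.2.tar.gz
git add .
git commit -m "Installed Views 6.x-2.2"
tar xvfz ~/Downloads/cck-6.x-2.1.tar.gz
# commit -a only stages files git already tracks,
# so the new cck directory still needs a git add
git add .
git commit -m "Installed CCK 6.x-2.1"

# now get to work on your own code
git checkout -b devel modules

When it comes time to upgrade core:

# start from the project's top-level directory
cd "$(git rev-parse --show-toplevel)"
git checkout core
tar --strip-components=1 -xzvf ~/Downloads/drupal-6.10.tar.gz
# stage new files too; commit -a alone would skip them
git add -A
git commit -m "Upgraded to Drupal 6.10"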
git tag DRUPAL-6-10
git checkout modules
git merge core
git checkout devel # or any other branch
git merge modules

Now, if you need to know the difference between the current version of core and the previous one:

git diff core^ core

If you need to know the difference between the current official version of the Views module and the one in your code (perhaps because you’ve made a few patches):

git diff modules devel sites/all/modules/views

Or, for all the work you’ve done on the Views module in the devel branch since the last time you merged modules and devel:

git diff modules...devel sites/all/modules/views
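The difference between the two-dot and three-dot forms is easiest to see in a throwaway repository. In this sketch (a hypothetical views.module with made-up one-line contents), devel patches Views and then modules gets an upstream upgrade. The plain form mixes that upgrade into the diff, while modules...devel compares devel against the merge base of the two branches, so it shows only your patch. The -- just separates the path from the branch names.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# modules: a stand-in for the official Views release.
mkdir -p sites/all/modules/views
echo "views 2.2" > sites/all/modules/views/views.module
git add .
git commit -q -m "Installed Views 6.x-2.2"
git branch -M modules        # rename the initial branch, whatever it was called

# devel: a local patch to Views.
git checkout -q -b devel
echo "local patch" >> sites/all/modules/views/views.module
git commit -q -a -m "Patched Views"

# Meanwhile, modules gets an upstream upgrade.
git checkout -q modules
echo "views 2.3" > sites/all/modules/views/views.module
git commit -q -a -m "Upgraded Views"

# Two dots: tip of modules vs. tip of devel (the upgrade shows up, reversed).
two=$(git diff modules devel -- sites/all/modules/views)
# Three dots: merge base of modules and devel vs. tip of devel.
three=$(git diff modules...devel -- sites/all/modules/views)
# The same three-dot comparison spelled out with git merge-base.
base=$(git merge-base modules devel)
explicit=$(git diff "$base" devel -- sites/all/modules/views)

printf '%s\n' "$three"
```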

Celebrating Drupal 6 With A Test Post

The blog has now been updated to Drupal 6. There aren’t a lot of visible differences yet (unless you are an IE6 user… you folks have been officially deprecated) — the very nice Drupal 6 theming improvements made it pretty easy to duplicate my old style, while cleaning it up a lot at the same time.

Some people have accused D6 of being both easier and harder to theme, but so far all I can perceive is the “easier” part, perhaps because I’ve been elbow-deep in the Drupal 5 theming functions and know how complicated that was.


Did you know Internet Explorer 6 limits how many stylesheets are loaded?

As a matter of fact, I did not.

Worse, IE7 apparently does the same thing.

It’s funny how much time I spent looking for a much more obscure IE6 problem (you know, one like the other 1,001 IE6 problems) when the evidence of this one was staring me in the face. It took a while before I thought of googling “IE6 too many stylesheets bug”, because who would impose an arbitrary thirty-one-stylesheet limit? What would be the point of that?

I never saw it coming. Touché, Microsoft engineers!

Checking out an open-source git project: the right way

(Paraphrased from Long Nguyen’s guide to checking out projects from GitHub — which is excellent, except that Long uses the name “long” for too many different things, making the code hard for mere mortals to read.)

Suppose you would like to work on the open-source Insoshi project. You have the URL of its official git repository:

git://github.com/insoshi/insoshi.git

You could simply clone this URL to your own machine:

$ git clone git://github.com/insoshi/insoshi.git

after which you could mess around with your local copy as much as you like. But it’s hard to share a local repository with your friends, or among several machines, so perhaps you’d rather create a private fork on a server somewhere. It’s easy to do this with a GitHub-hosted project like Insoshi: you get a GitHub account, then press GitHub’s “fork” button, and you are rewarded with the URL for your own personal GitHub-hosted fork:

git@github.com:mechfish/insoshi.git

Now you could make a local clone of that remote fork:

$ git clone git@github.com:mechfish/insoshi.git

but there are disadvantages to that. One is that the local clone will have a different parent: its origin will be the remote fork, while the remote fork’s origin is the official project repository. Ideally, you would want the two repositories to be interchangeable, so that you won’t get confused when comparing them, and so that you can upload the local version to replace the remote version if it gets lost or corrupted.

You’d also like to be able to pull changes from the official repository directly into your local copy, then push them up to the remote fork, rather than having to pull every change forward through the chain: official → remote fork → local copy. That’s painful.

So here’s what to do:

$ git clone git://github.com/insoshi/insoshi.git
$ cd insoshi
$ git branch devel master
$ git checkout devel
$ git remote add fork git@github.com:mechfish/insoshi.git
$ git fetch fork
$ git push fork devel:refs/heads/devel
$ git config branch.devel.remote fork
$ git config branch.devel.merge refs/heads/devel

Here’s what these commands accomplish:

  • Clone the official repository and cd into the local copy.
  • Create a devel branch to develop on. Note that if we want to base our development branch on the “edge” version of the project instead of the stable version we should do this instead: $ git branch --track edge origin/edge; git branch devel edge
  • Checkout our new devel branch.
  • Add a remote named “fork” that points to our private fork on the remote server.
  • Fetch the remote fork’s contents.
  • Push our new devel branch to the remote fork, where it should have the same name.
  • Set the config for the devel branch so that it automatically pushes changes to the devel branch on the remote fork when we do a plain git push, and automatically pulls from the devel branch on the remote fork when we do a plain git pull.

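Note that with this setup, a plain git push from the devel branch also pushes every other branch that exists on both sides, including master. (That’s down to git’s push.default setting, which at the time of writing defaulted to matching: push each local branch to the remote branch of the same name. Since Git 2.0 the default is simple, which pushes only the current branch, so on a newer git you need git config push.default matching to get this behavior.) So you can pull changes from origin directly into your local copy, with the assurance that those changes will be migrated to your remote fork the next time you push.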
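On current versions of git, git push -u (short for --set-upstream) does the push and both git config steps in one command. Here’s a sketch of the whole recipe that you can try without touching GitHub. It assumes two local repositories stand in for the official project and your fork (the directory names upstream, fork.git, and insoshi are placeholders), with git clone --bare playing the role of GitHub’s fork button.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# "upstream": a stand-in for the official repository, with one commit.
git init -q upstream
git -C upstream -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m "Initial commit"

# "fork.git": a stand-in for the fork GitHub creates when you press "fork".
git clone -q --bare upstream fork.git

# The recipe from the text: clone the official repository, not the fork.
git clone -q upstream insoshi
cd insoshi
git checkout -q -b devel
git remote add fork "$work/fork.git"
git fetch -q fork

# Push devel to the fork and record fork/devel as its upstream, which sets
# branch.devel.remote and branch.devel.merge just like the two config commands.
git push -q -u fork devel
```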


Installing PHP, Apache, MySQL, and Ruby on Mac OS 10.5 Leopard

If you want to get up and running quickly using Drupal on Mac OS 10.5 your best bet is to use MAMP, as John VanDyk suggests in his comments. There are MAMP setup instructions on Drupal.org.

But I decided to try moving away from MAMP. One reason is that MAMP doesn’t come with certain features — for example, I’m told that it’s hard to get SSL working. Another is that the upgrade cycle for PHP and MySQL is different from that of MAMP, and I sometimes want to be able to roll my own upgrade on my own timeframe. In addition, I want to play with Ruby technologies like Merb and Rack and Thin, and I didn’t relish trying to hack them into MAMP’s config files — though it would probably be possible to do so, the Ruby docs and blogs tend to be aimed at people who are working outside of MAMP.

So I’ve just gotten everything installed. Here’s a quick sketch of the process:

  • Follow Dan Benjamin’s instructions for compiling MySQL. This will involve creating /usr/local and adding it to your path, assuming you haven’t already done so.

    I’m with VanDyk, who suggests running mysql_secure_installation once you have MySQL installed.

  • Follow Benjamin’s instructions for compiling your own Ruby and Rubygems. This is a more dubious move, since Apple has provided preinstalled versions of Ruby and Gems in Leopard, and they work pretty well. But I decided to install my own version so that I can keep control over the upgrades.

  • I recommend avoiding the stock Apache. I spent a while trying to compile a PHP that would work with it, and ran into trouble. Instead, install MacPorts and use that to install Apache, then download and compile PHP.

Note that I installed MacPorts’ Apache, then downloaded and built the PHP source myself, using the following ./configure options:

./configure --prefix=/usr/local/php \
--with-apxs2=/opt/local/apache2/bin/apxs \
--with-mysql=/usr/local/mysql \
--with-curl --with-pgsql=shared \
--with-pdo-pgsql=shared,/usr/local/pgsql \
--with-pdo-mysql=shared,/usr/local/mysql \
--with-mcrypt=shared,/opt/local \
--with-openssl=shared,/opt/local \
--with-pear \
--with-gd \
--with-jpeg-dir=/opt/local \
--with-png-dir=/opt/local \
--without-iconv 

I’m not sure there’s a good reason to do this instead of just using MacPorts’ PHP. For that matter, I’m not sure that using your own MySQL is better than having MacPorts install MySQL, or using the binary distribution from MySQL AG. There are lots of options. But these options seemed to work for me.

Of course, you’ve also got to set up a php.ini and an httpd.conf and get PHP and Apache pointing at them. And I’ve still got eAccelerator and XDebug to install. It’s a bit of work. I’d advise anybody who wants to avoid trouble to stick with the MAMP option.


Actuarial

As usual, today’s XKCD makes you think.

  • Life-Line, the story of an inventor who learns how to predict the time of a person’s death to within a single hour, was the first story that Robert Heinlein ever sold, in 1939. As in all Heinlein stories, the great part of this story is the portrait of a character: The inventor, who successfully (and a bit annoyingly) remains calm and rational about the fact of death, including his own death, even as everyone around him is freaking out.

Asimov treated the story of the soothsayer as a grand historical epic (Foundation); Clarke wrote stories full of spiritual awe (e.g. The Nine Billion Names of God, 2001), but Heinlein boiled it all down to this one guy in a shabby office. He was the Raymond Chandler of science fiction — until the 1960s, at any rate.

  • Kevin Kelly has a countdown clock of the days he has left to live:

    My friend Stewart Brand, who is now 69, has been arranging his life in blocks of 5 years. Five years is what he says any project worth doing will take. From moment of inception to the last good-riddance, a book, a campaign, a new job, a start-up will take 5 years to play through. So, he asks himself, how many 5 years do I have left? He can count them on one hand even if he is lucky. So this clarifies his choices. If he has less than 5 big things he can do, what will they be?

I appear to have eight big things I can do — forty years, 14691 days.

  • I think that a timeline which plotted the lifetimes of famous people — including the expected lifetimes of people who are alive right now — would be a great tool. I’ve loved timelines ever since I thumbed through The Timetables of History as a kid.

You need to occasionally think on a timescale that is longer than a single human life. You need to accept that people’s lives have arcs, like stories. You need to remember that the world will go on after you.
