I’m off to the Business of Software conference in Boston, which
the organizers have been kind enough to bring to my home city even
though they appear to live in another country altogether.
After spending all too many hours toiling away on Drupal 5 sites
without the benefit of unit testing, I’m finally getting to spend some
quality time with Drupal 6’s
SimpleTest module. Things have
certainly gotten a lot better in the Drupal testing world over the
last year or two.
As I work to develop my testing chops it’s hard to avoid pining for
Ruby, where the community has been obsessed with testing for years,
where lots of experiments have been tried, and where the language
syntax actually works with you to make test code more elegant. Even
something basic, like asserting the presence of an exception in PHP:
…becomes so much nicer in Ruby test/unit (which is hardly the last
word in Ruby testing utilities):
assert_raise MyKindOfError do
  something_broken()
end
But, though we can’t hope to mimic the elegance of Ruby’s block syntax
in PHP, we can at least try to take advantage of the Ruby community’s
hard-won experience with testing methodologies and design
patterns. One of my favorite Ruby bloggers was
Jay Fields – he’s been working more in
Java lately, but his old Rails links live on:
The Rails community has experimented with many ways of
representing the data that your tests will operate on. Rails 1.0
provided
fixtures; after
much trial and error, the consensus seems to be that
they suck
and that you will eventually figure that out. The folks I follow
have dabbled with
Object Mothers before
settling (for the moment) on the
Test Data Builders
pattern; the fairly popular
Factory Girl
library appears to be a Rails implementation of the Test Data
Builder concept.
In production software, duplicated code is generally considered
harmful. In test code,
duplicated code is often your friend. It
is more important for a test routine to be easily understood in
isolation than for it to avoid duplicating other test
routines. Test code is not production code: The tests run in
isolation, are generally read in isolation, and benefit from being
written in isolation.
While we’re on the subject of isolation:
write one assertion per test. I
like this idea, but – alas! – it is an even more radical idea in
Drupal than it is in Rails or Java, because SimpleTest is (a)
wicked slow at the moment and (b) linearly dependent on the number
of test methods. (Or such is my impression – insert a caveat
about the importance of formal benchmarking here.) I believe
SimpleTest does a lot of tedious DB setup and teardown for every
test method, and it doesn’t have
transactional tests. One
of us needs to be chained to a desk next to
boombatower and forced to
implement some speed boosts.
But, even if we are compelled to compromise in practice, we can still
hold the ideal in our hearts: one assertion per test is great,
multiple assertions per test is a compromise, and tests which
depend on each other in complicated ways are… not good.
Here’s one I need to think about:
writing tests without names. Not
that we will be doing this in PHP soon; even in Ruby it
looks a little weird. But
if zero names might be right, then two names is almost certainly
one too many. I’ve been thinking about the Drupal SimpleTest
convention of equipping every assertion with a gloriously
descriptive, HTML-formatted comment:
$this->assertTrue($bool, t('The bool should be true'));
and noticing that the presence of the comment makes the test
method’s name entirely redundant. Maybe I’ll experiment with using
tiny, inconspicuous method names.
In
Tom Geller’s recent grumble about the official Drupal Handbooks on Drupal.org,
he notes a problem (the tree of docs is huge and is difficult to
reorganize or prune) but – like most of the rest of us – he isn’t
quite certain how to solve it. So he offers a prayer to the gods of
Wikipedia, as so many have done before:
Converting Drupal.org’s documentation into a wiki(-like) format might help “crowdsource” the task.
Funny thing about this recommendation, which all of us have been
tempted to issue at one time or another: The docs are in a
“wiki-like format”. Any registered user on
Drupal.org can edit a page. All of its revisions
are accessible. You can show diffs between revisions.
And yet the problem is not solved. The Drupal docs are still nowhere
near as good as a wiki. Or – important distinction – they are
nowhere near as good as Wikipedia, which is what most people think of
when they think of a wiki.
You can’t spell Wikipedia without “Pedia”
But Wikipedia is not the typical wiki. (My impression is that the
average wiki on the Internet kind of sucks – the Rails wiki, for
example, was
legendarily disorganized and out of date
until recently.)
Wikipedia’s secret is that it isn’t just a wiki. It’s an
encyclopedia, an old and familiar genre with well-understood
semantics. When you see a Wikipedia link to a term like
“bluegrass music”, you
see more than just a link: You already know what is on the other
end.
The topic of the article is more-or-less completely defined by the
title of the link.
The entry will be written for a general audience and will not
contain excessive detail or be too long.
The entry may relate to other entries, but is designed to stand
alone. No prerequisites will be assumed.
It is this implicit context that makes Wikipedia so usable. That’s
why you can read a Wikipedia page filled with
hundreds of internal Wikipedia links
and not get lost. You know what each of those links is. And that’s
why each external link gets special formatting on Wikipedia: It’s
a warning. “This is not like the other links; you don’t know what’s
out there. Prepare to be rickrolled.”
The encyclopedia is flat
One of Wikipedia’s other friendly features is that the Wikipedia
architecture is a big flat list. True, there are “superarticles” (like
disambiguation pages). And
there are pages with subtopics. But there are very few special
structures for creating such pages, and no real provision at all for
creating trees with formally defined “parent” pages and “child”
pages. So, nobody does that.
This is great because it frees people from worrying about
organization. You can start a new Wikipedia article by searching for
the intended title and clicking “create the article”. No need to worry
about where it fits in some grand Wikipedia schema. No need to figure
out how to edit the Music category to indicate that progressive
bluegrass is a kind of bluegrass. No need to figure out what kind
of music
the Flecktones
play. (Is “blu-bop” a real word? It is now!)
There is also no danger of accidentally creating two entries for the
term “bluegrass music”.
A Wikipedia editor only needs to edit the content. A Drupal handbook
editor must not only edit the content but also arrange it in a
structure. That’s a whole new level of difficulty and intimidation.
What does this mean for Drupal?
It would actually be really nice to have a Drupal Encyclopedia –
a collection of pages, designed to be read independently, which each
cover a single topic in encyclopedia fashion for a general audience.
But, as a general-purpose organizational scheme for the Handbooks,
the encyclopedia model doesn’t work. Sometimes the Handbooks need to
contain pages with obsessively detailed instructions. Sometimes we
need explanations of complex topics that go on for more than one page
and need to be read in order. Often it’s impossible to capture a
page’s entire context in its title, so we need to organize pages into
topical sections and subsections and chapters and books.
In other words, Wikipedia’s design is not a silver bullet. We’re
not trying to write an encyclopedia. We’re trying to write something
different, and more complicated. No wonder it’s so much harder to get
right.
I’m gearing up to comment on Tom Geller’s recent grumbles about the Drupal free documentation, but first I feel that I should point out the obvious: Drupal has some great paid documentation. Some highlights that I know about:
People who want to know what all the fuss is about can pick up The Lullabots’ Using Drupal, which I’m using as light bedtime reading.
Hard-core coders who contemplate deploying Drupal should definitely read Pro Drupal Development, a truly indispensable guide to Drupal’s innards.
Javascript is a Drupal component, too. An increasingly important one, in fact. Which gives me an excuse to plug one of my favorite software books ever: Crockford’s Javascript: The Good Parts. A blessedly short but dense book. Just the sort of thing that Tom Geller might like, I suspect.
The open-source revolution has proceeded so swiftly that some people seem to have forgotten that all documentation used to come in big paper books that cost money. Yet that technology is still surprisingly useful! And it really isn’t that much money, compared to the medical expense of continually banging your head on your desk.
I’ve been trying to follow
Mikkel Hoegh’s handy instructions
to mirror some of the Drupal CVS repositories onto github, so that git
users like myself can make better use of our favorite tool. But I hit
a snag last month when I found a bug in my Drupal git mirror. A commit
from last December had failed to go through, breaking all the Drupal 7
commits since then.
I’ve spent some time off and on trying to figure out what went wrong,
since a mirror with occasional errors and omissions isn’t very
useful. Just today I found
a very handy post on Stack Overflow
that appears to have solved my problem.
I had tried using Mikkel’s command to mirror a CVS repository with git:
One tiny dash makes the difference. My new mirror appears to be
correct. I’m glad someone else found this workaround, since I would
probably have never found it myself.
Having said that, I’m actually going to take down my own Drupal mirror
on github and recommend that everyone use
Mikkel’s instead. It’s
correct and, so long as he is maintaining it, it makes no sense to
waste resources maintaining it twice. What I’m going to do is put up
some mirrors of common Drupal modules to go along with it.
UPDATE: This system was an experiment and I no longer use it. I feel
it’s too complicated to be worth the effort.
Any programming project – including Drupal projects – should use a
version control system. My favorite such system is
git. If you haven’t tried it I recommend that
you learn all about it at the tutorial section of
github, or from Peepcode’s
git screencast.
Assuming that you understand the basics of git, let’s apply it to a
Drupal project. The simplest strategy is to create a single git
repository that holds everything in your project. You download Drupal
core and modules (using FTP, the
Update Status module, or
drush) and you check them into git
as you install them. Your custom changes get checked into the same git
repository.
Here are a couple of hints:
Use branches to separate your code from contrib
You want to make it easy to distinguish changes that you make from
those made by others. Creating branches is a good way to accomplish
this. When you first set up your project, create:
A branch called core. You should check the Drupal core code into this branch.
A branch called modules, based on core. When you install a third
party module, you should do so in this branch. (Actually, it would
be ideal to create a separate branch for each module, but that’s a
bit of work to manage so I’ll hold off on recommending that. I’d
hate to scare you away on the first day.)
Branches for your site’s code, based on modules. You’ll probably
want a development branch (I tend to name this devel) and a
production branch (for which I often use master).
When you need to update core, you do so in the core branch, then
merge core into modules and any other branch that depends on
it. When you need to update a module, you do so by switching to the
modules branch, performing and committing the update, and then
merging modules into your development branches.
Tell git to ignore certain files
There are certain files in your Drupal installation that you should
probably not have under version control at all:
The files directory, which contains uploaded files.
Any settings.php files.
Utility files used by your editor or IDE: .project files, TAGS
files, etc.
There are two ways to make git ignore certain files. One is to put the
names of those files in a file called .gitignore in the base
directory of the repository, right next to the .git directory. The
other is inside the .git directory itself: If you add the name of a
file to .git/info/exclude it will be excluded from the repository.
No matter which method you use, git won’t delete the ignored files –
it will just pretend that they aren’t there. You can ignore whole
directories, and you can use wildcards to make git ignore entire sets
of files with similar names.
How do you choose which ignoring method to use? The idea is that
.gitignore is part of your project: You check it in to git, and it
gets copied around wherever your project goes (e.g. to your
development server). So if you want to ignore a file across all
servers (like the files directory, which will exist everywhere you
install the code), you should put that file’s name in
.gitignore. Whereas .git/info/exclude is for files that only occur
in your local repository and aren’t expected to be anywhere else, like
editor settings files, or the directory beneath sites which
corresponds to your local machine’s test domain.
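As a rough sketch of the two methods, here is a small shell function that writes both files. The function name and the sites/mysite.local entry are made up for illustration, and putting settings.php in .gitignore rather than in the exclude file is only my assumption about a typical setup. Adjust the lists to taste.

```shell
# Hypothetical helper: run it from the top of the working tree
# (the directory that contains .git).
add_drupal_ignores() {
  # Shared rules: .gitignore is checked in and travels with the project.
  printf '%s\n' 'files' 'settings.php' >> .gitignore

  # Local-only rules: .git/info/exclude never leaves this clone.
  mkdir -p .git/info
  printf '%s\n' '.project' 'TAGS' 'sites/mysite.local' >> .git/info/exclude
}
```

After running it, git check-ignore -v sites/default/settings.php reports which file and pattern caused a path to be ignored, which is handy when a rule doesn't seem to be working.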
Example
Here’s a set of example commands for building a new project. We’ll
assume you’ve already downloaded the necessary Drupal tar files to
~/Downloads.
mkdir myproject
cd myproject
git init
# Make .gitignore right away. This will be handy later; it also
# gives us something to check in immediately. You can't start
# creating branches without checking something in first.
echo "files" > .gitignore
git add .gitignore
git commit -m "New Drupal project: myproject"

# Create the core branch and install Drupal core.
git branch core
git checkout core
# The tarball unpacks into a drupal-6.9/ directory;
# --strip-components=1 puts its contents in the project root instead.
tar --strip-components=1 -xvzf ~/Downloads/drupal-6.9.tar.gz
git add .
git commit -m "Installed Drupal 6.9 core"
git tag DRUPAL-6-9

# Create the modules branch and install a couple of modules.
# This time let's make the branch and check it out in one command.
git checkout -b modules core
mkdir sites/all/modules
cd sites/all/modules
tar xvfz ~/Downloads/views-6.x-2.2.tar.gz
git add .
git commit -m "Installed Views 6.x-2.2"
tar xvfz ~/Downloads/cck-6.x-2.1.tar.gz
# "git commit -a" skips brand-new files, so add them explicitly.
git add .
git commit -m "Installed CCK 6.x-2.1"

# Now get to work on your own code.
git checkout -b devel modules
When it comes time to upgrade core:
# Run these from the project root.
git checkout core
tar --strip-components=1 -xvzf ~/Downloads/drupal-6.10.tar.gz
# Stage new files as well as changed ones before committing.
git add -A .
git commit -m "Upgraded to Drupal 6.10"
git tag DRUPAL-6-10
git checkout modules
git merge core
git checkout devel # or any other branch
git merge modules
Now, if you need to know the difference between the current version of
core and the previous one:
git diff core^ core
If you need to know the difference between the current official
version of the Views module and the one in your code (perhaps because
you’ve made a few patches):
git diff modules devel sites/all/modules/views
Or, for all the work you’ve done on the Views module in the devel
branch since the last time you merged modules and devel:
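One way to get that view, offered as a sketch rather than the definitive command, is git's three-dot diff, which compares devel against the commit where it last shared history with modules. The wrapper function name below is hypothetical; the one-liner inside it is the interesting part.

```shell
# Hypothetical wrapper: diff_since_merge UPSTREAM BRANCH PATH
# Shows what BRANCH has changed under PATH relative to the merge base
# of UPSTREAM and BRANCH. In git diff, A...B means
# "from the merge base of A and B, to B".
diff_since_merge() {
  git diff "$1...$2" -- "$3"
}
```

For the case above you would call it as diff_since_merge modules devel sites/all/modules/views. Changes that have landed on modules since the last merge are left out of the result.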
The blog has now been updated to Drupal 6. There aren’t a lot of
visible differences yet (unless you are an IE6 user… you folks have
been officially deprecated) – the very nice Drupal 6 theming
improvements made it pretty easy to duplicate my old style, while
cleaning it up a lot at the same time.
Some people have accused D6 of being both easier and harder to theme,
but so far all I can perceive is the “easier” part, perhaps because
I’ve been elbow-deep in the Drupal 5 theming functions and know how
complicated that was.
It’s funny how much time I spent looking for a much more obscure IE6
problem (you know, one like the other 1,001 IE6 problems) when the
evidence of this one was staring me in the face. It took a while
before I thought of googling “IE6 too many stylesheets bug”, because
who would impose an arbitrary thirty-stylesheet limit? What would be
the point of that?
I never saw it coming. Touché, Microsoft engineers!
Suppose you would like to work on the open-source Insoshi project. You
have the URL of its official git repository:
git://github.com/insoshi/insoshi.git
You could simply clone this URL to your own machine:
$ git clone git://github.com/insoshi/insoshi.git
after which you could mess around with your local copy as much as you
like. But it’s hard to share a local repository with your friends, or
among several machines, so perhaps you’d rather create a private fork
on a server somewhere. It’s easy to do this with a GitHub-hosted
project like Insoshi – you get a GitHub account, then press GitHub’s
“fork” button, and you are rewarded with the URL for your own personal
GitHub-hosted fork:
git@github.com:mechfish/insoshi.git
Now you could make a local clone of that remote fork:
$ git clone git@github.com:mechfish/insoshi.git
but there are disadvantages to that. One is that the local clone will
have a different parent: its origin will be the remote fork, while the
remote fork’s origin is the official project repository. Ideally,
you would want the two repositories to be interchangeable, so that you
won’t get confused when comparing them, and so you can upload the
local version to replace the remote version if it gets lost or
corrupted.
You’d also like to be able to pull changes from the official
repository directly into your local copy, then push them up to the
remote fork, rather than having to pull every change forward through
the chain: official -> remote fork -> local copy. That’s painful.
Clone the official repository and cd into the local copy.
Create a devel branch to develop on. Note that if we want to base
our development branch on the “edge” version of the project instead
of the stable version we should do this instead: $ git branch
--track edge origin/edge; git branch devel edge
Checkout our new devel branch.
Add a remote named “fork” that points to our private fork on the remote server.
Fetch the remote fork’s contents.
Push our new devel branch to the remote fork, where it should have the same name.
Set the config for the devel branch so that it automatically
pushes changes to the devel branch on the remote fork when we do a
plain git push, and automatically pulls from the devel branch on
the remote fork when we do a plain git pull.
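The steps above can be sketched as a shell function. Treat it as a sketch under assumptions, not a canonical recipe: the function name and its three arguments (the official URL, your fork's URL, and a local directory name) are hypothetical.

```shell
# Hypothetical helper: clone_with_fork OFFICIAL_URL FORK_URL DIR
clone_with_fork() {
  official=$1 fork=$2 dir=$3

  # Clone the official repository; it becomes "origin".
  git clone "$official" "$dir" || return 1

  (
    cd "$dir" || exit 1
    # Create a devel branch and check it out.
    git branch devel &&
    git checkout devel &&
    # Add the private fork as a second remote and fetch its contents.
    git remote add fork "$fork" &&
    git fetch fork &&
    # Push devel to the fork, where it keeps the same name.
    git push fork devel &&
    # Make a plain "git push" or "git pull" on devel talk to the fork.
    git config branch.devel.remote fork &&
    git config branch.devel.merge refs/heads/devel
  )
}
```

With the URLs from this post, the call would be clone_with_fork git://github.com/insoshi/insoshi.git git@github.com:mechfish/insoshi.git insoshi. Reasonably recent versions of git can also fold the last two steps into git push -u fork devel.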
Note that with this setup, and with git's older “matching” push
default, a plain git push will push up all changes to every branch
that exists on both sides, including the master branch. (Since Git 2.0
the default is “simple”, which pushes only the current branch; set
push.default to matching to keep this behavior.) So you can pull changes from origin
directly to your local copy, with the assurance that those changes
will be automatically migrated to your remote fork repository when you
push.
But I decided to try moving away from MAMP. One reason is that MAMP
doesn’t come with certain features – for example, I’m told that it’s
hard to get SSL working. Another is that the upgrade cycle for PHP and
MySQL is different from that of MAMP, and I sometimes want to be able
to roll my own upgrade on my own timeframe. In addition, I want to
play with Ruby technologies like Merb and
Rack and
Thin, and I didn’t relish trying
to hack them into MAMP’s config files – though it would probably be
possible to do so, the Ruby docs and blogs tend to be aimed at people
who are working outside of MAMP.
So I’ve just gotten everything installed. Here’s a quick sketch of the
process:
I’m with VanDyk, who
suggests running mysql_secure_installation once you have MySQL
installed.
Follow
Benjamin’s instructions
for compiling your own Ruby and Rubygems. This is a more dubious
move, since Apple has provided preinstalled versions of Ruby and
Gems in Leopard, and they work pretty well. But I decided to install
my own version so that I can keep control over the upgrades.
I recommend avoiding the stock Apache. I spent a while trying to
compile a PHP that would work with it, and ran into
trouble. Instead, install MacPorts and
use that to
install Apache, then download and compile PHP.
Note that I installed MacPorts’ Apache, then downloaded and built the
PHP source myself, using the following ./configure options:
I’m not sure there’s a good reason to do this instead of just using
MacPorts’ PHP. For that matter, I’m not sure that using your own MySQL
is better than having MacPorts install MySQL, or using the binary
distribution from MySQL AB. There are lots of options. But these
options seemed to work for me.
Of course, you’ve also got to set up a php.ini and an httpd.conf
and get PHP and Apache pointing at them. And I’ve still got
eAccelerator
and
XDebug
to install. It’s a bit of work. I’d advise anybody who wants to avoid
trouble to stick with the MAMP option.