I'm not really all that mysterious

simplicity and blogging

I find myself missing emacs, which is clearly a sign of pathology. The silly thing is that I don’t use even 10% of its features. It’s pure nostalgia. Emacs is the only editor (aside from Vi, I suppose) that I’ve been able to run consistently on all the platforms I’ve blogged on: Windows, Linux, Mac OS X. (Yes, I’ve blogged while using Windows, but only as a stop-gap measure.) I haven’t really ever used emacs for something that I couldn’t do with whatever basic text editor comes with the OS (Notepad, GNU nano, and so on). Interestingly, of these OSes, emacs comes preinstalled only on Mac OS X; in many Linux distros, you actually have to install it manually. Of course, these are the distros that favor Vi, and emacs vs. vi is probably one of the oldest computing holy wars around. I suppose there is something masochistically perverse about having to type CTRL-X CTRL-C to quit. (I still remember the first time I was faced with an empty emacs buffer in 1994, and I had to bug my UNIX guru college roommate to help me regain control of my machine, an already old-at-the-time 486 running at a paltry 50 MHz. Don’t laugh, I’ve computed on machines running at 1 MHz. Machines that you can actually play some pretty neat games on.)

OK, so maybe emacs is not the sort of thing you associate with the adjective “simple,” but my blogging style was primitive. I would type out an entry (an HTML fragment, really) using emacs, save it to my hard drive, run make to have a Perl script properly link my blog posts in chronological order, then have xsltproc iterate through my HTML fragments, generating static HTML pages, which I would then rsync to my webhost. Once I had the Makefile written, it was all pretty automatic (although not without bugs that I never stomped out). With the help of another Perl script on my webserver, I was actually able to add commenting (which, thankfully, the spambots never seemed to mess with). Using XSLT actually allowed me to implement features that I haven’t really been able to duplicate yet, either on blosxom or now on WordPress. One thing that I thought was neat, for example, was that I could type a hypertext link on the main page and add commentary which would then show up only on the sidebar. (See the first iteration of my blog.) If my markup didn’t have a comment, then it would just yank the title attribute, and if it didn’t have that either (which is actually frowned upon), it wouldn’t matter. The other thing that I was able to implement using my kludgery, and was unable to duplicate as elegantly on blosxom, was asides, that is, a sidebar mini-blog. Also, I thought the index pages were kind of neat, too: each of my posts would have a synopsis that consisted of a sentence or two. I never implemented excerpting, though, because I never found an elegant way to mark it up. The other thing lacking was being able to break down the index page by month.
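To make the “link my blog posts in chronological order” step concrete, here is a minimal sketch in Python rather than the original Perl. The `Post` type, the `link_posts` name, and the use of epoch seconds for the publication time are all assumptions made for illustration; the original script is not shown anywhere in this post.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Post(NamedTuple):
    """Hypothetical record for one blog entry."""
    slug: str
    published: int  # assumed: publication time in UNIX epoch seconds


def link_posts(posts: List[Post]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each slug to its (previous, next) neighbours in chronological order."""
    # Sort by time, breaking ties on the slug so the order is deterministic.
    ordered = sorted(posts, key=lambda p: (p.published, p.slug))
    links = {}
    for i, post in enumerate(ordered):
        prev_slug = ordered[i - 1].slug if i > 0 else None
        next_slug = ordered[i + 1].slug if i < len(ordered) - 1 else None
        links[post.slug] = (prev_slug, next_slug)
    return links
```

A template step (xsltproc in the setup described above) could then use the resulting map to emit “previous” and “next” links on each static page.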

Switching to blosxom, my entries became even more primitive. Under my kludgery, each blog post was actually a valid XML document. I even had a semi-complete DTD for it. Under blosxom, each entry is an unholy alliance between a plain-text file with some magic key words (for example, meta if you were using the meta plugin) and HTML markup interspersed. Not that it was a big deal, but I thought it was kind of inelegant that you couldn’t error check your markup by running the file through a validator. This may seem like overkill, but if you start inserting some sophisticated markup into your blog posts (embedded tables, or even definition or ordered lists), markup errors can cause some head-bashing bugs that aren’t easy to fix. I suppose I could’ve written a simple script to just strip the title and the meta tags, and send the rest to the validator, but, I dunno, it just seemed inelegant. What can I say.
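The “simple script” idea could look roughly like the sketch below. It assumes that the first line of an entry is the title and that any lines starting with `meta-` directly after it are metadata; both are assumptions about the file layout, and `check_entry_markup` is a hypothetical name. It only checks that the remaining markup is well-formed XML, which is weaker than validating against a DTD, and HTML-only entities such as `&nbsp;` would be reported as errors.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_entry_markup(entry_text: str) -> Optional[str]:
    """Return None if the entry body is well-formed, else the parser's error message."""
    lines = entry_text.splitlines()
    body_lines = lines[1:]  # assumed: line 1 is the plain-text title
    # Assumed: metadata lines start with "meta-" and sit right after the title.
    while body_lines and body_lines[0].startswith("meta-"):
        body_lines.pop(0)
    body = "\n".join(body_lines)
    try:
        # Wrap the fragment in a dummy root so several top-level elements are allowed.
        ET.fromstring("<entry>" + body + "</entry>")
    except ET.ParseError as err:
        return str(err)
    return None
```

Run over every entry before publishing, something like this would at least catch a misplaced closing tag before it reaches the generated page.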

In blosxom, I missed some of the features of XSLT. I even created a kludge for blosxom that let me write my markup in pseudo-XML, even though all I was doing was parsing the markup with regular expressions, which we all know is fraught with peril. [1][2]

But I liked the simplicity of letting posts live in the filesystem. There is something that freaks me out about stuffing text files into a database. I just don’t feel that blog posts naturally fit into a relational database. There aren’t really any natural uniquifying keys, except for maybe the time stamp, but who really searches by time stamp?

I like how a file system naturally enforces uniqueness. You can’t have two blog posts with the same slug in blosxom, unless they reside in two different categories. That cross-category case is something that I don’t think can be trivially error checked.

But, ultimately, what I want is to be able to address a file system with XPath. While the ability to have two identical slugs, just posted in different categories, may be a feature for some people, I can’t imagine how this would work in reality. How would two posts with the exact same slug not be in the same category? Ultimately, I don’t think hierarchical categories are all that useful. What is more useful are tags, mainly because a blog post can be tagged with multiple designations. So I would probably just have posts live in one directory, with tags encoded in the XML files themselves. This probably makes the search algorithm less efficient. Instead of doing a straightforward file system seek, I am forced to parse (at least partially) every file in the directory to find the tags that I am looking for. If I ever get my own blogging engine off the ground, I will need to profile this.
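A minimal sketch of that tag search, assuming one flat directory of `.xml` posts whose tags are stored as `<tag>` elements; the element name, the file extension, and the `posts_with_tag` function are all hypothetical, since no such engine exists yet. It parses every file in full, which is exactly the cost that would need profiling.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def posts_with_tag(post_dir: str, wanted: str) -> List[str]:
    """Return the slugs (file stems) of posts in post_dir carrying the wanted tag."""
    matches = []
    for path in sorted(Path(post_dir).glob("*.xml")):
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            continue  # skip posts that are not well-formed XML
        # Assumed layout: tags appear as <tag>name</tag> somewhere in the post.
        tags = {el.text.strip() for el in root.iter("tag") if el.text}
        if wanted in tags:
            matches.append(path.stem)
    return matches
```

For example, `posts_with_tag("posts", "emacs")` would list every post in a hypothetical `posts` directory tagged `emacs`. A cached index from tag to slugs would avoid reparsing on every request, at the price of keeping a second copy of the metadata in sync.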

And while timestamps live in the file system itself, and rsync pretty much takes care of keeping them real (those who still use ftp clients to upload web pages know how inconsistent preserving timestamps can be), I would feel more at ease sticking the timestamp into the file itself as more XML metadata. The main reason for doing this is so that you can freely edit old posts without disrupting the temporal flow of your blog. This is the mindset involved in several blosxom plugins, which either store the timestamp in cached metadata files, or stick it in the file itself, albeit without XMLish/SGMLish markup to delimit where it lives. The cached metadata (which is basically just serialized Perl hashes) is probably the most efficient of these methods, the infixed metadata (with or without angle-brackets) the least efficient. Unfortunately, the infixed metadata is probably also the most user-transparent. While time stamps can be rather cryptic, they are ultimately human readable and human parsable. In contrast, not too many people can readily interpret a UNIX epoch date-and-time (which is how it is stored in the cached metadata, and which is the true representation of date-and-time for a computer, although most OSes provide ways to look at it in more human-readable form). This, on the other hand, becomes another question of processing efficiency. Converting human readable timestamps into UNIX epoch seconds and vice versa eats quite a few processing cycles. Is infixing the epoch seconds as XML not the “right-thing” to do?
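For reference, the two representations can be converted back and forth with a few lines of Python. This is only a sketch: the `YYYY-MM-DDTHH:MM:SSZ` layout and UTC are assumed as the human-readable form, and `to_epoch` and `from_epoch` are hypothetical helper names. How expensive the conversion is across a whole blog is something to measure rather than guess.

```python
from datetime import datetime, timezone

# Assumed human-readable layout: UTC, e.g. "2001-09-09T01:46:40Z".
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_epoch(stamp: str) -> int:
    """Convert a human-readable UTC timestamp into UNIX epoch seconds."""
    parsed = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def from_epoch(seconds: int) -> str:
    """Convert UNIX epoch seconds back into the human-readable UTC form."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(STAMP_FORMAT)
```

One possible compromise would be to store the epoch seconds as an attribute and the readable form as the element text, so that neither the engine nor the human has to convert anything when reading.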

Anyway, yeah, I don’t know why I’m obsessed with using XSLT. I suppose it’s the amount of pain I underwent trying to learn the stupid thing. I’m not sure I could write such a stylesheet now. But XSLT has saved me a great amount of time. Back when HTML 4.0 with tables for layout was in vogue, misplaced closing tags in my templates (i.e., themes or skins) were the bane of my existence. The fact that XSLT could be validated as XML made templates easy to debug, and I’m not sure this ease of debugging is quite as present in the hybrid file format that blosxom uses (although the theme plugin has made it more so), and it is definitely not so in PHP, which is what WordPress and many other tools are written in.

I’m not quite sure the world exactly needs yet another blogging engine, but I would like one that doesn’t need to use a relational database.
