Published by: Kage, on 08:12 pm Sunday December 23rd, 2012
We recently recovered the HackThisSite server from a fairly serious crash of its data filesystem. No data was lost, but the filesystem itself was broken and unusable. We store our data on a hardware RAID-10 array of 2TB drives with a UFS filesystem, and on top of that sat a GEOM journal. Well, when I designed this system, I wasn't aware that the default journal size of 1GB would be problematic. I also found out that, since the original design and deployment of this system back in late 2011, GEOM journal has been shown to be dangerously buggy. Such was the case when the server halted on a kernel panic on Friday from a journal overflow. Subsequent reboots showed this, too:
GEOM: provider: the primary GPT table is corrupt or invalid.
GEOM: provider: using the secondary instead -- recovery strongly advised.
The answer was clear: GEOM journal had to go. It's too new and too unreliable. I'd rather we rely on the old and "slower" plain UFS filesystem. And yeah, I know the arguments for ZFS, but that would require remaking the entire filesystem and restoring all its contents, which was more trouble than it's worth for the (dis)advantages. But there was one problem that proved to be quite... undocumented: How do I disable GEOM journaling?
Well, first we drop to single-user mode. Once inside, we need the root filesystem mounted read-write so we can edit the fstab entry and remove the '.journal' part. To do this, we re-mount the root filesystem (/dev/da0 is ours; yours may vary, so check your /etc/fstab to be sure):
mount -u /
mount -o rw /dev/da0 /
Now, we edit out the '.journal' part from the /etc/fstab entry for the data mount. Also, we should disable the geom_journal_load entry in /boot/loader.conf. It's worth mentioning that in single-user mode your display may not be able to handle editors appropriately, so get comfortable manipulating files with sed and other command-line methods.
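Since a full-screen editor may not be usable in single-user mode, here is a minimal sed sketch of those two edits. The function names strip_journal and disable_journal_load are made up for illustration, and the patterns assume an fstab whose device column looks like /dev/da1.journal and a loader.conf line beginning with geom_journal_load=. Each function keeps a .bak copy of the file it touches.

```shell
# Hypothetical helpers, not part of FreeBSD. Both keep a backup as FILE.bak.

# strip_journal FILE: drop the ".journal" suffix from the device column
# (first field) of an fstab-style file. Other text is left alone.
strip_journal() {
    file="$1"
    cp -p "$file" "${file}.bak" || return 1
    sed -E 's|^(/dev/[^[:space:]]+)\.journal([[:space:]])|\1\2|' \
        "${file}.bak" > "$file"
}

# disable_journal_load FILE: comment out any geom_journal_load= line
# in a loader.conf-style file.
disable_journal_load() {
    file="$1"
    cp -p "$file" "${file}.bak" || return 1
    sed -E 's|^([[:space:]]*geom_journal_load=)|#\1|' \
        "${file}.bak" > "$file"
}
```

On our layout this would be run as strip_journal /etc/fstab followed by disable_journal_load /boot/loader.conf. Check the result with cat before rebooting.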
There is effectively no literature on the subject of disabling a GEOM journal, so I had to rely on some trial-and-error suggestions of the FreeBSD community (big thanks to the Freenode ##freebsd channel members, namely nemysis). Finally, we determined the effective route is as follows:
# Disable the GEOM journal provider
gjournal stop da1.journal
# Remove the journal flag
tunefs -J disable /dev/da1
# Forcibly fsck the filesystem as UFS
fsck_ufs -fy /dev/da1
# Mounting should work now
mount -o rw /dev/da1 /data
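For anyone who wants to rehearse the sequence before touching a real disk, the four steps above can be wrapped in a small function. This is only a sketch: disable_gjournal and the RUN variable are my own hypothetical names, and it assumes the journal provider is named after the device (da1.journal for da1). By default it only prints the commands. It also stops at the first command that reports failure, which may be stricter than you want.

```shell
# Hypothetical wrapper around the steps above.
# Dry run by default: it only echoes the commands.
# Set RUN="" (empty) in the environment to actually execute them.
disable_gjournal() {
    dev="$1"   # bare device name, e.g. da1
    mnt="$2"   # mount point, e.g. /data
    run="${RUN-echo}"

    # Each step runs only if the previous one succeeded.
    $run gjournal stop "${dev}.journal" &&
    $run tunefs -J disable "/dev/${dev}" &&
    $run fsck_ufs -fy "/dev/${dev}" &&
    $run mount -o rw "/dev/${dev}" "$mnt"
}
```

Calling disable_gjournal da1 /data prints the same four commands listed above, in order, so you can compare them against your own device names first.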
Now we can re-use the underlying UFS filesystem. Subsequent reboots still show that "primary GPT table is corrupt or invalid" error, though, and I'm not entirely certain why, or how to get rid of it. I've disabled the geom_journal_load entry in /boot/loader.conf, and gjournal list now shows "gjournal: Command 'list' not available." instead of listing da1.journal, so I'm not sure why this legacy attachment still exists. The filesystem works as a normal UFS mount now, so hopefully this is benign and ignorable. If anyone has any additional advice on this, feel free to leave a comment below.
One thing we're left without now, though, is effective journaling or some other method of ensuring filesystem integrity. In theory, we should not require the rapid fsck recovery of GEOM journaling, since we ideally won't crash frequently, nor experience power outages (we're on a dual power supply system on two separate circuits with battery backups). We should also not need the metadata safety of synchronous UFS or the performance increase of asynchronous UFS, since our RAID controller has an onboard battery-backed write cache. I have no realistic means of testing this except against the HackThisSite server itself, so I must rely only on the details of the literature I have before me.
So we reverted to the underlying UFS filesystem and investigated Soft Updates instead. Soft Updates grant us some filesystem safety in the event of a crash (with the caveat that recovery may produce a version of the filesystem that is 30-60 seconds "older"), plus the performance benefits of asynchronous UFS (at the cost of memory, which we have plenty of in a 32GB RAM system). Oh, and just to amp up the speed a tad more, we set the noatime flag.
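As a rough sketch of what that looks like in practice: Soft Updates are toggled with tunefs -n on an unmounted (or read-only) filesystem, and noatime is a mount option in /etc/fstab. The helper below, add_noatime, is a hypothetical name of my own. It assumes a standard six-column fstab and appends noatime to the options column of one mount point, keeping a .bak copy. The tunefs line is left as a comment because it must be run by hand against your own device.

```shell
# Enable Soft Updates by hand while the filesystem is unmounted, e.g.:
#   tunefs -n enable /dev/da1

# add_noatime FILE MOUNTPOINT (hypothetical helper): append "noatime" to
# the options column (4th field) of the matching fstab entry, unless it
# is already present. The edited line is rewritten tab-separated.
add_noatime() {
    file="$1"
    mnt="$2"
    cp -p "$file" "${file}.bak" || return 1
    awk -v mnt="$mnt" '
        BEGIN { OFS = "\t" }
        $1 !~ /^#/ && $2 == mnt && $4 !~ /(^|,)noatime(,|$)/ {
            $4 = $4 ",noatime"
        }
        { print }
    ' "${file}.bak" > "$file"
}
```

For our data mount that would be add_noatime /etc/fstab /data, turning an options column of rw into rw,noatime. Running it a second time leaves the entry unchanged.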
All in all, this has been a good lesson in FreeBSD filesystems and hopefully one others can learn from, too.