Date: Thu, 13 May 2004 22:47:33 +0200
From: Peter Schuller <peter.schuller@infidyne.com>
To: freebsd-questions@freebsd.org
Cc: John Monkey <thegreatsagemonkey@yahoo.co.uk>
Subject: Re: The journalling file system saga
Message-ID: <200405132247.33270.peter.schuller@infidyne.com>
In-Reply-To: <40A32D0F.5050101@yahoo.co.uk>
References: <40A32D0F.5050101@yahoo.co.uk>

Hello,

> I had to build a storage system this week with a capacity of 1.6TB.
> Regretfully I decided to use Linux with XFS as the thought of waiting
> for fsck to complete in the event of a problem makes me wince. I
> experimented with FreeBSD, using two 800GB partitions and things like
> that, but in the end it comes back to the fsck if for any reason the
> machine goes down uncleanly.

I share your reaction to the thought of fsck-after-crash, though I have come to appreciate soft updates lately after an obscene amount of googling.

IMO the primary advantage of soft updates compared to journaling is that write operations can be deferred, so one can achieve good performance with write caching disabled on the drive/RAID. Journaling will be either slower with write caching turned off, or unsafe with it turned on.

The question is whether that applies to data as well as meta-data. I have not yet found any information as to whether soft updates guarantees the order of non-meta-data writes (or: "Is it safe to run PostgreSQL with soft updates?"). If anyone reading this has a clue, I'd love to hear it.

Unfortunately there are problems with soft updates, for me as a user. One problem is degraded performance with bgfsck, which you have already mentioned. Another problem is that bgfsck seems to be unsupported on the root filesystem (something which I am trying to fix, but it's going slowly due to lack of knowledge of FreeBSD as well as lack of time). Yet another problem is that an fsync() no longer guarantees that data is on disk, even with write caching disabled on the media. This doesn't break things like PostgreSQL provided that the order of writes is preserved, but it does break things like MTAs that want to guarantee that critical data has been committed to persistent storage before signaling success to an external entity (the SMTP client).
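
To make the two uses of fsync() concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names (`durable_write`, `accept_message`, `log_then_data`) and the spool layout are hypothetical and are not taken from qmail, PostgreSQL, or any other program. The first use needs the data to really be on disk before success is reported; the second only needs the log write to reach disk before the data write.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the OS to flush it to stable storage.

    Whether the bytes are truly on disk when fsync() returns depends on the
    filesystem and on drive write caching; the code can only ask.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # os.write may write only part
            view = view[written:]
        os.fsync(fd)  # flush file contents and metadata
    finally:
        os.close(fd)
    # Also fsync the containing directory so the new entry is flushed.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def accept_message(spool_dir: str, name: str, body: bytes) -> str:
    """MTA-style use (hypothetical): durability is the whole point.

    Success is only reported to the client after fsync() has returned.
    """
    durable_write(os.path.join(spool_dir, name), body)
    return "250 OK"


def log_then_data(log_path: str, data_path: str, record: bytes) -> None:
    """Ordering-style use (hypothetical): fsync() serves as a barrier.

    The caller mainly needs the log record written before the data file is
    modified; the flush to disk is a side effect of enforcing that order.
    """
    with open(log_path, "ab") as log:
        log.write(record)
        log.flush()
        os.fsync(log.fileno())  # log record first ...
    with open(data_path, "ab") as data:
        data.write(record)  # ... then the data file
```

Calling `accept_message(tempfile.mkdtemp(), "msg1", b"hello")` writes the file and returns `"250 OK"`. If fsync() returns before the data is on persistent storage, `log_then_data` still behaves correctly as long as write order is preserved, while `accept_message` makes a promise to the client that it cannot keep.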

A very big issue is that soft updates addresses multiple problems, but it's an all-or-nothing choice. I can get good performance running "safely" (in some circumstances) by using soft updates, but if I need safety for an MTA I need to turn it off. And turning soft updates off does not only decrease performance, it *ALSO* creates the need for a full fsck after an unclean shutdown. What if I need safety *AND* do not wish to have a 30 minute boot-up time? (Or in your case with 1.6 TB, I would imagine that's a LOT more than just 30 minutes...)

A good solution might be to support *both* some kind of journaling/logging and soft updates. But to me that is still just a work-around for a broken foundation.

I believe the fundamental problem lies in the ambiguity of fsync(). The same syscall is used to achieve different effects. A database like PostgreSQL with write-ahead logging (WAL) is concerned with making sure certain data is written before additional modifications are made (though see below). So it uses fsync() to make sure everything is written before proceeding, thus causing a degradation in performance. But then comes qmail, which needs to guarantee the data in question is *on disk*, and also uses fsync(). This time the intended effect is specifically the goal of fsync(). In the former case the intended effect was an implicit side-effect.

PostgreSQL's needs can be met by soft updates in terms of avoiding corruption (but not in terms of guaranteeing that a transaction is committed to persistent storage when the call returns), provided that both meta-data and all other data are guaranteed to be written in the correct order (though again I don't know if this is the case). But qmail is not served by this. A filesystem that fulfills the requirements of qmail would also fulfill the requirements of PostgreSQL, but it would also unnecessarily decrease performance.

> Is anyone remotely interested in this?

Yes, for the reasons mentioned above, and strictly for practical personal use because I'd love to be able to share data between FreeBSD and Linux ;)

--
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org