Date:      Tue, 06 Feb 2001 21:06:16 +0100
From:      Andre Oppermann <oppermann@monzoon.net>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        Rik van Riel <riel@conectiva.com.br>, Mike Silbersack <silby@silby.com>, Poul-Henning Kamp <phk@critter.freebsd.dk>, Charles Randall <crandall@matchlogic.com>, Dan Phoenix <dphoenix@bravenet.com>, Alfred Perlstein <bright@wintelcom.net>, Jos Backus <josb@cncdsl.com>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: soft updates and qmail (RE: qmail IO problems)
Message-ID:  <3A805938.96ED890D@monzoon.net>
References:  <Pine.LNX.4.21.0102061555550.1535-100000@duckman.distro.conectiva> <3A805035.C71AAD5E@monzoon.net> <200102061943.f16Jhp365113@earth.backplane.com>

Matt Dillon wrote:
> 
> :> Pre-softupdate BSD semantics, apparently. Doesn't sound like
> :> the smartest thing to do when you want a reliable MTA...
> :
> :This description is not entirely right.
> :
> :Qmail depends on ordered metadata updates (Terry! :-). That means
> :if you issue a link() to the new place and an unlink() in the old
> :place, the filesystem should guarantee that the link() happens
> :*BEFORE* the unlink(). At least standard FFS/UFS does this. Linux
> :ext2 might do the unlink() before the link(), and a crash at that
> :moment will lose the file completely. It is all about the ordering
> :guarantee.
> 
>     No filesystem can guarantee ordered metadata updates.  Well, that
>     isn't quite true... a journaled filesystem can, but it is usually
>     undesirable to have a global ordering guarantee for a filesystem
>     because you can end up in a situation where you have lots of processes
>     banging on unrelated areas of the filesystem in parallel, and an
>     fsync() of one descriptor would have to wait for the entire filesystem
>     to reach a synchronization point to guarantee metadata update ordering.
>     This creates a serious scalability issue within a filesystem!

Yes. My understanding of "ordered metadata updates", as I have grasped
it from Terry's rants over the past years, is not that all metadata
updates on a filesystem have to be done one after the other, but that
related updates are ordered with respect to each other: that a link()
happens before an unlink() on the same file. Does this make sense?
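As a minimal sketch (in Python, whose os.link(), os.unlink() and os.fsync() are thin wrappers around the system calls discussed here), an application can enforce the link()-before-unlink() order itself instead of relying on the filesystem: it fsync()s the destination directory between the two calls. The helper names fsync_dir and ordered_move are hypothetical, and the sketch assumes the OS and filesystem honour fsync() on a directory descriptor, which varies between systems.

```python
import os


def fsync_dir(path):
    """fsync() a directory so its entries reach stable storage.

    Assumption: the OS and filesystem honour fsync() on a directory
    descriptor; this behaviour varies between systems.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def ordered_move(name, src_dir, dst_dir):
    """Move src_dir/name to dst_dir/name, making link() durable before unlink()."""
    src = os.path.join(src_dir, name)
    dst = os.path.join(dst_dir, name)
    os.link(src, dst)    # 1. create the new directory entry
    fsync_dir(dst_dir)   # 2. force the new entry to disk first
    os.unlink(src)       # 3. only now remove the old entry
    fsync_dir(src_dir)   # 4. make the removal durable as well
```

Step 2 is what makes the order hold regardless of what the filesystem does on its own; the price is one synchronous directory write per move.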

>     Standard FFS/UFS does *NOT* guarantee ordered metadata updates.  It
>     uses synchronous directory updates for certain operations, but these
>     only provide guarantees when operating on lightly loaded directories.
>     Heavily loaded directories can still (and probably will) wind up in a
>     corrupted state if the system crashes at the wrong time.  Dealing with
>     the issue of multiple processes banging on a single directory all
>     at the same time, doing simultaneous file creates and deletions, is a
>     very complex problem to solve, and FFS/UFS does not solve it.  Softupdates
>     solves the problem, but even softupdates still doesn't try to guarantee
>     metadata update ordering because it is extremely difficult to do it and
>     still have reasonable filesystem performance.

Qmail has a couple of directories for the different states a queued
message goes through. The whole queue structure is required to be on
the same partition/disk. After each step in the queue completes, the
file is moved to the next directory with link() and then unlink(). If
link() only returns *after* it has written the new directory entry to
the disk, the transaction system of qmail is happy. If a crash happens
after the file has been linked to the new place but before it has been
removed from the old place, qmail detects this after a reboot, since it
uses the inode number of the file as the filename of the queue file,
and rolls forward to the next stage. If the link() did not happen
because of the crash, there will be no link in the next stage directory
and the unlink() will not have been issued. This is a sort of natural
roll-back. In the worst case a message will be delivered twice, but it
is never lost.
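The recovery step described above can be sketched as follows. This is not qmail's actual code: recover_stage is a hypothetical name, and the sketch assumes both stage directories live on the same filesystem (as the queue layout requires), so a half-finished move shows up as two names for one inode.

```python
import os


def recover_stage(src_dir, dst_dir):
    """Roll forward moves from src_dir to dst_dir that a crash interrupted.

    A name present in both directories and referring to the same inode
    means link() completed but unlink() did not, so the old entry is removed.
    A name present only in src_dir is left alone: the move never happened
    and the message is simply processed again (the "natural roll-back").
    Returns the sorted list of names that were rolled forward.
    """
    rolled = []
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        # samefile() compares device and inode numbers.
        if os.path.exists(dst) and os.path.samefile(src, dst):
            os.unlink(src)
            rolled.append(name)
    if rolled:
        # Make the unlinks durable (assumes directory fsync() is supported).
        fd = os.open(src_dir, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    return sorted(rolled)
```

Running it a second time finds nothing to do, so it is safe to call on every startup.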

>     So the simple answer here is that if QMail is relying on ordered metadata
>     updates, it is relying on something that virtually nobody supports
>     with any real level of confidence.  If you want to achieve a database's
>     transactional qualities, you need to write meta-data operations to a log
>     file, cluster the writes within a filesystem block properly, and fsync()
>     the log file so you can rerun it after a crash.
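A minimal sketch of the log-then-fsync() approach in the paragraph above, under loud assumptions: a hypothetical one-record-per-line text log with MOVE and DONE records, paths that contain no whitespace, and a filesystem that honours fsync() on directories. It leaves out the clustering of records within a filesystem block. It only shows the intent being fsync()ed to the log before any directory is touched, and a replay that can safely be rerun. All function names are hypothetical.

```python
import os


def _fsync_path(path):
    """fsync() a file or directory by path (directory support varies by OS)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def log_append(log_path, record):
    """Append one newline-terminated record and fsync() the log."""
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, (record + "\n").encode())
        os.fsync(fd)
    finally:
        os.close(fd)


def apply_move(src, dst):
    """Idempotent link()-then-unlink(); safe to rerun after a crash."""
    if os.path.exists(src) and not os.path.exists(dst):
        os.link(src, dst)
        _fsync_path(os.path.dirname(os.path.abspath(dst)))
    if os.path.exists(src) and os.path.exists(dst) and os.path.samefile(src, dst):
        os.unlink(src)
        _fsync_path(os.path.dirname(os.path.abspath(src)))


def logged_move(log_path, src, dst):
    """Make the intent durable in the log before touching any directory."""
    log_append(log_path, "MOVE %s %s" % (src, dst))
    apply_move(src, dst)
    log_append(log_path, "DONE %s %s" % (src, dst))


def replay(log_path):
    """After a crash, redo every MOVE that has no matching DONE record."""
    pending = []
    if not os.path.exists(log_path):
        return pending
    with open(log_path) as f:
        for line in f:
            parts = line.split()
            # A record without its newline was never fully written, so the
            # move it describes was never started: skip it.
            if not line.endswith("\n") or len(parts) != 3:
                continue
            op, src, dst = parts
            if op == "MOVE":
                pending.append((src, dst))
            elif op == "DONE" and (src, dst) in pending:
                pending.remove((src, dst))
    for src, dst in pending:
        apply_move(src, dst)
        log_append(log_path, "DONE %s %s" % (src, dst))
    return pending
```

Because apply_move() checks the current state before each step, replaying the same record twice is harmless. That idempotence is what lets the log be rerun after a crash at any point.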

To get around this, qmail uses the filesystem transaction strategy I've
described above.

>     (I will mention here that, of course, sendmail and postfix are no better
>     in this regard.  This is not a detriment to QMail itself versus other
>     mailers.  Since QMail fsync()'s reasonably, it will be just as reliable
>     as other existing MTAs).

Does sendmail even use fsync()?

-- 
Andre


>                                         -Matt
> 
> :> If djb could be persuaded to take things like reliability
> :> and the SMTP specification into account, and not just
> :> security, then qmail would have the potential to be a pretty
> :> decent mailer.
> :
> :He did and qmail is one of the best and most reliable mailers on
> :the Internet.
> :
> :> As it is, I can only recommend people to go with something
> :> like postfix, Exim or zmailer ...
> :
> :Have a look at the qmail source and the facts before you spout
> :such *bullshit*!
> :
> :--
> :Andre

