From: Andre Oppermann
Date: Tue, 06 Feb 2001 21:06:16 +0100
To: Matt Dillon
Cc: Rik van Riel, Mike Silbersack, Poul-Henning Kamp, Charles Randall, Dan Phoenix, Alfred Perlstein, Jos Backus, freebsd-hackers@FreeBSD.ORG
Subject: Re: soft updates and qmail (RE: qmail IO problems)
Message-ID: <3A805938.96ED890D@monzoon.net>
References: <3A805035.C71AAD5E@monzoon.net> <200102061943.f16Jhp365113@earth.backplane.com>

Matt Dillon wrote:
>
> :> Pre-softupdate BSD semantics, apparently. Doesn't sound like
> :> the smartest thing to do when you want a reliable MTA...
> :
> :This description is not entirely right.
> :
> :Qmail depends on ordered-metadata updates (Terry! :-). That means
> :if you issue a link() to the new place and an unlink() in the old
> :place it should guarantee that the link() happens *BEFORE* the
> :unlink(). At least standard FFS/UFS does this. Linux ext2 might
> :do the unlink() before the link() and a crash in that moment
> :will lose the file completely. It is all about the ordering
> :guarantee.
>
> No filesystem can guarantee ordered metadata updates. Well, that
> isn't quite true...
> a journaled filesystem can, but it is usually
> undesirable to have a global ordering guarantee for a filesystem
> because you can end up in a situation where you have lots of processes
> banging on unrelated areas of the filesystem in parallel, and an
> fsync() of one descriptor would have to wait for the entire filesystem
> to reach a synchronization point to guarantee metadata update ordering.
> This creates a serious scalability issue within a filesystem!

Yes. My understanding of "ordered meta-data updates", as I have
grasped it from Terry's rants over the past years, is not that all
meta-data updates on a filesystem have to be done one after the other,
but that they are ordered with respect to each other: a link() happens
before an unlink() on the same file. Does this make sense?

> Standard FFS/UFS does *NOT* guarantee ordered metadata updates. It
> uses synchronous directory updates for certain operations, but these
> only provide guarantees when operating on lightly loaded directories.
> Heavily loaded directories can still (and probably will) wind up in a
> corrupted state if the system crashes at the wrong time. Dealing with
> the issue of multiple processes banging on a single directory all
> at the same time, doing simultaneous file creates and deletions, is a
> very complex problem to solve, and FFS/UFS does not solve it. Softupdates
> solves the problem, but even softupdates still doesn't try to guarantee
> metadata update ordering because it is extremely difficult to do it and
> still have reasonable filesystem performance.

Qmail has a couple of directories for the different states a queued
message goes through. The whole queue structure is required to be on
the same partition/disk. After each step in the queue completes, the
message is moved to the next directory with link() followed by unlink().
If link() only returns *after* it has written the new directory entry
to the disk, the transaction system of qmail is happy.
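
The link()-then-unlink() step described above can be sketched roughly
as follows. This is only an illustration in Python, not qmail's actual
code: the names fsync_dir, move_to_next_stage and demo are hypothetical,
and the explicit fsync() on the directories is an added assumption for
filesystems where link() does not write the directory entry
synchronously. It assumes a POSIX system and both directories on the
same filesystem.

```python
import os
import tempfile


def fsync_dir(path):
    """Ask the kernel to flush a directory's entries to disk (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def move_to_next_stage(src_dir, dst_dir, name):
    """Move a queue file by link() first, unlink() second, never the reverse."""
    src = os.path.join(src_dir, name)
    dst = os.path.join(dst_dir, name)
    os.link(src, dst)    # step 1: the new name must exist before anything else
    fsync_dir(dst_dir)   # assumption: force the new entry out explicitly
    os.unlink(src)       # step 2: only now drop the old name
    fsync_dir(src_dir)
    return dst


def demo():
    """Run one queue step in a throwaway directory tree."""
    with tempfile.TemporaryDirectory() as root:
        stage_a = os.path.join(root, "stage_a")
        stage_b = os.path.join(root, "stage_b")
        os.mkdir(stage_a)
        os.mkdir(stage_b)
        src = os.path.join(stage_a, "msg")
        with open(src, "w") as f:
            f.write("queued message\n")
        inode_before = os.stat(src).st_ino
        dst = move_to_next_stage(stage_a, stage_b, "msg")
        # (old name still there?, new name there?, same inode?)
        return (os.path.exists(src), os.path.exists(dst),
                os.stat(dst).st_ino == inode_before)
```

A crash between the two steps leaves the file reachable under both
names, so the data itself is never without a directory entry; the only
question is whether the filesystem really commits the steps in that
order.
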
If a crash happens when the file has been linked to the new place but
not yet removed from the old place, qmail detects that after a reboot,
because it uses the inode number of the file as the filename of the
queue file, and does a roll-forward to the next stage. If the link() did
not happen because of the crash, no link will be in the next stage
directory and the unlink() will not have been issued. This is sort of a
natural roll-back. In the worst case a message will be delivered twice,
but never lost.

> So the simple answer here is that if QMail is relying on ordered metadata
> updates, it is relying on something that virtually nobody supports
> with any real level of confidence. If you want to achieve a database's
> transactional qualities, you need to write meta-data operations to a log
> file, cluster the writes within a filesystem block properly, and fsync()
> the log file so you can rerun it after a crash.

To get around this, qmail uses the file-system transaction strategy
I've described above.

> (I will mention here that, of course, sendmail and postfix are no better
> in this regard. This is not a detriment to QMail itself versus other
> mailers. Since QMail fsync()'s reasonably, it will be just as reliable
> as other existing MTAs).

Does sendmail even use fsync()?

--
Andre

> -Matt
>
> :> If djb could be considered to take things like reliability
> :> and the SMTP specification into account, and not just
> :> security, then qmail would have the potential to be a pretty
> :> decent mailer.
> :
> :He did and qmail is one of the best and most reliable mailers on
> :the Internet.
> :
> :> As it is, I can only recommend people to go with something
> :> like postfix, Exim or zmailer ...
> :
> :Have a look at the qmail source and the facts before you spill
> :out such a *bullshit*!
> :
> :--
> :Andre