From: "Andrew Hannam"
To: "John Hanley"
Subject: Re: Disk Writes
Date: Wed, 26 Dec 2001 13:07:46 +1000
Delivered-To: freebsd-small@freebsd.org

Thanks for your help...

Just a short note on the power-fail condition:

With our equipment, a power failure is most likely to come just after a transaction that requires writing to the disk. The power failure (if it occurs) is most likely to occur 5 to 10 seconds after the transaction and is the result of action by a serviceman at the machine (about once or twice a week). I therefore believe it should be possible to achieve a safe write 100% of the time. This, however, has not been borne out in practice, with a failure rate of about 0.5% in these conditions. With 1000 machines in the field this would equate to a failure of about 1 to 2 machines a day. This is not acceptable in practice, so I must find a solution.
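
The daily failure estimate above can be checked with a small calculation. This is only a sketch under stated assumptions: every machine is powered down by a serviceman once or twice a week, each such power-down follows a disk write, and 0.5% of those power-downs leave the disk unrecoverable. The class and method names are made up for illustration.

```java
public class FieldFailureEstimate {
    /**
     * Expected unrecoverable failures per day across the fleet.
     * machines:      number of machines in the field
     * visitsPerWeek: serviceman power-downs per machine per week (assumed)
     * failureRate:   fraction of power-downs that corrupt the disk
     */
    public static double failuresPerDay(int machines, double visitsPerWeek, double failureRate) {
        double powerDownsPerDay = machines * (visitsPerWeek / 7.0);
        return powerDownsPerDay * failureRate;
    }

    public static void main(String[] args) {
        // Figures from the message: 1000 machines, 1 to 2 visits a week, 0.5% failure rate.
        System.out.println("low estimate:  " + failuresPerDay(1000, 1.0, 0.005));
        System.out.println("high estimate: " + failuresPerDay(1000, 2.0, 0.005));
    }
}
```

With these inputs the estimate comes out at roughly 0.7 to 1.4 failures a day, which is in line with the "about 1 to 2 machines a day" figure once rounded up.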

The hard-links idea is a useful bit - I'll add it to my toolkit for FreeBSD. I had tried this technique before on a Linux box, but in Linux the rename(2) call is not atomic where an existing file exists.

Using fsync() or even sync is not generally an option for me, as I am using Java for a large part of the application. Where C has been used, it is liberally sprinkled with fsync and sync. Special care has been taken with the Java to ensure that files are being closed properly. Without the files being closed after each write operation, I found that even with unbuffered writes there was a high probability of file corruption on power-down. Now that files are being closed after each write, I never seem to lose information during an fsck auto-repair (a great improvement). However, occasionally fsck is not able to repair the disk at all, which effectively makes the device inoperable and forces a return to depot for repair. The return to depot is expensive and requires special low-level data extraction to try to get the information off the now badly corrupted disk.

I have tried both with and without soft updates. The largest problem with soft updates is the latency: using the standard parameters it can take up to 30 seconds for data to actually be written out to disk, which introduces a good probability of losing information on a power-down. The 0, 1 and 2 second delays (using my kernel parameters) are much better, but 2 seconds is still a long time in my application. Looking through the code, 0, 1 and 2 second delays appear to be the smallest periods available without affecting the way soft updates work. With soft updates I seem to be more likely to lose information but less likely to kill the disk. Given this compromise I have turned off soft updates.

Is there an alternative file-system that would be more tolerant of power-down issues? For example, with the original DOS operating system I can't remember ever having this sort of problem until Windows started adding write caching.
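
On the fsync-from-Java point above, here is a minimal sketch of the "write a new file, flush it, then rename it over the old one" pattern. It assumes a JVM that provides java.io.FileDescriptor.sync() and the java.nio.file API; neither is confirmed for the JVM used in this application. The class name, the ".tmp" suffix and the file name in main are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class DurableReplace {
    /** Replace target's contents: write a temp file, force it to disk, then rename it into place. */
    public static void replace(File target, byte[] data) throws IOException {
        // Hypothetical naming scheme: the temp file sits next to the target so that
        // both are on the same file system and the rename does not cross devices.
        File dir = target.getAbsoluteFile().getParentFile();
        File tmp = new File(dir, target.getName() + ".tmp");

        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(data);
            // Ask the OS to flush this file's buffers to the device before closing.
            out.getFD().sync();
        }

        // Request an atomic rename. Whether an existing target is replaced is
        // implementation specific according to the Java documentation; on
        // POSIX-style systems this is typically backed by rename(2).
        Files.move(tmp.toPath(), target.toPath(), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        replace(new File("state.dat"), "new contents".getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that sync() only covers the file's own data. Whether the directory entry created by the rename reaches the disk before a power failure still depends on the file system and its mount options, so this narrows the window rather than closing it.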

Is there some option to make file-system calls totally synchronous (turning off all write caching), thus significantly reducing the risk? Write speed is not a critical criterion. Integrity and completeness are far more important.

It might be true that voltage sag on write is toasting extra super-block copies, or alternatively that the super-blocks are not being written synchronously. I have two variants of the motherboard hardware using different chipsets with two different sized hard drives (3" and 2"), so it doesn't appear to be a hardware-specific problem. If this is what is happening, then having more than one super-block is an integrity risk rather than an integrity improver, because I cannot manually fsck after a super-block corruption. How then would I turn off the extra super-block copies? I presume this would be done at file-system creation time.

An example of redundant information being useless is in the original FAT file-system. The second copy of the FAT is only ever used to detect that the two copies of the FAT are out of sync. I have never seen a DOS or Windows utility that takes any notice of the information written in the second copy of the FAT. For example, scandisk (equivalent to fsck) detects that they are different, but the only repair option is to write the first copy of the FAT on top of the second copy.

Are the UFS file-system and fsck different in this regard?

----- Original Message -----
From: "John Hanley"
To: "Andrew Hannam"
Sent: Monday, December 24, 2001 5:06 PM
Subject: Re: Disk Writes

> --- Andrew Hannam wrote:
> > The application has been
> > written to only append to files or to replace the file by creating a new
> > file and then writing a single byte to redirect which file is in use.
>
> BTW, using hard links might be pretty slick, here. Watch me atomically
> delete "file":
>
>   $ date > file
>   $ TMP=file.$$
>   $ mv $TMP file
>
> At worst, file.${pid} is left around.
> At all times, we have either the old or new contents available.
> The rename(2) does the "delete" and the "make new contents available"
> operations as a single atomic operation, safe in the face of power fails.
>
> > My current kernel settings are:
> > sysctl -w vfs.write_behind=0 kern.filedelay=2 kern.dirdelay=1
> > kern.metadelay=0
>
> I'll bet that combination of parameters has received less testing
> than the default parameters. I feel that the default params with
> soft updates should be working just fine for you.
>
> Is power fail pretty straightforward? Does the CPU go down before
> the device that /app is on goes down? Maybe voltage sag at the time
> of a write is toasting one or more superblock replicas?
>
> Cheers,
> JH