Date: Wed, 26 Dec 2001 16:09:12 -0700 (MST)
From: "Forrest W. Christian" <forrestc@imach.com>
To: Andrew Hannam <famzon@bigfoot.com>
Cc: John Hanley <jh_@yahoo.com>, freebsd-small@FreeBSD.ORG
Subject: Re: Disk Writes
Message-ID: <Pine.BSF.4.21.0112261606560.18913-100000@workhorse.iMach.com>
In-Reply-To: <001901c18dba$83dfcbc0$0104010a@famzon.com.au>

Are these IDE drives? If so, you probably need to turn off write caching.
I *THINK* this is hw.ata.wc, but it needs to be turned off in a special way.
See man tuning.
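
From memory, and assuming a FreeBSD 4.x box with the stock ata driver:
hw.ata.wc is a loader tunable rather than a run-time sysctl, so the
"special way" is to set it at boot. A minimal sketch of the loader.conf
entry (check tuning(7) on your release before relying on it):

```shell
# /boot/loader.conf
# Disable the IDE write cache; takes effect on the next boot.
hw.ata.wc="0"
```

After rebooting, "sysctl hw.ata.wc" should report 0 if the setting was
picked up.
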
On Wed, 26 Dec 2001, Andrew Hannam wrote:
> Date: Wed, 26 Dec 2001 13:07:46 +1000
> From: Andrew Hannam <famzon@bigfoot.com>
> To: John Hanley <jh_@yahoo.com>
> Cc: freebsd-small@FreeBSD.ORG
> Subject: Re: Disk Writes
>
> Thanks for your help...
>
> Just a short note on the power-fail condition: with our equipment a power
> failure is more likely to come just after a transaction that requires
> writing to the disk. The power fail (if it occurs) is most likely to occur
> 5 -> 10 seconds after the transaction and is the result of action by a
> serviceman at the machine (about once or twice a week). I therefore believe
> it should be possible to achieve a safe write 100% of the time.
> This, however, has not been borne out in practice: the failure rate is
> about 0.5% in these conditions. With 1000 machines in the field this would
> equate to a failure of about 1 to 2 machines a day. This is not acceptable
> in practice so I must find a solution.
>
> The hard-links idea is a useful bit - I'll add it to my toolkit for FreeBSD.
> I had tried this technique before on a Linux box, but in Linux the rename(2)
> call is not atomic when the target file already exists.
>
> Using fsync() or even sync is not generally an option for me as I am using
> Java for a large part of the application. Where C has been used, it is
> liberally sprinkled with fsync and sync. Special care has been taken with
> the Java to ensure that files are being closed properly. Without the files
> being closed after each write operation, I found that even with unbuffered
> writes there was a high probability of file corruption on power-down.
> Now that files are being closed after each write, I never seem to lose
> information during an fsck auto-repair (a great improvement). However,
> occasionally fsck is not able to repair the disk at all, effectively
> rendering the device inoperable and forcing a return to depot for repair.
> The return to depot is expensive and requires special low-level data
> extraction to try and get the information off the now badly corrupted disk.
>
> I have tried both with and without soft updates. The largest problem of
> using soft updates is the latency - using the standard parameters it can
> take up to 30 seconds for data to actually be written out to disk thus
> introducing a good probability of losing information on a power down.
>
> The 0, 1 & 2 second delays are much better (using my kernel parameters) but
> 2 seconds is still a long time in my application. Looking through the code -
> 0, 1 & 2 second delays appear to be the smallest periods available without
> affecting the way soft updates work. With soft updates I seem to be more
> likely to lose information but less likely to kill the disk. Given this
> compromise I have turned off soft-updates.
>
> Is there an alternative file-system that would be more tolerant to
> power-down issues? For example, with the original DOS operating system I can't
> remember ever having this sort of problem until Windows started adding write
> caching.
>
> Is there some option to make file-system calls totally synchronous (turning
> off all write caching) thus significantly reducing the risk? Write speed
> performance is not a critical criterion. Integrity and completeness are far
> more important.
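
On the synchronous question: mount(8) has a "sync" option that makes all
I/O to a filesystem synchronous. A rough sketch, assuming the data lives on
a filesystem mounted at /app (the mount point mentioned further down; adjust
to suit) and that the slower writes are tolerable:

```shell
# Run as root: update the already-mounted filesystem so that all
# I/O to it is done synchronously. /app is an assumed mount point.
mount -u -o sync /app

# Confirm the flags now in effect for that filesystem.
mount | grep /app
```

To make it stick across reboots, the same "sync" option can go in the
options field for that filesystem in /etc/fstab. Note this does not stop
the drive itself from caching writes.
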
>
> It might be true that voltage sag on write is toasting extra super-block
> copies or alternatively that the super-blocks are not being written
> synchronously. I have two variants of the motherboard hardware using
> different chipsets with two different sized hard-drives (3" and 2") so it
> doesn't appear to be a hardware specific problem.
>
> If this is what is happening then having more than one super-block is an
> integrity risk rather than an integrity improver because I cannot manually
> fsck after a super-block corruption. How then would I turn off the extra
> super-block copies? I presume this would be done at file-system creation
> time.
>
> An example of redundant information being useless is in the original FAT
> file-system. The second copy of the FAT is only ever used to detect that the
> two copies of the FAT are out of sync. I have never seen a DOS or Windows
> utility that takes any notice of the information written in the second copy
> of the FAT. For example, scandisk (equivalent to fsck) detects that they are
> different but the only repair option is to write the first copy of the FAT
> on top of the second copy of the FAT.
>
> Are the UFS file-system and fsck different in this regard?
>
> ----- Original Message -----
> From: "John Hanley" <jh_@yahoo.com>
> To: "Andrew Hannam" <famzon@bigfoot.com>
> Sent: Monday, December 24, 2001 5:06 PM
> Subject: Re: Disk Writes
>
>
> > --- Andrew Hannam <famzon@bigfoot.com> wrote:
> > > The application has been
> > > written to only append to files or to replace the file by creating a new
> > > file and then writing a single byte to redirect which file is in use.
> >
> > BTW, using hard links might be pretty slick, here. Watch me atomically
> > delete "file":
> >
> > $ date > file
> > $ TMP=file.$$
> > $ date > $TMP
> > $ mv $TMP file
> >
> > At worst, file.${pid} is left around.
> > At all times, we have either the old or new contents available.
> > The rename(2) does the "delete" and the "make new contents available"
> > operations as a single atomic operation, safe in the face of power fails.
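
The write-then-rename pattern described here can be sketched as a small
shell function. This is only an illustration: atomic_replace and demo.txt
are made-up names, and the sync call merely asks the kernel to flush, so it
does not by itself guarantee the data has reached the platters (drive write
caching still applies).

```shell
#!/bin/sh
# Hypothetical helper: replace a file with new contents read from stdin.
atomic_replace() {
    target=$1
    tmp="$target.$$"            # temp name in the same directory, so the
                                # rename stays within one filesystem
    cat > "$tmp" || return 1    # write the new contents to the temp file
    sync                        # ask the kernel to flush before renaming
    mv -f "$tmp" "$target"      # same-filesystem mv is a rename(2)
}

echo "old contents" > demo.txt
echo "new contents" | atomic_replace demo.txt
cat demo.txt
```

If power is lost before the mv, the old demo.txt is still in place and only
a stray temp file is left behind.
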
> >
> >
> > > My current kernel settings are:
> > > sysctl -w vfs.write_behind=0 kern.filedelay=2 kern.dirdelay=1
> > > kern.metadelay=0
> >
> > I'll bet that combination of parameters has received less testing
> > than the default parameters. I feel that the default params with
> > soft updates should be working just fine for you.
> >
> > Is power fail pretty straightforward? Does the CPU go down before
> > the device that /app is on goes down? Maybe voltage sag at the time
> > of a write is toasting one or more superblock replicas?
> >
> >
> > Cheers,
> > JH
> >
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-small" in the body of the message
>
- Forrest W. Christian (forrestc@imach.com) AC7DE
----------------------------------------------------------------------
The Innovation Machine Ltd. P.O. Box 5749
http://www.imach.com/ Helena, MT 59604
Home of PacketFlux Technologies and BackupDNS.com (406)-442-6648
----------------------------------------------------------------------
Protect your personal freedoms - visit http://www.lp.org/
