FreeBSD Mail Archives

Date:      Sat, 2 Apr 2011 09:57:02 +0200
From:      Olivier Smedts <olivier@gid0.org>
To:        freebsd-stable@freebsd.org
Subject:   Re: Constant rebooting after power loss
Message-ID:  <BANLkTik9aN7TZ_pSZ1b=nMeXO-mW-fYuUA@mail.gmail.com>
In-Reply-To: <201104020335.p323Zp8Q018666@apollo.backplane.com>
References:  <87d3l6p5xv.fsf@cosmos.claresco.hr> <AANLkTi=kEyz-mKLzdV8LAf91ZhMTP8gLKs=3Eu5WD8mh@mail.gmail.com> <874o6ip0ak.fsf@cosmos.claresco.hr> <7b15d37d28f8ddac9eb81e4390231c96.HRCIM@webmail.1command.com> <AANLkTi=KEwmm1hM6Z=r_SWUAn9KhUrkTVzfF6VmqQauW@mail.gmail.com> <14c23d4bf5b47a7790cff65e70c66151.HRCIM@webmail.1command.com> <AANLkTi=6pqRwJ96Lg=603cYg_f8QUXkg8aXtbjbYpFrV@mail.gmail.com> <201104020335.p323Zp8Q018666@apollo.backplane.com>


2011/4/2 Matthew Dillon <dillon@apollo.backplane.com>:
>    The core of the issue here comes down to two things:
>
>    First, a power loss to the drive will cause the drive's dirty write cache
>    to be lost; that data will not make it to disk.  Nor do you really want
>    to turn off write caching on the physical drive.  Well, you CAN turn it
>    off, but if you do, performance will become so bad that there's no point.
>    So turning off the write caching is really a non-starter.
>
>    The solution to this first item is for the OS/filesystem to issue a
>    disk flush command to the drive at appropriate times.  If I recall, the
>    ZFS implementation in FreeBSD *DOES* do this for transaction groups,
>    which guarantees that a prior transaction group is fully synced before
>    a new one starts running (HAMMER in DragonFly also does this).
>    (Just getting an 'ack' from the write transaction over the SATA bus only
>    means the data made it to the drive's cache, not that it made it to
>    the platter.)

Amen !
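
For what it's worth, the nearest thing an application can do from userland is
call fsync() at its own transaction boundaries. Below is a minimal sketch in
Python, assuming a hypothetical append-only log file and a hypothetical helper
name (commit_transaction); neither comes from this thread. Whether fsync()
actually ends up as a cache flush command sent to the drive depends on the OS,
the filesystem and the drive itself, so take it as an illustration of the
idea rather than a guarantee.

```python
import os

def commit_transaction(path, payload):
    """Append payload to path, then ask the OS to push it to stable storage.

    Hypothetical helper, for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(payload)
        while view:
            # os.write() may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Returning from write() only means the data reached the OS;
        # fsync() asks for it to be pushed further down before we go on.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(payload)
```

The point is simply that the next transaction should not start until this
call has returned.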

>    I'm not sure about UFS vis-a-vis the recent UFS logging features...
>    it might be an option but I don't know if it is a default.  Perhaps
>    someone can comment on that.
>
>    One last note here.  Many modern drives have very large RAM caches.
>    OCZ's SSDs have something like 256MB write caches and many modern HDs
>    now come with 32MB and 64MB caches.  Aged drives with lots of relocated
>    sectors and bit errors can also take a very long time to perform writes
>    on certain sectors.  So these large caches take time to drain and one
>    can't really assume that an acknowledged write to disk will actually
>    make it to the disk under adverse circumstances any more.  All sorts
>    of bad things can happen.
>
>    Finally, the drives don't order their writes to the platter (you can
>    set a bit to tell them to, but like many similar bits in the past there
>    is no real guarantee that the drives will honor it).  So if two
>    transactions do not have a disk flush command in between them, it is
>    possible for data from the second transaction to commit to the platter
>    before all the data from the first transaction commits to the platter.
>    Or worse, for the non-transactional data to update out of order relative
>    to the transactional data which was supposed to commit first.
>
>    Hence IMHO the OS/filesystem must use the disk flush command in such
>    situations for good reliability.
>
>    --
>
>    The second problem is that a physical loss of power to the drive can
>    cause the drive to physically lose one or more sectors, and can even
>    effectively destroy the drive (even with the fancy auto-park)... if the
>    drive happens to be in the middle of a track write-back when power is
>    lost, it is possible to lose far more than a single sector, including
>    sectors unrelated to recent filesystem operations.
>
>    The only solution to #2 is to make sure your machines (or at least the
>    drives if they happen to be in external enclosures) are connected to
>    a UPS and that the machines are communicating with the UPS via
>    something like the "apcupsd" port.  AND also that you test to make
>    sure the machines properly shut themselves down when AC is lost before
>    the UPS itself runs out of battery time.  After all, a UPS won't help
>    if the machines don't at least idle their drives before power is lost!!!
>
>    I learned this lesson the hard way about 3 years ago.  I had something
>    like a dozen drives in two RAID arrays doing heavy write activity and
>    lost physical power, and several of the drives were totally destroyed,
>    with thousands of sector errors.  Not just one or two... thousands.
>
>    (It is unclear how SSDs react to physical loss of power during heavy
>    writing activity.  Theoretically, while they will certainly lose their
>    write cache, they shouldn't wind up with any read errors.)
>
>                                                -Matt
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>



-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BANLkTik9aN7TZ_pSZ1b=nMeXO-mW-fYuUA>
