Date:      Sat, 2 Apr 2011 09:57:02 +0200
From:      Olivier Smedts <olivier@gid0.org>
To:        freebsd-stable@freebsd.org
Subject:   Re: Constant rebooting after power loss
Message-ID:  <BANLkTik9aN7TZ_pSZ1b=nMeXO-mW-fYuUA@mail.gmail.com>
In-Reply-To: <201104020335.p323Zp8Q018666@apollo.backplane.com>
References:  <87d3l6p5xv.fsf@cosmos.claresco.hr> <AANLkTi=kEyz-mKLzdV8LAf91ZhMTP8gLKs=3Eu5WD8mh@mail.gmail.com> <874o6ip0ak.fsf@cosmos.claresco.hr> <7b15d37d28f8ddac9eb81e4390231c96.HRCIM@webmail.1command.com> <AANLkTi=KEwmm1hM6Z=r_SWUAn9KhUrkTVzfF6VmqQauW@mail.gmail.com> <14c23d4bf5b47a7790cff65e70c66151.HRCIM@webmail.1command.com> <AANLkTi=6pqRwJ96Lg=603cYg_f8QUXkg8aXtbjbYpFrV@mail.gmail.com> <201104020335.p323Zp8Q018666@apollo.backplane.com>


2011/4/2 Matthew Dillon <dillon@apollo.backplane.com>:
>    The core of the issue here comes down to two things:
>
>    First, a power loss to the drive will cause the drive's dirty write cache
>    to be lost; that data will not make it to disk.  Nor do you really want
>    to turn off write caching on the physical drive.  Well, you CAN turn it
>    off, but if you do, performance will become so bad that there's no point.
>    So turning off the write caching is really a non-starter.
>
>    The solution to this first item is for the OS/filesystem to issue a
>    disk flush command to the drive at appropriate times.  If I recall, the
>    ZFS implementation in FreeBSD *DOES* do this for transaction groups,
>    which guarantees that a prior transaction group is fully synced before
>    a new one starts running (HAMMER in DragonFly also does this).
>    (Just getting an 'ack' from the write transaction over the SATA bus only
>    means the data made it to the drive's cache, not that it made it to
>    the platter).

Amen!
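
To make the "flush between transactions" idea concrete, here is a minimal
sketch in Python. It is not taken from ZFS or HAMMER; the helper names
(write_durably, commit_in_order, demo) and the file names are hypothetical.
It also assumes that fsync() on the platform in question really results in
a drive cache flush, which depends on the OS, the filesystem and the drive.

```python
import os
import tempfile


def write_durably(path, payload):
    """Write payload to path, then ask the OS to push it to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Do not return until the OS reports the data as flushed.
        # Assumption: this reaches the drive as a cache flush command.
        os.fsync(fd)
    finally:
        os.close(fd)


def commit_in_order(directory, first, second):
    """Commit two 'transactions' with a flush between them."""
    first_path = os.path.join(directory, "txn1.bin")
    second_path = os.path.join(directory, "txn2.bin")
    # The second write does not start until the first has been flushed,
    # so the drive is not free to reorder one ahead of the other.
    write_durably(first_path, first)
    write_durably(second_path, second)
    return first_path, second_path


def demo():
    """Run the two-step commit in a throwaway temporary directory."""
    directory = tempfile.mkdtemp()
    return commit_in_order(directory, b"first transaction", b"second transaction")
```

Without the fsync() in the first step, both writes could sit in a cache at
the same time and hit the platter in either order.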

>    I'm not sure about UFS vis-a-vis the recent UFS logging features...
>    it might be an option but I don't know if it is a default.  Perhaps
>    someone can comment on that.
>
>    One last note here.  Many modern drives have very large RAM caches.
>    OCZ's SSDs have something like 256MB write caches and many modern HDs
>    now come with 32MB and 64MB caches.  Aged drives with lots of relocated
>    sectors and bit errors can also take a very long time to perform writes
>    on certain sectors.  So these large caches take time to drain and one
>    can't really assume that an acknowledged write to disk will actually
>    make it to the disk under adverse circumstances any more.  All sorts
>    of bad things can happen.
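
A rough back-of-the-envelope sketch of the drain time for the cache sizes
quoted above. The function name is hypothetical, and the 1 MiB/s effective
write rate is purely an assumed figure for a drive struggling with bad
sectors, not a measurement.

```python
def drain_seconds(cache_mib, write_rate_mib_per_s):
    """Time to empty a completely full write cache at a sustained rate."""
    if write_rate_mib_per_s <= 0:
        raise ValueError("write rate must be positive")
    return cache_mib / write_rate_mib_per_s


# Cache sizes come from the quoted text; the rate is an assumption.
ASSUMED_RATE_MIB_PER_S = 1.0
for cache_mib in (32, 64, 256):
    seconds = drain_seconds(cache_mib, ASSUMED_RATE_MIB_PER_S)
    print(f"{cache_mib} MiB cache: {seconds:.0f} s to drain")
```

Under that assumption, any write acknowledged within that window could
still be sitting in the cache when the power goes.
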
>
>    Finally, the drives don't order their writes to the platter (you can
>    set a bit to tell them to, but like many similar bits in the past there
>    is no real guarantee that the drives will honor it).  So if two
>    transactions do not have a disk flush command in between them it is
>    possible for data from the second transaction to commit to the platter
>    before all the data from the first transaction commits to the platter.
>    Or worse, for the non-transactional data to update out of order relative
>    to the transactional data which was supposed to commit first.
>
>    Hence IMHO the OS/filesystem must use the disk flush command in such
>    situations for good reliability.
>
>    --
>
>    The second problem is that a physical loss of power to the drive can
>    cause the drive to physically lose one or more sectors, and can even
>    effectively destroy the drive (even with the fancy auto-park)... if the
>    drive happens to be in the middle of a track write-back when power is
>    lost it is possible to lose far more than a single sector, including
>    sectors unrelated to recent filesystem operations.
>
>    The only solution to #2 is to make sure your machines (or at least the
>    drives if they happen to be in external enclosures) are connected to
>    a UPS and that the machines are communicating with the UPS via
>    something like the "apcupsd" port.  AND also that you test to make
>    sure the machines properly shut themselves down when AC is lost before
>    the UPS itself runs out of battery time.  After all, a UPS won't help
>    if the machines don't at least idle their drives before power is lost!!!
>
>    I learned this lesson the hard way about 3 years ago.  I had something
>    like a dozen drives in two raid arrays doing heavy write activity and
>    lost physical power and several of the drives were totally destroyed,
>    with thousands of sector errors.  Not just one or two... thousands.
>
>    (It is unclear how SSDs react to physical loss of power during heavy
>    writing activity.  Theoretically, while they will certainly lose their
>    write cache, they shouldn't wind up with any read errors.)
>
>                                                -Matt



-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."

