Date: Sat, 2 Apr 2011 09:57:02 +0200
From: Olivier Smedts <olivier@gid0.org>
To: freebsd-stable@freebsd.org
Subject: Re: Constant rebooting after power loss
Message-ID: <BANLkTik9aN7TZ_pSZ1b=nMeXO-mW-fYuUA@mail.gmail.com>
In-Reply-To: <201104020335.p323Zp8Q018666@apollo.backplane.com>
References: <87d3l6p5xv.fsf@cosmos.claresco.hr>
  <AANLkTi=kEyz-mKLzdV8LAf91ZhMTP8gLKs=3Eu5WD8mh@mail.gmail.com>
  <874o6ip0ak.fsf@cosmos.claresco.hr>
  <7b15d37d28f8ddac9eb81e4390231c96.HRCIM@webmail.1command.com>
  <AANLkTi=KEwmm1hM6Z=r_SWUAn9KhUrkTVzfF6VmqQauW@mail.gmail.com>
  <14c23d4bf5b47a7790cff65e70c66151.HRCIM@webmail.1command.com>
  <AANLkTi=6pqRwJ96Lg=603cYg_f8QUXkg8aXtbjbYpFrV@mail.gmail.com>
  <201104020335.p323Zp8Q018666@apollo.backplane.com>

2011/4/2 Matthew Dillon <dillon@apollo.backplane.com>:
> The core of the issue here comes down to two things:
>
> First, a power loss to the drive will cause the drive's dirty write cache
> to be lost; that data will not make it to disk. Nor do you really want
> to turn off write caching on the physical drive. Well, you CAN turn it
> off, but if you do performance will become so bad that there's no point.
> So turning off the write caching is really a non-starter.
>
> The solution to this first item is for the OS/filesystem to issue a
> disk flush command to the drive at appropriate times. If I recall, the
> ZFS implementation in FreeBSD *DOES* do this for transaction groups,
> which guarantees that a prior transaction group is fully synced before
> a new one starts running (HAMMER in DragonFly also does this).
> (Just getting an 'ack' from the write transaction over the SATA bus only
> means the data made it to the drive's cache, not that it made it to
> the platter).

Amen!

> I'm not sure about UFS vis-a-vis the recent UFS logging features...
> it might be an option but I don't know if it is a default. Perhaps
> someone can comment on that.
>
> One last note here. Many modern drives have very large RAM caches.
> OCZ's SSDs have something like 256MB write caches and many modern HDs
> now come with 32MB and 64MB caches. Aged drives with lots of relocated
> sectors and bit errors can also take a very long time to perform writes
> on certain sectors. So these large caches take time to drain and one
> can't really assume that an acknowledged write to disk will actually
> make it to the disk under adverse circumstances any more. All sorts
> of bad things can happen.
>
> Finally, the drives don't order their writes to the platter (you can
> set a bit to tell them to, but like many similar bits in the past there
> is no real guarantee that the drives will honor it).
> So if two transactions do not have a disk flush command in between
> them it is possible for data from the second transaction to commit to
> the platter before all the data from the first transaction commits to
> the platter. Or worse, for the non-transactional data to update out of
> order relative to the transactional data which was supposed to commit
> first.
>
> Hence IMHO the OS/filesystem must use the disk flush command in such
> situations for good reliability.
>
> --
>
> The second problem is that a physical loss of power to the drive can
> cause the drive to physically lose one or more sectors, and can even
> effectively destroy the drive (even with the fancy auto-park)... if the
> drive happens to be in the middle of a track write-back when power is
> lost it is possible to lose far more than a single sector, including
> sectors unrelated to recent filesystem operations.
>
> The only solution to #2 is to make sure your machines (or at least the
> drives if they happen to be in external enclosures) are connected to
> a UPS and that the machines are communicating with the UPS via
> something like the "apcupsd" port. AND also that you test to make
> sure the machines properly shut themselves down when AC is lost before
> the UPS itself runs out of battery time. After all, a UPS won't help
> if the machines don't at least idle their drives before power is lost!!!
>
> I learned this lesson the hard way about 3 years ago. I had something
> like a dozen drives in two raid arrays doing heavy write activity and
> lost physical power and several of the drives were totally destroyed,
> with thousands of sector errors. Not just one or two... thousands.
>
> (It is unclear how SSDs react to physical loss of power during heavy
> writing activity. Theoretically while they will certainly lose their
> write cache they shouldn't wind up with any read errors).
>
> -Matt
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."
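
For illustration only, the "flush between two transactions" ordering
described in the quoted message could be sketched from userland roughly
as below. This is a minimal Python sketch, not how ZFS or HAMMER
implement it: write_all() and commit_in_order() are hypothetical names,
and os.fsync() only asks the kernel to push the file's data to the
device. Whether that also results in a cache flush command being sent
to the drive depends on the OS, the filesystem and the drive itself.

```python
import os


def write_all(fd, data):
    """Write every byte of data to fd, retrying on short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def commit_in_order(path, first, second):
    """Write two records with a flush request between them.

    Assumption: os.fsync() stands in for the "disk flush" step. It does
    not by itself prove the data reached the platter.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, first)
        os.fsync(fd)  # request that the first record be flushed first
        write_all(fd, second)
        os.fsync(fd)  # request that the second record be flushed too
    finally:
        os.close(fd)
```

Without the first os.fsync() call, nothing in this sketch would stop
the second record from reaching stable storage before the first one.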
