Date: Thu, 12 May 2011 11:50:11 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS: How to enable cache and logs.
Message-ID: <20110512115011.17724x18akn60oao@webmail.leidinger.net>
In-Reply-To: <20110512090524.GA2106@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
 <4DCB455C.4020805@dannysplace.net>
 <alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
 <20110512033626.GA52047@icarus.home.lan>
 <4DCB7F22.4060008@digsys.bg>
 <20110512083429.GA58841@icarus.home.lan>
 <A3B76BB6-49DC-4C2F-BD2B-9A0C62F4D24C@gmail.com>
 <20110512090524.GA2106@icarus.home.lan>
Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Thu, 12 May 2011 02:05:24 -0700):

>> > What guarantee is there that the intent log -- which is written to the
>> > disk -- actually got written to the disk in the middle of a power
>> > failure? There's a lot of focus there on the idea that "the intent log
>> > will fix everything, but may lose writes", but what guarantee do I have
>> > that the intent log isn't corrupt or botched during a power failure?
>>
>> I expect that checksumming also works for ZIL (anybody knows?). If

It would be a damn big design flaw if it didn't checksum the ZIL.

>> that is the case, corruption would be detected, but you will have lost
>> data unless you are using mirrored slog devices.
>
> I can't believe that statement either (the last line).
>
> I guess that's also what I'm asking here -- what guarantee do you have
> that even with a mirrored 2-disk SLOG (or heck, 3 or 4!) that *no data*
> will be *lost* during a power outage?
>
> It seems to me the proper phrase would be "the likelihood of losing an
> entire pool during a power outage is lessened". Alexander indirectly
> hinted at this in another post of his tonight, specifically regarding
> zpool v15 versus v28:
>
> "The difference between v15 and v28 is the amount of data you lose (the
> entire pool vs. only what is still on the log devices)".

To recover the context: This was for losing the SLOG completely.

> This makes much more sense to me.
>
> It seems that in a power outage, there will always be some form of data
> loss. I imagine even systems that have hardware RAM/cache with BBUs on
> everything; there's always some form of caching going on *somewhere*
> within a system, from CPU all the way up, that guarantees some degree of
> data loss). I guess I'm OCD'ing over the terminology here. Sorry.

A simple power-loss should not destroy the SLOG (or the pool).
For easy comprehension, let us just assume that the log can only be destroyed by a hardware problem (broken disk -> the reason why it should be mirrored -> if all devices are broken, you have the same case as if a pool without a SLOG had lost more drives than the redundancy allows).

As written in my other mail (which I sent before I saw this mail, but which probably arrived after you wrote it), the SLOG is not about an enhanced guarantee (you had the guarantee before), it is about performance.

You need to handle the data-loss problem at several layers. If you have a power-loss during the write of the SLOG, you will lose the last SLOG entry (but there is no corruption). At this point in time the write has not returned to the application, so the application should not have ACKed the reception of the data. If it did, you will lose data. If it didn't, the application will just pick this transaction again from the queue of outstanding transactions and redo it.

Detecting the case of a successful write followed by a power-loss before the ACK reached the sender also has to be handled in the application. For example: calculate an ID based upon the incoming data and write the ID together with the rest of the transaction. If the processing is split up into several DB transactions, also write a corresponding state flag in the corresponding transaction. If the ID (and the state flag) is in the DB after the restart, you know that the write before the power-loss was done correctly, and the app just needs to ACK to the sender.

Was this clear enough, or shall I try to draw a better picture (in this case please try to specify your concerns, maybe with an example)?

Bye,
Alexander.

-- 
Do YOU have redeeming social value?
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
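
The application-side check described in this mail (derive an ID from the incoming data, write it in the same transaction as the data, and look it up before redoing the work) could be sketched roughly as follows. This is only an illustration under assumptions that are not part of the mail: SQLite stands in for "the DB", SHA-256 is used to compute the ID, and the table name `processed`, the state value `done`, and the function names are all hypothetical.

```python
import hashlib
import sqlite3


def transaction_id(payload: bytes) -> str:
    """Derive a deterministic ID from the incoming data (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the DB and create the (hypothetical) table holding ID, data and state flag."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed ("
        " id TEXT PRIMARY KEY,"
        " payload BLOB NOT NULL,"
        " state TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def handle(conn: sqlite3.Connection, payload: bytes) -> str:
    """Process one incoming transaction.

    Returns "duplicate" if the ID is already stored with state 'done'
    (the write succeeded before a crash, only the ACK is missing),
    otherwise stores the data and returns "stored". In both cases the
    caller may ACK to the sender only after this function has returned.
    """
    tid = transaction_id(payload)
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT state FROM processed WHERE id = ?", (tid,)
        ).fetchone()
        if row is not None and row[0] == "done":
            return "duplicate"
        # ID, data and state flag are written in the same DB transaction.
        conn.execute(
            "INSERT OR REPLACE INTO processed (id, payload, state)"
            " VALUES (?, ?, 'done')",
            (tid, payload),
        )
    return "stored"
```

After a restart, the sender's redelivery of the same data yields "duplicate", so the application only has to send the missing ACK. The in-memory database is used here just to keep the sketch self-contained; the scheme only helps if the DB lives on stable storage and its commit does not return before the synchronous write has completed, which is the part where the ZIL (and a SLOG, for speed) comes into play.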