Date: Tue, 18 Jul 2017 13:22:00 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Kirk McKusick <mckusick@mckusick.com>
Cc: Andreas Longwitz <longwitz@incore.de>, freebsd-fs@freebsd.org
Subject: Re: ufs snapshot is sometimes corrupt on gjourneled partition
Message-ID: <20170718102200.GT1935@kib.kiev.ua>
In-Reply-To: <201707180044.v6I0iKvg040471@chez.mckusick.com>
References: <596C7201.8090700@incore.de> <201707180044.v6I0iKvg040471@chez.mckusick.com>

On Mon, Jul 17, 2017 at 05:44:20PM -0700, Kirk McKusick wrote:
> The sequence of calls when using bread is:
>
> Function           Line  File
> --------           ----  ----
> bread               491  sys/buf.h
> breadn_flags       1814  kern/vfs_bio.c
> bstrategy           397  sys/buf.h
> BO_STRATEGY          86  sys/bufobj.h
> bufstrategy        4535  kern/vfs_bio.c
> ufs_strategy       2290  ufs/ufs/ufs_vnops.c
> BO_STRATEGY on filesystem device -> ffs_geom_strategy
> ffs_geom_strategy  2141  ufs/ffs/ffs_vfsops.c
> g_vfs_strategy      161  geom/geom_vfs.c
> g_io_request        470  geom/geom_io.c
>
> Whereas readblock skips all these steps and calls g_io_request
> directly. In my looking at the gjournal code, I believe that we
> will still enter it with the g_io_request() call as I believe that
> it does not hook itself into any of the VOP_ call structure. But I
> have not done a backtrace to confirm this fact. Assuming that we
> are still getting into g_journal_start(), then it should be possible
> to catch reads that are only in the log and pull out the data as
> needed.
>
> Another alternative is to send gjournal a request to flush its log
> before starting the removal of a snapshot.

I do not think that the UFS call sequence is relevant here. It is
clearly a malfunction of the underlying I/O device (gjournal) if it
returns a data block that differs from the most recent successfully
written block. As such, whether UFS passes the read request from the
buffer cache through the BO_STRATEGY layers, or creates a bio directly
and reads the block, is not important.

OTOH, I do not think that the issue is that gjournal always reads from
the data area and misses the journal. The failure would be much more
spectacular in that case. I see some gjournal code which tries to find
the data in a 'cache' on read, whatever that means. Clearly, it
sometimes does not find the data. The failure is probably further
hidden by the buffer cache eliminating most reads of recently written
data.

So the way to fix the bug is to read the gjournal code and understand
why it sometimes returns wrong data. For instance, there were
relatively recent changes to the geom infrastructure allowing for
direct completion of bios. Anyway, I have no interest in gjournal.
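
The invariant at stake in this message (a read must return the most recent successfully written block, whether that block still sits in the journal or has already reached the data area) can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: it is not gjournal code, and every name in it (toy_jdev, toy_write, toy_read, toy_read_data_area_only, toy_flush) is hypothetical and invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of a journaled block device; NOT gjournal code. */
#define TOY_BLOCK_SIZE    16
#define TOY_NBLOCKS       8
#define TOY_JOURNAL_SLOTS 4

struct toy_jentry {
	uint32_t blkno;                    /* target block in the data area */
	uint8_t  data[TOY_BLOCK_SIZE];     /* payload not yet copied there */
};

struct toy_jdev {
	uint8_t data_area[TOY_NBLOCKS][TOY_BLOCK_SIZE];
	struct toy_jentry journal[TOY_JOURNAL_SLOTS];
	int njournal;                      /* entries appended since last flush */
};

void
toy_init(struct toy_jdev *d)
{
	memset(d, 0, sizeof(*d));
}

/* Copy journal entries to the data area, oldest first, so the newest wins. */
void
toy_flush(struct toy_jdev *d)
{
	for (int i = 0; i < d->njournal; i++)
		memcpy(d->data_area[d->journal[i].blkno],
		    d->journal[i].data, TOY_BLOCK_SIZE);
	d->njournal = 0;
}

/* A write only appends to the journal; the data area is updated on flush. */
int
toy_write(struct toy_jdev *d, uint32_t blkno, const uint8_t *buf)
{
	if (blkno >= TOY_NBLOCKS)
		return (-1);
	if (d->njournal == TOY_JOURNAL_SLOTS)
		toy_flush(d);
	struct toy_jentry *e = &d->journal[d->njournal++];
	e->blkno = blkno;
	memcpy(e->data, buf, TOY_BLOCK_SIZE);
	return (0);
}

/* Correct read: search the journal newest-first, then fall back to the data area. */
int
toy_read(const struct toy_jdev *d, uint32_t blkno, uint8_t *buf)
{
	if (blkno >= TOY_NBLOCKS)
		return (-1);
	for (int i = d->njournal - 1; i >= 0; i--) {
		if (d->journal[i].blkno == blkno) {
			memcpy(buf, d->journal[i].data, TOY_BLOCK_SIZE);
			return (0);
		}
	}
	memcpy(buf, d->data_area[blkno], TOY_BLOCK_SIZE);
	return (0);
}

/* Deliberately broken read that ignores the journal, to show the stale-data case. */
int
toy_read_data_area_only(const struct toy_jdev *d, uint32_t blkno, uint8_t *buf)
{
	if (blkno >= TOY_NBLOCKS)
		return (-1);
	memcpy(buf, d->data_area[blkno], TOY_BLOCK_SIZE);
	return (0);
}
```

In this model, toy_read_data_area_only returns stale data for every block written since the last flush, which corresponds to the "much more spectacular" failure mentioned above. A lookup that misses only some pending entries would fail only occasionally, and calling toy_flush first makes both read paths agree, which is the effect of the flush-before-snapshot-removal alternative in the quoted text.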