Date: Tue, 18 Jul 2017 13:22:00 +0300
From: Konstantin Belousov
To: Kirk McKusick
Cc: Andreas Longwitz, freebsd-fs@freebsd.org
Subject: Re: ufs snapshot is sometimes corrupt on gjourneled partition
Message-ID: <20170718102200.GT1935@kib.kiev.ua>
References: <596C7201.8090700@incore.de> <201707180044.v6I0iKvg040471@chez.mckusick.com>
In-Reply-To: <201707180044.v6I0iKvg040471@chez.mckusick.com>

On Mon, Jul 17, 2017 at 05:44:20PM -0700, Kirk McKusick wrote:
> The sequence of calls when using bread is:
>
> Function             Line    File
> --------             ----    ----
> bread                491     sys/buf.h
> breadn_flags         1814    kern/vfs_bio.c
> bstrategy            397     sys/buf.h
> BO_STRATEGY          86      sys/bufobj.h
> bufstrategy          4535    kern/vfs_bio.c
> ufs_strategy         2290    ufs/ufs/ufs_vnops.c
> BO_STRATEGY on filesystem device -> ffs_geom_strategy
> ffs_geom_strategy    2141    ufs/ffs/ffs_vfsops.c
> g_vfs_strategy       161     geom/geom_vfs.c
> g_io_request         470     geom/geom_io.c
>
> Whereas readblock skips all these steps and calls g_io_request
> directly. In my looking at the gjournal code, I believe that we
> will still enter it with the g_io_request() call as I believe that
> it does not hook itself into any of the VOP_ call structure. but I
> have not done a backtrace to confirm this fact. Assuming that we
> are still getting into g_journal_start(), then it should be possible
> to catch reads that are only in the log and pull out the data as
> needed.
>
> Another alternative is to send gjournal a request to flush its log
> before starting the removal of a snapshot.

I do not think that the UFS call sequence is relevant here. It is
clearly a malfunction of the underlying io device (gjournal) if it
returns a data block which differs from the latest successfully
written block. As it is, whether UFS passes the read request from the
buffer cache through the BO_STRATEGY layers, or directly creates a bio
and reads the block, is not important.

OTOH, I do not think that the issue is that gjournal always reads from
the data area and misses the journal. The failure would be much more
spectacular in that case. I see some gjournal code which tries to find
the data in 'cache' on read, whatever it means.
It is clear that sometimes it does not find the data. The failure is
probably further hidden by the buffer cache, which eliminates most
reads of recently written data.

So the way to fix the bug is to read the gjournal code and understand
why it sometimes returns wrong data. For instance, there were
relatively recent changes to the geom infrastructure allowing for
direct completion of bios.

Anyway, I have no interest in gjournal.
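
To make the invariant above concrete (a read must return the latest
successfully written block, whether it currently lives in the journal
or in the data area), here is a minimal toy model in C. It is only a
sketch and is not gjournal code: every name in it (toy_dev, toy_write,
toy_read, toy_read_data_only, toy_flush) and the in-memory layout are
assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All sizes and names are hypothetical; this is not the gjournal layout. */
#define TOY_NBLOCKS 8   /* blocks in the toy data area */
#define TOY_BLKSIZE 16  /* bytes per toy block */
#define TOY_JOURNAL 4   /* journal entries held before a forced flush */

struct toy_jentry {
	uint32_t blkno;
	char     data[TOY_BLKSIZE];
};

struct toy_dev {
	char              data[TOY_NBLOCKS][TOY_BLKSIZE]; /* data area */
	struct toy_jentry journal[TOY_JOURNAL];           /* oldest first */
	int               njournal;
};

/* Copy journaled writes to the data area in order, then empty the journal. */
void
toy_flush(struct toy_dev *d)
{
	for (int i = 0; i < d->njournal; i++)
		memcpy(d->data[d->journal[i].blkno], d->journal[i].data,
		    TOY_BLKSIZE);
	d->njournal = 0;
}

/* A write only reaches the journal; the data area changes on flush. */
void
toy_write(struct toy_dev *d, uint32_t blkno, const char *buf)
{
	assert(blkno < TOY_NBLOCKS);
	if (d->njournal == TOY_JOURNAL)
		toy_flush(d);
	d->journal[d->njournal].blkno = blkno;
	memcpy(d->journal[d->njournal].data, buf, TOY_BLKSIZE);
	d->njournal++;
}

/* Correct read: the newest journal entry for the block wins. */
void
toy_read(const struct toy_dev *d, uint32_t blkno, char *buf)
{
	assert(blkno < TOY_NBLOCKS);
	for (int i = d->njournal - 1; i >= 0; i--) {
		if (d->journal[i].blkno == blkno) {
			memcpy(buf, d->journal[i].data, TOY_BLKSIZE);
			return;
		}
	}
	memcpy(buf, d->data[blkno], TOY_BLKSIZE);
}

/* Broken read: skips the journal, so it may return stale data. */
void
toy_read_data_only(const struct toy_dev *d, uint32_t blkno, char *buf)
{
	assert(blkno < TOY_NBLOCKS);
	memcpy(buf, d->data[blkno], TOY_BLKSIZE);
}
```

With a zeroed toy_dev, writing a block, flushing, and then writing the
same block again leaves the new contents only in the journal: toy_read
returns the new data, while toy_read_data_only returns the old. A
lookup that misses only some of the time would show the same stale
result, just less often, which matches the intermittent failure
described above.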