From: Andreas Longwitz
Date: Tue, 09 Jul 2013 21:39:23 +0200
To: Konstantin Belousov
Cc: freebsd-stable@freebsd.org
Subject: Re: Shutdown hangs on unmount of a gjournaled file system in 8-Stable
Message-ID: <51DC66EB.40109@incore.de>
In-Reply-To: <20130708054301.GI91021@kib.kiev.ua>
References: <51D9EB23.4070505@incore.de> <20130708054301.GI91021@kib.kiev.ua>

Konstantin Belousov wrote:
> On Mon, Jul 08, 2013 at 12:26:43AM +0200, Andreas Longwitz wrote:
>> The deadlock can be explained now: pid 1 (init) sleeps on "mount drain"
>> because mp->mnt_lockref was 1. This setting was done by pid 18 (gjournal
>> switcher) by calling vfs_busy(). pid 18 now sleeps on "suspwt" because
>> mp->mnt_writeopcount was 1. This setting was done by pid 1 before going
>> to sleep by calling vn_start_write() in dounmount().
>>
>> I think the reason for this deadlock is the commit r249055 which seems
>> not to be compatible with gjournal.
> Thank you for the analysis. I think 'not compatible' is some
> understatement. The situation clearly causes a deadlock, you are right.
>
> The vfs_busy(); vfs_write_suspend(); call sequence is somewhat dubious,
> in fact, exactly because unmount could start in between. I think that
> vfs_write_suspend() must avoid setting MNT_SUSPEND if unmount was
> started. Patch below, for HEAD, should fix the problem, by marking the
> callers of vfs_write_suspend(), which are not protected by the covered
> vnode lock, with the VS_SKIP_UNMOUNT flag.

Agreed.

> I believe that the conflicts on stable/8 should be trivial, if any.

Yes, I have adapted r244795, r244925 and r245286 from head, together with
your patch for the umount hang, to 8-Stable, and everything looks fine.
All my reboots worked as expected.

By the way, since the source file g_journal.c is involved: could you
extend the panic message for Journal overflow a little:

-> diff g_journal.c.orig g_journal.c
342,343c343,344
<     panic("Journal overflow (joffset=%jd active=%jd inactive=%jd)",
<         (intmax_t)sc->sc_journal_offset,
---
>     panic("Journal overflow (id=%d joffset=%jd active=%jd inactive=%jd)",
>         sc->sc_id, (intmax_t)sc->sc_journal_offset,

This was helpful for analyzing the still unsolved "suspwt" lock problem
from kern/164252, please look at
http://lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html

-- 
Andreas Longwitz
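
The wait-for cycle from the quoted analysis can be replayed in a small
user-space model. This is only a sketch under loud assumptions: every
toy_* name is hypothetical, the struct merely mirrors the two counters
named above (mnt_lockref, mnt_writeopcount), there is no real sleeping or
locking, and the skip_unmount argument only imitates what the thread says
VS_SKIP_UNMOUNT is meant to achieve. It is not the kernel code or the
actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct mount fields named in the thread. */
struct toy_mount {
    int lockref;       /* models mp->mnt_lockref (vfs_busy() references) */
    int writeopcount;  /* models mp->mnt_writeopcount (active writers) */
    bool unmounting;   /* models "unmount was started" */
};

enum toy_result { TOY_PROCEEDS, TOY_BLOCKS, TOY_SKIPPED };

/* pid 18 (gjournal switcher): takes a busy reference on the mount. */
static void toy_busy(struct toy_mount *mp) { mp->lockref++; }
static void toy_unbusy(struct toy_mount *mp) { mp->lockref--; }

/* pid 1 (init): unmount registers itself as a writer first. */
static void toy_unmount_start(struct toy_mount *mp)
{
    mp->unmounting = true;
    mp->writeopcount++;    /* models vn_start_write() in dounmount() */
}

/* pid 1 then waits ("mount drain") until nobody holds a busy reference. */
static enum toy_result toy_unmount_drain(const struct toy_mount *mp)
{
    return mp->lockref == 0 ? TOY_PROCEEDS : TOY_BLOCKS;
}

/* pid 18 waits ("suspwt") until no writer is active, unless asked to
 * back off when an unmount is already in progress. */
static enum toy_result toy_write_suspend(const struct toy_mount *mp,
    bool skip_unmount)
{
    if (skip_unmount && mp->unmounting)
        return TOY_SKIPPED;
    return mp->writeopcount == 0 ? TOY_PROCEEDS : TOY_BLOCKS;
}

/* Replays the interleaving from the analysis; true means both sides are
 * stuck waiting on each other. */
bool toy_replay_deadlocks(bool skip_unmount)
{
    struct toy_mount mp = { 0, 0, false };
    enum toy_result suspend, drain;

    toy_busy(&mp);                                   /* pid 18 */
    toy_unmount_start(&mp);                          /* pid 1, in between */
    suspend = toy_write_suspend(&mp, skip_unmount);  /* pid 18 */
    if (suspend == TOY_SKIPPED)
        toy_unbusy(&mp);           /* caller gives up its reference */
    drain = toy_unmount_drain(&mp);                  /* pid 1 */

    return suspend == TOY_BLOCKS && drain == TOY_BLOCKS;
}

#ifdef DEMO_MAIN
int main(void)
{
    assert(toy_replay_deadlocks(false));   /* original ordering: stuck */
    assert(!toy_replay_deadlocks(true));   /* with the skip: unmount goes on */
    printf("model behaves as described\n");
    return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a tiny driver. The point of the model is
only that each side holds the counter the other is waiting on, and that
letting the suspending side back off once an unmount has started breaks
the cycle.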