Date: Tue, 09 Jul 2013 21:39:23 +0200
From: Andreas Longwitz <longwitz@incore.de>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Shutdown hangs on unmount of a gjournaled file system in 8-Stable
Message-ID: <51DC66EB.40109@incore.de>
In-Reply-To: <20130708054301.GI91021@kib.kiev.ua>
References: <51D9EB23.4070505@incore.de> <20130708054301.GI91021@kib.kiev.ua>

Konstantin Belousov wrote:
> On Mon, Jul 08, 2013 at 12:26:43AM +0200, Andreas Longwitz wrote:
>> The deadlock can be explained now: pid 1 (init) sleeps on "mount drain"
>> because mp->mnt_lockref was 1. This setting was done by pid 18 (gjournal
>> switcher) by calling vfs_busy(). pid 18 now sleeps on "suspwt" because
>> mp->mnt_writeopcount was 1. This setting was done by pid 1 before going
>> to sleep by calling vn_start_write() in dounmount().
>>
>> I think the reason for this deadlock is the commit r249055 which seems
>> not to be compatible with gjournal.
> Thank you for the analysis. I think 'not compatible' is some
> understatement. The situation clearly causes a deadlock, you are right.
>
> The vfs_busy(); vfs_write_suspend(); call sequence is somewhat dubious,
> in fact, exactly because unmount could start in between. I think that
> vfs_write_suspend() must avoid setting MNT_SUSPEND if unmount was
> started. Patch below, for HEAD, should fix the problem, by marking the
> callers of vfs_write_suspend(), which are not protected by the covered
> vnode lock, with the VS_SKIP_UNMOUNT flag.

Agree.

> I believe that the conflicts on stable/8 should be trivial, if any.

Yes, I have adapted r244795, r244925 and r245286 from head and your patch
for the umount hang to 8-Stable and everything looks fine. All my
reboots worked as expected.

By the way, because the source g_journal.c is involved: can you extend
the panic message for Journal overflow a little bit?

-> diff g_journal.c.orig g_journal.c
342,343c343,344
<       panic("Journal overflow (joffset=%jd active=%jd inactive=%jd)",
<           (intmax_t)sc->sc_journal_offset,
---
>       panic("Journal overflow (id=%d joffset=%jd active=%jd inactive=%jd)",
>           sc->sc_id, (intmax_t)sc->sc_journal_offset,

This was helpful for analyzing the still unsolved "suspwt" lock problem
from kern/164252, please look at
http://lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html

-- 
Andreas Longwitz
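
For readers following the thread, the wait cycle described above can be illustrated with a small user-space model. This is only a sketch and not kernel code: struct toy_mount, the toy_* functions and the TOY_* result codes are hypothetical names invented for the illustration, and the behaviour of the skip_unmount argument (return without suspending once unmount has started) is an assumption based on the description of VS_SKIP_UNMOUNT quoted above, not a copy of the patch.

```c
#include <stdbool.h>

/* Toy model only: the fields mimic the counters named in the analysis. */
struct toy_mount {
    int  lockref;       /* stands in for mp->mnt_lockref */
    int  writeopcount;  /* stands in for mp->mnt_writeopcount */
    bool unmounting;    /* hypothetical "unmount has started" marker */
    bool suspended;     /* hypothetical stand-in for MNT_SUSPEND being set */
};

enum toy_result { TOY_OK, TOY_WOULD_SLEEP, TOY_SKIPPED };

/* pid 18 (gjournal switcher): take a busy reference, as vfs_busy() does. */
static void toy_busy(struct toy_mount *mp)
{
    mp->lockref++;
}

/* pid 18: drop the busy reference again. */
static void toy_unbusy(struct toy_mount *mp)
{
    mp->lockref--;
}

/*
 * pid 1 (init): start the unmount, register as a writer (the
 * vn_start_write() call in dounmount()), then wait for all busy
 * references to go away ("mount drain").
 */
static enum toy_result toy_unmount_begin(struct toy_mount *mp)
{
    mp->unmounting = true;
    mp->writeopcount++;
    return (mp->lockref > 0 ? TOY_WOULD_SLEEP : TOY_OK);
}

/*
 * pid 18: suspend writes and wait for active writers to finish
 * ("suspwt"). With skip_unmount set, back off without suspending
 * when an unmount has already started.
 */
static enum toy_result toy_write_suspend(struct toy_mount *mp, bool skip_unmount)
{
    if (skip_unmount && mp->unmounting)
        return (TOY_SKIPPED);
    mp->suspended = true;
    return (mp->writeopcount > 0 ? TOY_WOULD_SLEEP : TOY_OK);
}
```

Calling toy_busy(), then toy_unmount_begin(), then toy_write_suspend() without the skip argument leaves both sides reporting TOY_WOULD_SLEEP while each still holds the reference the other is waiting for, which is the cycle from the analysis. With the skip argument set, the suspend side returns TOY_SKIPPED, can call toy_unbusy(), and the unmount side is then free to continue.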