Date: Tue, 09 Jul 2013 21:39:23 +0200
From: Andreas Longwitz <longwitz@incore.de>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Shutdown hangs on unmount of a gjournaled file system in 8-Stable
Message-ID: <51DC66EB.40109@incore.de>
In-Reply-To: <20130708054301.GI91021@kib.kiev.ua>
References: <51D9EB23.4070505@incore.de> <20130708054301.GI91021@kib.kiev.ua>

Konstantin Belousov wrote:
> On Mon, Jul 08, 2013 at 12:26:43AM +0200, Andreas Longwitz wrote:
>> The deadlock can be explained now: pid 1 (init) sleeps on "mount drain"
>> because mp->mnt_lockref was 1. This setting was done by pid 18 (gjournal
>> switcher) by calling vfs_busy(). pid 18 now sleeps on "suspwt" because
>> mp->mnt_writeopcount was 1. This setting was done by pid 1 before going
>> to sleep by calling vn_start_write() in dounmount().
>>
>> I think the reason for this deadlock is the commit r249055 which seems
>> not to be compatible with gjournal.
> Thank you for the analysis. I think 'not compatible' is some
> understatement. The situation clearly causes a deadlock, you are right.
>
> The vfs_busy(); vfs_write_suspend(); call sequence is somewhat dubious,
> in fact, exactly because unmount could start in between. I think that
> vfs_write_suspend() must avoid setting MNT_SUSPEND if unmount was
> started. Patch below, for HEAD, should fix the problem, by marking the
> callers of vfs_write_suspend(), which are not protected by the covered
> vnode lock, with the VS_SKIP_UNMOUNT flag.
Agree.
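
To make the cycle and the proposed fix concrete, here is a small
userspace model in C. It is only a sketch and not kernel code: the
struct, the MODEL_* flags and all function names are hypothetical
stand-ins for mnt_lockref, mnt_writeopcount, the unmount and suspend
flags, and a vfs_write_suspend() called with VS_SKIP_UNMOUNT. The -1
return value and the assumption that the caller drops its busy
reference on failure are mine, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's unmount/suspend flags. */
#define MODEL_UNMOUNT 0x1
#define MODEL_SUSPEND 0x2

/* Simplified model of the three mount fields that matter here. */
struct model_mount {
	int lockref;      /* like mp->mnt_lockref, raised by vfs_busy() */
	int writeopcount; /* like mp->mnt_writeopcount, raised by vn_start_write() */
	int flags;
};

/* Unmount sleeps on "mount drain" while busy references remain. */
static bool
unmount_would_sleep(const struct model_mount *mp)
{
	return (mp->lockref > 0);
}

/* Suspend sleeps on "suspwt" while writers are still counted. */
static bool
suspend_would_sleep(const struct model_mount *mp)
{
	return (mp->writeopcount > 0);
}

/*
 * Model of the suspend entry point. With skip_unmount set it refuses
 * to mark the mount suspended once an unmount has started.
 * Returning -1 is an assumption made for this sketch.
 */
static int
model_write_suspend(struct model_mount *mp, bool skip_unmount)
{
	if (skip_unmount && (mp->flags & MODEL_UNMOUNT) != 0)
		return (-1);
	mp->flags |= MODEL_SUSPEND;
	return (0);
}

/* Replays the interleaving from the analysis; true means both sides wait. */
bool
model_deadlocks(bool skip_unmount)
{
	struct model_mount mp = { 0, 0, 0 };

	mp.lockref++;              /* pid 18: vfs_busy() */
	mp.flags |= MODEL_UNMOUNT; /* pid 1: unmount starts in between */
	mp.writeopcount++;         /* pid 1: vn_start_write() in dounmount() */

	if (model_write_suspend(&mp, skip_unmount) != 0) {
		mp.lockref--;      /* assumed: caller drops its busy reference */
		return (unmount_would_sleep(&mp));
	}
	/* pid 18 waits for writers, pid 1 waits for the busy reference. */
	return (unmount_would_sleep(&mp) && suspend_would_sleep(&mp));
}

/* Without the skip flag the cycle closes; with it the unmount can proceed. */
void
model_selfcheck(void)
{
	assert(model_deadlocks(false));
	assert(!model_deadlocks(true));
}
```

Calling model_selfcheck() from a test program exercises both cases.
The model has no real sleeping or locking; it only checks whether each
side's wait condition is still true at the end of the interleaving.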
> I believe that the conflicts on stable/8 should be trivial, if any.
Yes, I have adapted r244795, r244925 and r245286 from head and your
patch for the umount hang to 8-Stable and everything looks fine. All my
reboots worked as expected.
By the way, since the source g_journal.c is involved: could you extend
the panic message for Journal overflow a little, as follows:
-> diff g_journal.c.orig g_journal.c
342,343c343,344
< panic("Journal overflow (joffset=%jd active=%jd inactive=%jd)",
< (intmax_t)sc->sc_journal_offset,
---
> panic("Journal overflow (id=%d joffset=%jd active=%jd inactive=%jd)",
> sc->sc_id, (intmax_t)sc->sc_journal_offset,
This was helpful for analyzing the still unsolved "suspwt" lock problem
from kern/164252; please see
http://lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html
--
Andreas Longwitz
