Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 30 Sep 2011 15:31:56 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Kirk McKusick <mckusick@mckusick.com>
Cc:        Garrett Cooper <yanegomi@gmail.com>, freebsd-fs@freebsd.org, Konstantin Belousov <kib@freebsd.org>, Xin LI <delphij@freebsd.org>
Subject:   Re: Need to force sync(2) before umounting UFS1 filesystems?
Message-ID:  <CAJ-FndBDzv6af%2BAVq9wkLbN8V3wHDk3BGPuTFaXB7j8EXsrrhA@mail.gmail.com>
In-Reply-To: <201109301325.p8UDPPG2072859@chez.mckusick.com>
References:  <CAJ-FndBUhzBH3KaaDBvZk3OnNifLd83NedYLPvmLHTL7PTCmCA@mail.gmail.com> <201109301325.p8UDPPG2072859@chez.mckusick.com>

next in thread | previous in thread | raw e-mail | index | archive | help
2011/9/30 Kirk McKusick <mckusick@mckusick.com>:
>> Date: Thu, 29 Sep 2011 18:13:34 +0200
>> Subject: Re: Need to force sync(2) before umounting UFS1 filesystems?
>> From: Attilio Rao <attilio@freebsd.org>
>> To: Kirk McKusick <mckusick@mckusick.com>,
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 Konstantin Belousov <kib@freebsd.org>
>> Cc: Garrett Cooper <yanegomi@gmail.com>, freebsd-fs@freebsd.org,
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 Xin LI <delphij@freebsd.org>
>>
>> 2011/9/29 Kirk McKusick <mckusick@mckusick.com>:
>>
>> > I am definitely not in a rush on this, so by all means take some time
>> > to look it over. The EBUSY unmount has been in its current state
>> > for several years, so I am fine with taking a few weeks to sort out
>> > the correct solution. Indeed, I am glad that Garrett has volunteered
>> > to do some more serious testing.
>> >
>> > If this general approach is not correct, I can put a hook in for just
>> > UFS so that it can have its historic behavior. As you have noted, the
>> > SU code has a lot of activity that gets done under the protection of
>> > vfs_busy. So it may be the only filesystem for which draining the
>> > vfs_busy lock during unmount is needed.
>>
>> Honestly, my first thought was exactly that -- an option that was
>> forcing the unmount sleep if SU is compiled in the kernel.
>
> The above would solve the problem at hand (NanoBSD builds). But I
> believe that this issue can arise with any filesystem that has
> background behavior such as ZFS and NFS. It is just most evident
> with UFS using SU.
>
>> When you mention 'historical behaviour' you mean the behaviour UFS had
>> even prior the introduction of the 'complete' VFS layer or it was the
>> behaviour unmount(2) was expected to implemented since the beginning?
>
> Synchronous unmount has exisited since at least the UNIX versions
> that I used in the 1970's.
>
>> My guess is that recent SU improvement by you and Jeff may have lead
>> to higher vfs_busy() contention, thus making this behaviour just more
>> visible.
>
> You are correct. SU has a lot of background activity. And as memories
> have grown bigger, the backlog has been able to get bigger and hence
> more noticable. On a busy system I have measured over fourty calls to
> "sync; sleep 1; umount" before the filesystem would finally unmount
> successfully.
>
>> BTW, I'm afraid the forced unmount case may have a possible deadlock.
>>
>> thread1 is doing whatever codepath it wants and thread2 is doing
>> unmount (forced right now):
>>
>> thread1::vfs_busy()
>> thread2::lock coveredvnode
>> thread1::contests coveredvnode
>> thread2::sleep because of thread1 vfs_busy
>
> I agree that the above deadlock is possible. See suggested solution below=
.
>
>> I think this deadlock was actually possible even with the old code, it
>> was just a LOR between a vnode lock and mount lock.
>
> Yes, this is a long-standing issue.
>
>> I'm not sure if there was any invariant I discussed with Kostik in the
>> past, preventing one way or another I'm forgetting about, but it seems
>> a possible deadlock to me.
>>
>> If you see this issue I'll make a patch for it.
>>
>> Attilio
>
> Here is my proposed fix. It does the unroll originally found in the
> non-FORCE case before sleeping waiting for the vfs_busy to clear.
> Is it acceptable to hold the mount mutex while calling VOP_UNLOCK?
> If not, then it needs to be released before the unlock, reacquired
> afterwards, and the check to see if the sleep is needed redone.

I thought about this approach when sending the e-mail, but there is a
problem: you need to handle the MNTK_UNMOUNT flag checking and
subsequent setting after coveredvnode is held, otherwise at the first
looping you will just return EBUSY.

> Index: vfs_mount.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- vfs_mount.c (revision 225881)
> +++ vfs_mount.c (working copy)
> @@ -1187,6 +1187,7 @@
>
> =C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_assert(&Giant, MA_OWNED);
>
> +top:
> =C2=A0 =C2=A0 =C2=A0 =C2=A0if ((coveredvp =3D mp->mnt_vnodecovered) !=3D =
NULL) {
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mnt_gen_r =3D mp->=
mnt_gen;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0VI_LOCK(coveredvp)=
;
> @@ -1227,21 +1228,19 @@
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mp->mnt_kern_flag =
|=3D MNTK_UNMOUNTF;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D 0;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0if (mp->mnt_lockref) {
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ((flags & MNT_FORCE=
) =3D=3D 0) {
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 mp->mnt_kern_flag &=3D ~(MNTK_UNMOUNT | MNTK_NOINSMNTQ |
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 MNTK_UNMOUNTF);
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 if (mp->mnt_kern_flag & MNTK_MWAIT) {
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mp->mnt_kern_flag &=3D ~MNTK_MWAIT;
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 wakeup(mp);
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 }
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 MNT_IUNLOCK(mp);
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 if (coveredvp)
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 VOP_UNLOCK(coveredvp, 0);
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 return (EBUSY);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mp->mnt_kern_flag &=3D=
 ~(MNTK_UNMOUNT | MNTK_NOINSMNTQ |
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 MNTK_UNM=
OUNTF);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (mp->mnt_kern_flag =
& MNTK_MWAIT) {
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 mp->mnt_kern_flag &=3D ~MNTK_MWAIT;
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 wakeup(mp);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (coveredvp)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 VOP_UNLOCK(coveredvp, 0);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mp->mnt_kern_flag =
|=3D MNTK_DRAINING;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D msleep(&=
mp->mnt_lockref, MNT_MTX(mp), PVFS,
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"mou=
nt drain", 0);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 MNT_IUNLOCK(mp);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 goto top;

You can avoid the unlock by passing PVFS | PDROP.

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndBDzv6af%2BAVq9wkLbN8V3wHDk3BGPuTFaXB7j8EXsrrhA>