Date: Fri, 30 Sep 2011 15:31:56 +0200 From: Attilio Rao <attilio@freebsd.org> To: Kirk McKusick <mckusick@mckusick.com> Cc: Garrett Cooper <yanegomi@gmail.com>, freebsd-fs@freebsd.org, Konstantin Belousov <kib@freebsd.org>, Xin LI <delphij@freebsd.org> Subject: Re: Need to force sync(2) before umounting UFS1 filesystems? Message-ID: <CAJ-FndBDzv6af%2BAVq9wkLbN8V3wHDk3BGPuTFaXB7j8EXsrrhA@mail.gmail.com> In-Reply-To: <201109301325.p8UDPPG2072859@chez.mckusick.com> References: <CAJ-FndBUhzBH3KaaDBvZk3OnNifLd83NedYLPvmLHTL7PTCmCA@mail.gmail.com> <201109301325.p8UDPPG2072859@chez.mckusick.com>
next in thread | previous in thread | raw e-mail | index | archive | help
2011/9/30 Kirk McKusick <mckusick@mckusick.com>: >> Date: Thu, 29 Sep 2011 18:13:34 +0200 >> Subject: Re: Need to force sync(2) before umounting UFS1 filesystems? >> From: Attilio Rao <attilio@freebsd.org> >> To: Kirk McKusick <mckusick@mckusick.com>, >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 Konstantin Belousov <kib@freebsd.org> >> Cc: Garrett Cooper <yanegomi@gmail.com>, freebsd-fs@freebsd.org, >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 Xin LI <delphij@freebsd.org> >> >> 2011/9/29 Kirk McKusick <mckusick@mckusick.com>: >> >> > I am definitely not in a rush on this, so by all means take some time >> > to look it over. The EBUSY unmount has been in its current state >> > for several years, so I am fine with taking a few weeks to sort out >> > the correct solution. Indeed, I am glad that Garrett has volunteered >> > to do some more serious testing. >> > >> > If this general approach is not correct, I can put a hook in for just >> > UFS so that it can have its historic behavior. As you have noted, the >> > SU code has a lot of activity that gets done under the protection of >> > vfs_busy. So it may be the only filesystem for which draining the >> > vfs_busy lock during unmount is needed. >> >> Honestly, my first thought was exactly that -- an option that was >> forcing the unmount sleep if SU is compiled in the kernel. > > The above would solve the problem at hand (NanoBSD builds). But I > believe that this issue can arise with any filesystem that has > background behavior such as ZFS and NFS. It is just most evident > with UFS using SU. > >> When you mention 'historical behaviour' you mean the behaviour UFS had >> even prior the introduction of the 'complete' VFS layer or it was the >> behaviour unmount(2) was expected to implemented since the beginning? > > Synchronous unmount has exisited since at least the UNIX versions > that I used in the 1970's. > >> My guess is that recent SU improvement by you and Jeff may have lead >> to higher vfs_busy() contention, thus making this behaviour just more >> visible. > > You are correct. SU has a lot of background activity. And as memories > have grown bigger, the backlog has been able to get bigger and hence > more noticable. On a busy system I have measured over fourty calls to > "sync; sleep 1; umount" before the filesystem would finally unmount > successfully. > >> BTW, I'm afraid the forced unmount case may have a possible deadlock. >> >> thread1 is doing whatever codepath it wants and thread2 is doing >> unmount (forced right now): >> >> thread1::vfs_busy() >> thread2::lock coveredvnode >> thread1::contests coveredvnode >> thread2::sleep because of thread1 vfs_busy > > I agree that the above deadlock is possible. See suggested solution below= . > >> I think this deadlock was actually possible even with the old code, it >> was just a LOR between a vnode lock and mount lock. > > Yes, this is a long-standing issue. > >> I'm not sure if there was any invariant I discussed with Kostik in the >> past, preventing one way or another I'm forgetting about, but it seems >> a possible deadlock to me. >> >> If you see this issue I'll make a patch for it. >> >> Attilio > > Here is my proposed fix. It does the unroll originally found in the > non-FORCE case before sleeping waiting for the vfs_busy to clear. > Is it acceptable to hold the mount mutex while calling VOP_UNLOCK? > If not, then it needs to be released before the unlock, reacquired > afterwards, and the check to see if the sleep is needed redone. I thought about this approach when sending the e-mail, but there is a problem: you need to handle the MNTK_UNMOUNT flag checking and subsequent setting after coveredvnode is held, otherwise at the first looping you will just return EBUSY. > Index: vfs_mount.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- vfs_mount.c (revision 225881) > +++ vfs_mount.c (working copy) > @@ -1187,6 +1187,7 @@ > > =C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_assert(&Giant, MA_OWNED); > > +top: > =C2=A0 =C2=A0 =C2=A0 =C2=A0if ((coveredvp =3D mp->mnt_vnodecovered) !=3D = NULL) { > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mnt_gen_r =3D mp->= mnt_gen; > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0VI_LOCK(coveredvp)= ; > @@ -1227,21 +1228,19 @@ > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mp->mnt_kern_flag = |=3D MNTK_UNMOUNTF; > =C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D 0; > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (mp->mnt_lockref) { > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ((flags & MNT_FORCE= ) =3D=3D 0) { > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 mp->mnt_kern_flag &=3D ~(MNTK_UNMOUNT | MNTK_NOINSMNTQ | > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 MNTK_UNMOUNTF); > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 if (mp->mnt_kern_flag & MNTK_MWAIT) { > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mp->mnt_kern_flag &=3D ~MNTK_MWAIT; > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 wakeup(mp); > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 } > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 MNT_IUNLOCK(mp); > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 if (coveredvp) > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 VOP_UNLOCK(coveredvp, 0); > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 return (EBUSY); > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mp->mnt_kern_flag &=3D= ~(MNTK_UNMOUNT | MNTK_NOINSMNTQ | > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 MNTK_UNM= OUNTF); > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (mp->mnt_kern_flag = & MNTK_MWAIT) { > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 mp->mnt_kern_flag &=3D ~MNTK_MWAIT; > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 wakeup(mp); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (coveredvp) > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 VOP_UNLOCK(coveredvp, 0); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mp->mnt_kern_flag = |=3D MNTK_DRAINING; > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D msleep(&= mp->mnt_lockref, MNT_MTX(mp), PVFS, > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"mou= nt drain", 0); > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 MNT_IUNLOCK(mp); > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 goto top; You can avoid the unlock by passing PVFS | PDROP. Attilio --=20 Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndBDzv6af%2BAVq9wkLbN8V3wHDk3BGPuTFaXB7j8EXsrrhA>