From owner-freebsd-fs@FreeBSD.ORG Sun Aug 8 23:50:05 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F1AE81065674 for ; Sun, 8 Aug 2010 23:50:04 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 9F1CE8FC0A for ; Sun, 8 Aug 2010 23:50:04 +0000 (UTC) Received: by qyk11 with SMTP id 11so1722377qyk.13 for ; Sun, 08 Aug 2010 16:50:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=+Vr8iuy/bwHB+HiV9SJFo4AGJHp3UEKti+ou9oaddVQ=; b=XpTJsVM3phMsd9VJZp/+w3I2O17GaxYPCAGJBLrn+dqMwv+imTNyAW6LeUum8JLVid lhcoza6pSjjfYP6ZoVKtSQvyDhc/DNu1F21dmMDYUtbi02XQartfDr1VBiqdNEvqI1Js 8RArrf0ZMCI1NVnr3vqKA5zUyaxJbPgjPN3N0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=ZFgow74/2SqylqrZNfjgFM29R8l83pTfybfdlGDmoR2nHb2xLJ7dmvVqTPzn/TfA1x vH8IXz+8dp+0jBuzc+gPEAbz0n4dT1gm7uc4wAjJQZvhvrSTEgysFCK7pyhCjjm125ZJ 0ETv7uE7Ggb9Kzk9uZgee8ELDlginXeUHd5zY= MIME-Version: 1.0 Received: by 10.224.104.153 with SMTP id p25mr8153339qao.98.1281309663049; Sun, 08 Aug 2010 16:21:03 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.229.251.6 with HTTP; Sun, 8 Aug 2010 16:21:03 -0700 (PDT) In-Reply-To: <20100613144240.GV13238@deviant.kiev.zoral.com.ua> References: <20100603143501.GA3176@a91-153-117-195.elisa-laajakaista.fi> <20100613144240.GV13238@deviant.kiev.zoral.com.ua> Date: Mon, 9 Aug 2010 01:21:03 +0200 X-Google-Sender-Auth: krODq5mtkRxwjsvi6157yuZ27fc Message-ID: From: Attilio Rao To: Kostik Belousov Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Jaakko Heinonen Subject: Re: syncer vnode leak because of nmount() race X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Aug 2010 23:50:05 -0000 I think that the patch looks good. Did you test it? Otherwise you could ask to gianni@ to test it out. Thanks, Attilio 2010/6/13 Kostik Belousov : > On Fri, Jun 04, 2010 at 05:00:49PM +0200, Attilio Rao wrote: >> 2010/6/3 Jaakko Heinonen : >> > >> > Hi, >> > >> > I have been playing with devfs and stress2 recently and I was able to >> > trigger a syncer vnode leak with devfs.sh. As far as I can tell it is = a >> > nmount(2) (initial) mount race against update mount. The code in >> > vfs_domount() is protected with vfs_busy()/vfs_unbusy() but it doesn't >> > protect against update mounts. The protection by Giant is defeated >> > because devfs_root() may sleep due to sx_xlock(). vfs_domount() calls >> > VFS_ROOT() before allocating the syncer vnode. VI_MOUNT vnode flag >> > doesn't provide sufficient protection against update mounts either. >> >> Thanks for this bug report. >> I think that, luckilly, it is not a very common condition to have the >> mount still in flight and get updates... :) >> However, I think that the real breakage here is that the check on >> mnt->mnt_syncer is done lockless and it is unsafe. It might really be >> protected by the mount interlock but that's tricky because >> vfs_allocate_syncvnode() sleeps. Also re-checking the condition (after >> the allocation takes place) is not too simple here. > Is it =C2=A0? I think that the patch below would fix the issue, by > syncronizing mnt_syncer by syncer mutex. > > On the other hand, I prefer the jh' patch, because it seemingly disallows > parallel updates of the mount point. I believe that mp->mnt_vnodecovered > should be stable after the call to vfs_busy() succeeded. > > >> I found also this bug when rewriting the syncer and I resolved that by >> using a separate flag for that (in my case it was simpler and more >> beneficial actually for some other reasons, but you may do the same >> thing with a mnt_kern_flag entry). >> If you have no time I can work actively on the patch, even if I'm >> confident, luckilly, this problem is very hard to happen in the real >> life. >> >> Additively, note that vfs_busy() here is not intended to protect >> against such situations but against unmount. >> Also, I'm very unsure about what Giant protection might bring here. >> IMHO we might probabilly remove it at all as long as all the sleeping >> point present make it completely unuseful if anything (and I don't see >> a reason why it may be needed). >> >> > PS. vfs_unbusy(9) manual page is out of date after r184554 and IMO >> > =C2=A0 =C2=A0vfs_busy(9) manual page is misleading because it talks ab= out >> > =C2=A0 =C2=A0synchronizing access to a mount point. >> >> May you be more precise on what is misleading please? > > diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c > index 088d939..053bd68 100644 > --- a/sys/kern/vfs_mount.c > +++ b/sys/kern/vfs_mount.c > @@ -1034,14 +1034,10 @@ vfs_domount( > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0else > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0mp->mnt_kern_flag &=3D ~MNTK_ASYNC; > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0MNT_IUNLOCK(mp); > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ((mp->mnt_flag & MN= T_RDONLY) =3D=3D 0) { > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 if (mp->mnt_syncer =3D=3D NULL) > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 error =3D vfs_allocate_syncvnode(mp); > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 } else { > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 if (mp->mnt_syncer !=3D NULL) > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vrele(mp->mnt_syncer); > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 mp->mnt_syncer =3D NULL; > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 } > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ((mp->mnt_flag & MN= T_RDONLY) =3D=3D 0) > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 error =3D vfs_allocate_syncvnode(mp); > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 vfs_deallocate_syncvnode(mp); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vfs_unbusy(mp); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0VI_LOCK(vp); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vp->v_iflag &=3D ~= VI_MOUNT; > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > index ff6744a..6e4bb12 100644 > --- a/sys/kern/vfs_subr.c > +++ b/sys/kern/vfs_subr.c > @@ -3400,12 +3400,31 @@ vfs_allocate_syncvnode(struct mount *mp) > =C2=A0 =C2=A0 =C2=A0 =C2=A0/* XXX - vn_syncer_add_to_worklist() also grab= s and drops sync_mtx. */ > =C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_lock(&sync_mtx); > =C2=A0 =C2=A0 =C2=A0 =C2=A0sync_vnode_count++; > + =C2=A0 =C2=A0 =C2=A0 if (mp->mnt_syncer =3D=3D NULL) { > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mp->mnt_syncer =3D vp; > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vp =3D NULL; > + =C2=A0 =C2=A0 =C2=A0 } > =C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_unlock(&sync_mtx); > =C2=A0 =C2=A0 =C2=A0 =C2=A0BO_UNLOCK(bo); > - =C2=A0 =C2=A0 =C2=A0 mp->mnt_syncer =3D vp; > + =C2=A0 =C2=A0 =C2=A0 if (vp !=3D NULL) > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vgone(vp); > =C2=A0 =C2=A0 =C2=A0 =C2=A0return (0); > =C2=A0} > > +void > +vfs_deallocate_syncvnode(struct mount *mp) > +{ > + =C2=A0 =C2=A0 =C2=A0 struct vnode *vp; > + > + =C2=A0 =C2=A0 =C2=A0 mtx_lock(&sync_mtx); > + =C2=A0 =C2=A0 =C2=A0 vp =3D mp->mnt_syncer; > + =C2=A0 =C2=A0 =C2=A0 if (vp !=3D NULL) > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mp->mnt_syncer =3D NUL= L; > + =C2=A0 =C2=A0 =C2=A0 mtx_unlock(&sync_mtx); > + =C2=A0 =C2=A0 =C2=A0 if (vp !=3D NULL) > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vrele(vp); > +} > + > =C2=A0/* > =C2=A0* Do a lazy sync of the filesystem. > =C2=A0*/ > @@ -3484,15 +3503,16 @@ sync_reclaim(struct vop_reclaim_args *ap) > > =C2=A0 =C2=A0 =C2=A0 =C2=A0bo =3D &vp->v_bufobj; > =C2=A0 =C2=A0 =C2=A0 =C2=A0BO_LOCK(bo); > - =C2=A0 =C2=A0 =C2=A0 vp->v_mount->mnt_syncer =3D NULL; > + =C2=A0 =C2=A0 =C2=A0 mtx_lock(&sync_mtx); > + =C2=A0 =C2=A0 =C2=A0 if (vp->v_mount->mnt_syncer =3D=3D vp) > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vp->v_mount->mnt_synce= r =3D NULL; > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (bo->bo_flag & BO_ONWORKLST) { > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mtx_lock(&sync_mtx); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0LIST_REMOVE(bo, bo= _synclist); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0syncer_worklist_le= n--; > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0sync_vnode_count--= ; > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mtx_unlock(&sync_mtx); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0bo->bo_flag &=3D ~= BO_ONWORKLST; > =C2=A0 =C2=A0 =C2=A0 =C2=A0} > + =C2=A0 =C2=A0 =C2=A0 mtx_unlock(&sync_mtx); > =C2=A0 =C2=A0 =C2=A0 =C2=A0BO_UNLOCK(bo); > > =C2=A0 =C2=A0 =C2=A0 =C2=A0return (0); > diff --git a/sys/sys/mount.h b/sys/sys/mount.h > index 20dcf64..dcff206 100644 > --- a/sys/sys/mount.h > +++ b/sys/sys/mount.h > @@ -731,6 +731,7 @@ int vfs_busy(struct mount *, int); > =C2=A0int =C2=A0 =C2=A0vfs_export =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* process mount export info */ > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(struct mount *, struct export_a= rgs *); > =C2=A0int =C2=A0 =C2=A0vfs_allocate_syncvnode(struct mount *); > +void =C2=A0 vfs_deallocate_syncvnode(struct mount *); > =C2=A0int =C2=A0 =C2=A0vfs_donmount(struct thread *td, int fsflags, struc= t uio *fsoptions); > =C2=A0void =C2=A0 vfs_getnewfsid(struct mount *); > =C2=A0struct cdev *vfs_getrootfsid(struct mount *); > --=20 Peace can only be achieved by understanding - A. Einstein