From owner-freebsd-stable@FreeBSD.ORG Sat Jun 12 15:09:19 2010 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E22E9106567C for ; Sat, 12 Jun 2010 15:09:19 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 64ECC8FC14 for ; Sat, 12 Jun 2010 15:09:19 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o5CF9FXv029941 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 12 Jun 2010 18:09:16 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o5CF9FKc076591; Sat, 12 Jun 2010 18:09:15 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o5CF9FPw076590; Sat, 12 Jun 2010 18:09:15 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 12 Jun 2010 18:09:15 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20100612150915.GN13238@deviant.kiev.zoral.com.ua> References: <20100606144443.GA50876@emmi.physik-pool.tu-berlin.de> <8639wsk4t1.fsf@kopusha.home.net> <20100612141549.GM13238@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="aIbcA3MSwnGacr4f" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Leon Me??ner , freebsd-stable@freebsd.org, Mikolaj Golub Subject: Re: Re: freeBSD nullfs together with nfs and "silly rename" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 15:09:20 -0000 --aIbcA3MSwnGacr4f Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Jun 12, 2010 at 11:15:49AM -0400, Rick Macklem wrote: >=20 >=20 > On Sat, 12 Jun 2010, Kostik Belousov wrote: >=20 > >On Sat, Jun 12, 2010 at 11:56:10AM +0300, Mikolaj Golub wrote: > >> > >>On Sun, 6 Jun 2010 16:44:43 +0200 Leon Me??ner wrote: > >> > >> LM> Hi, > >> LM> I hope this is not the wrong list to ask. Didn't get any answers on > >> LM> -questions. > >> > >> LM> When you try to do the following inside a nullfs mounted directory, > >> LM> where the nullfs origin is itself mounted via nfs you get an error: > >> > >> LM> # foo > >> LM> # tail -f foo& > >> LM> # rm -f foo > >> LM> tail: foo: Stale NFS file handle > >> LM> # fg > >> > >> LM> This is really a problem when running services inside jails and us= ing > >> LM> NFS as storage. As of [2] it looks like this problem is known for a > >> LM> while. On a normal NFS mount this does not happen as "silly renami= ng" > >> LM> [1] works there (producing nasty little .nfsXXXX files). > >> > >>nfs_sillyrename() is called when vnode's usecount is more then 1. It is > >>expected that unlink() syscall increases vnode's usecount in namei() an= d=20 > >>if > >>the file has been already opened usecount will be more then 1. > >> > >>But with nullfs layer present the reference counts are held by the uppe= r=20 > >>node, > >>not the lower (nfs) one, so when unlink() is called it increases usecou= nt=20 > >>of > >>the upper vnode, not nfs vnode and nfs_sillyrename() is never called. > >> > >>The strightforward solution looks like to implement null_remove() that= =20 > >>will > >>increase lower vnode's refcount before calling null_bypass() and then > >>decrement it after the call. See the attached patch (it works for me on= =20 > >>both > >>8-STABLE and CURRENT). > > > >The upper vnode holds a reference to the lower vnode, as you noted. > >Now, with your patch, I believe that _all_ calls to the nfs_remove() > >are happen with refcount > 1. > > > I'm not familiar with the nullfs so this might be way off, but would > this patch be ok by any chance? >=20 > Index: sys/fs/nullfs/null_vnops.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/fs/nullfs/null_vnops.c (revision 208960) > +++ sys/fs/nullfs/null_vnops.c (working copy) > @@ -499,6 +499,23 @@ > } >=20 > /* > + * Increasing refcount of lower vnode is needed at least for the case > + * when lower FS is NFS to do sillyrename if the file is in use. > + */ > +static int > +null_remove(struct vop_remove_args *ap) > +{ > + int retval; > + struct vnode *lvp; > + > + if (ap->a_vp->v_usecount > 1) { > + lvp =3D NULLVPTOLOWERVP(ap->a_vp); > + VREF(lvp); > + } else > + lvp =3D NULL; > + retval =3D null_bypass(&ap->a_gen); > + if (lvp !=3D NULL) > + vrele(lvp); > + return (retval); > +} > + > +/* Yes, I hoped that Mikolaj ends up with something similar :). Please note that this is racy, since we cannot know why usecount is greater then 1. This might cause the silly rename to kick in some time where it should not, but the race is rare. --aIbcA3MSwnGacr4f Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkwToxsACgkQC3+MBN1Mb4icPACcDNXe595aY92KTDDMm1Wjuj9E LxcAoOaScCNCZS1OXRmR4zR4sLEClw4n =N2VX -----END PGP SIGNATURE----- --aIbcA3MSwnGacr4f--