From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov
CC: Alan Somers, FreeBSD CURRENT
Subject: Re: NFSv4 performance degradation with 12.0-CURRENT client
Date: Fri, 25 Nov 2016 12:54:07 +0000
References: <20161124090811.GO54029@kib.kiev.ua>, <20161125084106.GX54029@kib.kiev.ua>
In-Reply-To: <20161125084106.GX54029@kib.kiev.ua>

Konstantin Belousov wrote:
>On Thu, Nov 24, 2016 at 10:45:51PM +0000, Rick Macklem wrote:
>> asomers@gmail.com wrote:
>> >OpenOwner     Opens LockOwner     Locks    Delegs  LocalOwn LocalOpen LocalLOwn
>> >     5638    141453         0         0         0         0         0         0
>> Ok, I think this shows us the problem. 141453 opens is a lot and the client would have
>> to check these every time another open is done (there goes all that CPU;-).
>>
>> Now, why has this occurred?
>> Well, the NFSv4 client can't close NFSv4 Opens on a vnode until that vnode's
>> v_usecount goes to 0. This is because mmap'd files might do I/O after the file
>> descriptor is closed.
>> Now, hopefully Kostik will know something about nullfs and can help with this.
>> My guess is that nullfs ends up acquiring a refcnt on the NFS vnode so the
>> v_usecount doesn't go to 0 and, therefore, the client never closes the NFSv4 Opens.
>> Kostik, do you know if this is the case and whether or not it can be changed?
>You are absolutely right. Nullfs vnode keeps a reference to the lower
>vnode which is below the nullfs one, i.e. to the nfs vnode in this case.
>If cache option is specified for the nullfs mount (default), the nullfs
>vnodes are cached normally to avoid the cost of creating and destroying
>nullfs vnode on each operation, and related cost of the exclusive locks
>on the lower vnode.
>
>An answer to my question in the previous mail to try with nocache
>option would give the confirmation.
>Really, I suspected that v_hash
>is calculated differently for NFSv3 and v4 mounts, but if opens are
>accumulated until use ref is dropped, that would explain things as well.
Hopefully Alan can test this and let us know if "nocache" on the nullfs mount
fixes the problem.

>Assuming your diagnosis is correct, are you in fact stating that the
>current VFS KPI is flawed ? It sounds as if either some other callback
>or counter needs to exist to track number of mapping references to the
>vm object of the vnode, in addition to VOP_OPEN/VOP_CLOSE ?
>
>Currently a rough estimation of the number of mappings, which is sometimes
>slightly wrong, can be obtained by the expression
>    vp->v_object->ref_count - vp->v_object->shadow_count
Well, ideally there would be a VOP_MMAPDONE() or something like that, which
would tell the NFSv4 client that I/O is done on the vnode so it can close it.
If there was some way for the NFSv4 VOP_CLOSE() to be able to tell if the file
has been mmap'd, that would help since it could close the ones that are not
mmap'd on the last descriptor close.
(A counter wouldn't be as useful, since NFSv4 would have to keep checking it
to see if it can do the close yet, but it might still be doable.)

>
>> >LocalLock
>> >        0
>> >Rpc Info:
>> >TimedOut   Invalid X Replies   Retries  Requests
>> >       0         0         0         0       662
>> >Cache Info:
>> >Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
>> >     1275        58       837       121         0         0         0         0
>> >BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
>> >        1         0         6         0         1         0
>> >
>> [more stuff snipped]
>> >What role could nullfs be playing?
>> As noted above, my hunch is that it is acquiring a refcnt on the NFS client vnode such
>> that the v_usecount doesn't go to zero (at least for a long time) and without
>> a VOP_INACTIVE() on the NFSv4 vnode, the NFSv4 Opens don't get closed and
>> accumulate.
>> (If that isn't correct, it is somehow interfering with the client Closing the NFSv4 Opens
>> in some other way.)
>>
>The following patch should automatically unset cache option for nullfs
>mounts over NFSv4 filesystem.
>
>diff --git a/sys/fs/nfsclient/nfs_clvfsops.c b/sys/fs/nfsclient/nfs_clvfsops.c
>index 524a372..a7e9fe3 100644
>--- a/sys/fs/nfsclient/nfs_clvfsops.c
>+++ b/sys/fs/nfsclient/nfs_clvfsops.c
>@@ -1320,6 +1320,8 @@ out:
> 		MNT_ILOCK(mp);
> 		mp->mnt_kern_flag |= MNTK_LOOKUP_SHARED | MNTK_NO_IOPF |
> 		    MNTK_USES_BCACHE;
>+		if ((VFSTONFS(mp)->nm_flag & NFSMNT_NFSV4) != 0)
>+			mp->mnt_kern_flag |= MNTK_NULL_NOCACHE;
> 		MNT_IUNLOCK(mp);
> 	}
> 	return (error);
>diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
>index 49bae28..de05e8b 100644
>--- a/sys/fs/nullfs/null_vfsops.c
>+++ b/sys/fs/nullfs/null_vfsops.c
>@@ -188,7 +188,8 @@ nullfs_mount(struct mount *mp)
> 	}
>
> 	xmp->nullm_flags |= NULLM_CACHE;
>-	if (vfs_getopt(mp->mnt_optnew, "nocache", NULL, NULL) == 0)
>+	if (vfs_getopt(mp->mnt_optnew, "nocache", NULL, NULL) == 0 ||
>+	    (xmp->nullm_vfs->mnt_kern_flag & MNTK_NULL_NOCACHE) != 0)
> 		xmp->nullm_flags &= ~NULLM_CACHE;
>
> 	MNT_ILOCK(mp);
>diff --git a/sys/sys/mount.h b/sys/sys/mount.h
>index 94cabb6..b6f9fec 100644
>--- a/sys/sys/mount.h
>+++ b/sys/sys/mount.h
>@@ -370,7 +370,8 @@ void __mnt_vnode_markerfree_active(struct vnode **mvp, struct mount *);
> #define MNTK_SUSPEND	0x08000000	/* request write suspension */
> #define MNTK_SUSPEND2	0x04000000	/* block secondary writes */
> #define MNTK_SUSPENDED	0x10000000	/* write operations are suspended */
>-#define MNTK_UNUSED1	0x20000000
>+#define MNTK_NULL_NOCACHE	0x20000000 /* auto disable cache for nullfs
>+					      mounts over this fs */
> #define MNTK_LOOKUP_SHARED 0x40000000 /* FS supports shared lock lookups */
> #define MNTK_NOKNOTE	0x80000000	/* Don't send KNOTEs from VOP hooks */
If the "nocache" option fixes Alan's problem, then I think a patch like this
is a good idea.

Does unionfs suffer from the same issue?
- I just took a glance and it doesn't have a "nocache" mount option.
I can probably do a little test later today to see if unionfs seems to suffer
from the same "accumulating opens" issue.

Thanks for looking at this, rick
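
For readers following the thread, the decision made by the quoted patch can be modelled in user space. This is only a rough sketch, not kernel code: MNTK_NULL_NOCACHE uses the value from the patch, while the NULLM_CACHE value and the helper names nfs_kern_flags() and null_mount_flags() are assumptions made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the quoted patch to sys/sys/mount.h. */
#define MNTK_NULL_NOCACHE 0x20000000u
/* Illustrative value only (assumption); the real flag is defined by nullfs. */
#define NULLM_CACHE 0x0001u

/*
 * Hypothetical model of the NFS client side of the patch: only an NFSv4
 * mount asks upper nullfs mounts not to cache their vnodes.
 */
uint32_t
nfs_kern_flags(bool is_nfsv4)
{
	uint32_t kern_flag = 0;

	if (is_nfsv4)
		kern_flag |= MNTK_NULL_NOCACHE;
	return (kern_flag);
}

/*
 * Hypothetical model of the patched check in nullfs_mount(): caching starts
 * enabled and is cleared if the "nocache" option was given or the lower
 * filesystem set MNTK_NULL_NOCACHE.
 */
uint32_t
null_mount_flags(bool nocache_opt, uint32_t lower_kern_flag)
{
	uint32_t flags = 0;

	flags |= NULLM_CACHE;
	if (nocache_opt || (lower_kern_flag & MNTK_NULL_NOCACHE) != 0)
		flags &= ~NULLM_CACHE;
	return (flags);
}
```

With this model, a nullfs mount over NFSv4 ends up without NULLM_CACHE even when "nocache" is not given, while a nullfs mount over any other filesystem keeps caching unless the option is passed explicitly.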