Date: Fri, 23 Oct 2015 08:04:23 -0400 (EDT)
From: Rick Macklem
To: FreeBSD FS
Cc: Josh Paetzel, Alexander Motin, ken@freebsd.org
Message-ID: <1927021065.49882210.1445601863864.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <2144282175.49880310.1445601737139.JavaMail.zimbra@uoguelph.ca>
Subject: NFS FHA issue and possible change to the algorithm

Hi,

An off-list discussion occurred in which a site running an NFS server found
that they needed to disable File Handle Affinity (FHA) to get good
performance. Here is a re-post of some of that (with Josh's permission):

First, what was observed w.r.t. the machine.

Josh Paetzel wrote:
>>>> It's all good.
>>>>
>>>> It's a 96GB RAM machine and I have 2 million nmbclusters, so 8GB RAM,
>>>> and we've tried 1024 NFS threads.
>>>>
>>>> It might be running out of network memory but we can't really afford to
>>>> give it any more, for this use case disabling FHA might end up being the
>>>> way to go.
>>>>

I wrote:
>>> Just to fill mav@ in, the person that reported a serious performance
>>> problem to Josh was able to fix it by disabling FHA.

Josh Paetzel wrote:
>>
>> There's about 300 virtual machines that mount root from a read only NFS
>> share.
>>
>> There's also another few hundred users that mount their home directories
>> over NFS. When things went sideways it is always the virtual machines
>> that get unusable. 45 seconds to log in via ssh, 15 minutes to boot,
>> stuff like that.
>>
>> [root@head2] ~# nfsstat -s 1
>>  GtAttr  Lookup  Rdlink    Read   Write  Rename  Access   Rddir
>>    4117      17       0     124     689       4     680       0
>>    4750      31       5     121     815       3     950       1
>>    4168      16       0     109     659       9     672       0
>>    4416      24       0     112     771       3     748       0
>>    5038      86       0      76     728       4     825       0
>>    5602      21       0      76     740       3     702       6
>>
>> [root@head2] ~# arcstat.py 1
>>     time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz    c
>> 18:25:36    21     0      0     0    0     0    0     0    0    65G  65G
>> 18:25:37  1.8K    23      1    23    1     0    0     7    0    65G  65G
>> 18:25:38  1.9K    88      4    32    1    56   32     3    0    65G  65G
>> 18:25:39  2.2K    67      3    62    2     5    5     2    0    65G  65G
>> 18:25:40  2.7K   132      4    39    1    93   17     8    0    65G  65G
>>
>> last pid:  7800;  load averages:  1.44,  1.65,  1.68  up 0+19:22:29  18:26:16
>> 69 processes:  1 running, 68 sleeping
>> CPU:  0.1% user,  0.0% nice,  1.8% system,  0.9% interrupt, 97.3% idle
>> Mem: 297M Active, 180M Inact, 74G Wired, 140K Cache, 565M Buf, 19G Free
>> ARC: 66G Total, 39G MFU, 24G MRU, 53M Anon, 448M Header, 1951M Other
>> Swap: 28G Total, 28G Free
>>
>>   PID USERNAME  THR PRI NICE    SIZE     RES STATE    C   TIME    WCPU COMMAND
>>  9915 root       37  52    0   9900K   2060K rpcsvc  16  16.7H  24.02% nfsd
>>  6402 root        1  52    0  85352K  20696K select   8  47:17   3.08% python2.7
>> 43178 root        1  20    0  70524K  30752K select   7  31:04   0.59% rsync
>>  7363 root        1  20    0  49512K   6456K CPU16   16   0:00   0.59% top
>> 37968 root        1  20    0  70524K  31432K select   7  16:53   0.00% rsync
>> 37969 root        1  20    0  55752K  11052K select   1   9:11   0.00% ssh
>> 13516 root       12  20    0    176M  41152K uwait   23   4:14   0.00% collectd
>> 31375 root       12  20    0    176M  42432K uwa
>>
>> This is a quick peek at the system at the end of the day, so load has
>> dropped off considerably, however the main takeaway is it has plenty of
>> free RAM, and ZFS ARC hit percentage is > 99%.
>>

I wrote:
>>> I took a look at it and I wonder if it is time to consider changing the
>>> algorithm somewhat?
>>>
>>> The main thing that I wonder about is doing FHA for all the RPCs other
>>> than Read and Write.
>>>
>>> In particular, Getattr is often the most frequent RPC and doing FHA for
>>> it seems like wasted overhead to me? Normally separate Getattr RPCs
>>> wouldn't be done for FHs that are being Read/Written, since the
>>> Read/Write reply has updated attributes in it.
>>>

Although the load is mostly Getattr RPCs and I think the above statement is
correct, I don't know if the overhead of doing FHA for all the Getattr RPCs
explains the observed performance problem?

I don't see how doing FHA for RPCs like Getattr will improve their
performance. Note that when the FHA algorithm was originally done, there
wasn't a shared vnode lock and, as such, all RPCs on a given FH/vnode would
have been serialized by the vnode lock anyhow. Now, with shared vnode locks,
this isn't the case for frequently performed RPCs like Getattr, Read (Write
for ZFS), Lookup and Access.

I have always felt that doing FHA for RPCs other than Read and Write didn't
make much sense, but I don't have any evidence that it causes a significant
performance penalty.

Anyhow, the attached simple patch limits FHA to Read and Write RPCs. The
simple testing I've done shows it to be about performance neutral (0-1%
improvement), but I have only small hardware and no ZFS or any easy way to
emulate a load of mostly Getattr RPCs.

As such, unless others can determine if this patch (or some other one) helps
w.r.t.
this, I don't think committing it makes much sense?

If anyone can test this, or has comments w.r.t. this or suggestions for other
possible changes to the FHA algorithm, please do so.

Thanks, rick

Attachment: nfsfha.patch (text/x-patch)

--- nfs/nfs_fha.c.sav	2015-10-21 19:29:53.000000000 -0400
+++ nfs/nfs_fha.c	2015-10-22 19:31:43.000000000 -0400
@@ -42,6 +42,8 @@ __FBSDID("$FreeBSD: head/sys/nfs/nfs_fha
 
 static MALLOC_DEFINE(M_NFS_FHA, "NFS FHA", "NFS FHA");
 
+static u_int	nfsfha_enableallrpcs = 0;
+
 /*
  * XXX need to commonize definitions between old and new NFS code.  Define
  * this here so we don't include one nfsproto.h over the other.
@@ -109,6 +111,10 @@ fha_init(struct fha_params *softc)
 	    OID_AUTO, "fhe_stats", CTLTYPE_STRING | CTLFLAG_RD, 0, 0,
 	    softc->callbacks.fhe_stats_sysctl, "A", "");
 
+	SYSCTL_ADD_UINT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree),
+	    OID_AUTO, "enable_allrpcs", CTLFLAG_RW,
+	    &nfsfha_enableallrpcs, 0, "Enable FHA for all RPCs");
+
 }
 
 void
@@ -383,6 +389,7 @@ fha_assign(SVCTHREAD *this_thread, struc
 	struct fha_info i;
 	struct fha_hash_entry *fhe;
 	struct fha_callbacks *cb;
+	rpcproc_t procnum;
 
 	cb = &softc->callbacks;
 
@@ -399,6 +406,24 @@ fha_assign(SVCTHREAD *this_thread, struc
 	if (req->rq_vers != 2 && req->rq_vers != 3)
 		goto thist;
 
+	/*
+	 * The main reason for use of FHA now that FreeBSD supports shared
+	 * vnode locks is to try and maintain sequential ordering of Read
+	 * and Write operations.  Also, it has been observed that some
+	 * RPC loads, such as one mostly of Getattr RPCs, perform better
+	 * without FHA applied to them.  As such, FHA is only applied to
+	 * Read and Write RPCs by default.
+	 * The sysctl "fha.enable_allrpcs" can be set nonzero so that FHA is
+	 * applied to all RPCs for backwards compatibility with the old FHA
+	 * code.
+	 */
+	procnum = req->rq_proc;
+	if (req->rq_vers == 2)
+		procnum = cb->get_procnum(procnum);
+	if (cb->is_read(procnum) == 0 && cb->is_write(procnum) == 0 &&
+	    nfsfha_enableallrpcs == 0)
+		goto thist;
+
 	fha_extract_info(req, &i, cb);
 
 	/*