From: Rick Macklem
To: Niels Kobschätzki, "freebsd-net@freebsd.org"
Subject: Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release
Date: Sat, 14 Apr 2018 23:18:27 +0000

Niels Kobschätzki wrote:
>On 04/14/2018 03:49 AM, Rick Macklem wrote:
>> Niels Kobschätzki wrote:
>>> sorry for the cross-posting but so far I had no real luck on the forum
>>> or on question, thus I want to try my luck here as well.
>> I read email lists but don't do the other stuff, so I just saw this yesterday.
>> Short answer, I haven't a clue why the cache hit rate would have changed.
>>
>> The code that decides if there is a hit/miss for the attribute cache is in
>> ncl_getattrcache() and the code hasn't changed between 10.3->11.1,
>> except the old code did a mtx_lock(&Giant), but I can't imagine how that
>> would affect the code.
>>
>> You might want to:
>> # sysctl -a | fgrep vfs.nfs
>> for both the 10.3 and 11.1 systems, to check if any defaults have somehow
>> been changed. (I don't recall any being changed, but??)
>
>I did that and nothing had changed.
>
>> If you go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
>> and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
>> top, where it calculates "timeo" from them.
>> Running this hacked kernel might show you if either of these fields is bogus.
>> (You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
>> clause that increments "attrcache_misses", which is where the cache misses
>> happen, to see why it is missing the cache.)
>> If you could do this for the 10.3 kernel as well, this might indicate why the
>> miss rate has increased?
>
>I will do this next week. On Monday we switch for other reasons to other
>nfs-servers and when we see that they run stable, I will do this next.
With a miss rate of 2.7%, I doubt printing the above will help. I thought
you were seeing a high miss rate.

>Btw. I calculated now the percentages. The old servers had an attr miss
>rate of something like 0.004%, while the upgraded one has more like
>2.7%. This is still low from what I've read (I remember that you should
>start adjusting acreg* when you hit more than 40% misses) but far higher
>than before.
You could try increasing acregmin, acregmax and see if the misses are reduced.
(The only risk with increasing the cache timeout is that, if another client
changes the attributes, then the client will use stale ones for longer.
Usually, this doesn't cause serious problems.)

To be honest, a Getattr RPC is pretty low overhead, so I doubt the increase
to 2.7% will affect your application's performance, but it is interesting that
it increased.

You might also try increasing acdirmin, acdirmax in case it is the directory
attributes that are having cache misses.

Oh, and check that your time of day clocks are in sync with the server,
because the caches are time based and there is no cache coherency protocol
in NFS.

[good stuff snipped]

rick
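
For anyone following along, the timeout calculation mentioned above can be sketched as a small userland model. This is not the kernel code. It assumes the timeout is the file's age (time_second minus n_mtime) divided by 10, clamped between the min and max mount options, and forced to the minimum when the file was modified locally. The function names, the divisor of 10, and the example values 3 and 60 are assumptions made for illustration; check ncl_getattrcache() in your own source tree for the real logic.

```python
# Simplified model of a time-based NFS attribute cache decision.
# Assumptions (not taken from the kernel source verbatim): the divisor
# of 10, the function names, and the example limits used below.

def attr_timeout(now, mtime, modified, ac_min, ac_max, divisor=10):
    """Return how many seconds cached attributes are trusted.

    now      -- client time in seconds (stands in for time_second)
    mtime    -- file modification time in seconds (n_mtime.tv_sec)
    modified -- True if the file was modified locally
    ac_min   -- lower bound, e.g. acregmin or acdirmin
    ac_max   -- upper bound, e.g. acregmax or acdirmax
    """
    timeo = (now - mtime) // divisor
    if modified or timeo < ac_min:
        return ac_min
    if timeo > ac_max:
        return ac_max
    return timeo


def is_cache_miss(now, attrstamp, timeo):
    """True when the cached attributes are older than the timeout."""
    return (now - attrstamp) >= timeo


if __name__ == "__main__":
    # Example limits only; these are not claimed to be the defaults.
    ac_min, ac_max = 3, 60
    for mtime in (900, 995, 1100):
        t = attr_timeout(1000, mtime, False, ac_min, ac_max)
        print("mtime", mtime, "timeout", t)
```

In this model, a modification time that appears to lie in the future (for example, a server clock running ahead of the client) gives a negative age, so the timeout falls back to the minimum and attributes expire sooner. That is one way clock skew could raise the miss rate, and raising acregmin or acdirmin raises that floor.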