From: Rick Macklem
To: Niels Kobschätzki, "freebsd-net@freebsd.org"
Subject: Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release
Date: Sat, 14 Apr 2018 01:49:33 +0000

Niels Kobschätzki wrote:
>sorry for the cross-posting but so far I had no real luck on the forum
>or on question, thus I want to try my luck here as well.
I read email lists but don't do the other stuff, so I just saw this yesterday.

Short answer, I haven't a clue why the cache hit rate would have changed.
The code that decides if there is a hit/miss for the attribute cache is in
ncl_getattrcache() and the code hasn't changed between 10.3->11.1, except
the old code did a mtx_lock(&Giant), but I can't imagine how that would
affect the code.

You might want to:
# sysctl -a | fgrep vfs.nfs
for both the 10.3 and 11.1 systems, to check if any defaults have somehow
been changed. (I don't recall any being changed, but??)

You could go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the top,
where it calculates "timeo" from them.
Running this hacked kernel might show you if either of these fields is bogus.
(You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
clause that increments "attrcache_misses", which is where the cache misses
happen, to see why it is missing the cache.)
If you could do this for the 10.3 kernel as well, this might indicate why the
miss rate has increased.

>I upgraded a machine from 10.3-Prerelease (custom kernel with
>tcp_fastopen added) to 11.1-Release (standard kernel) with
>freebsd-update. I have two other machines that are still on
>10.3-Prerelease. Those machines mount an NFS-export from a
>Linux-NFS-server and use NFSv3. The machine that got upgraded shows now
>far more cache misses for getattr than on the 10.3-machines (we talk a
>factor of 100) in munin. munin also shows a lot more cache-misses for
>other metrics like biow, biorl, biod (where can I find what those
>metrics mean...currently I have not even an understanding what these are)
>etc.
>
>Can anybody help me how I can debug this problem or has an idea what
>could cause the problem? The result of this behavior is that this
>machine shows a lower performance than the others and I cannot upgrade
>other machines before I didn't fix this bug.
I haven't run a 10.x system in quite a while. When I get home in a few days,
I might be able to reproduce this. If I can, I can poke at it, but it would be
at least a week before I might have an answer and I may not figure it out for
a long time.

rick
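
For anyone who wants to reason about the hit/miss check discussed above
without booting a hacked kernel, here is a rough userspace sketch. It is not
the kernel code: the struct and function names (attr_model, model_timeo,
model_is_miss) are made up for illustration, and the "age divided by 10,
clamped between a minimum and a maximum" rule is an assumption about how
"timeo" is derived that should be checked against nfs_clsubs.c. The point is
only to show how a bogus mtime (for example one that lies in the future)
would shrink the timeout and turn hits into misses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few per-file fields the check looks at. */
struct attr_model {
	long mtime_sec;  /* models np->n_mtime.tv_sec */
	long attrstamp;  /* time the attributes were cached; 0 = invalidated */
	bool modified;   /* file has locally modified data */
};

/*
 * Assumed heuristic: the timeout is one tenth of the file's age, clamped
 * to [acmin, acmax]; a locally modified file always gets acmin.
 */
long
model_timeo(const struct attr_model *m, long now, long acmin, long acmax)
{
	long timeo;

	assert(acmin <= acmax);
	timeo = (now - m->mtime_sec) / 10;
	if (m->modified || timeo < acmin)
		timeo = acmin;
	else if (timeo > acmax)
		timeo = acmax;
	return (timeo);
}

/* Returns true when the cached attributes would count as a miss. */
bool
model_is_miss(const struct attr_model *m, long now, long acmin, long acmax)
{
	if (m->attrstamp == 0)
		return (true);
	return ((now - m->attrstamp) >= model_timeo(m, now, acmin, acmax));
}
```

With example limits of 3 and 60 seconds, a file whose mtime is long past gets
the full 60 second timeout, so attributes cached 5 seconds ago still hit. The
same file with an mtime ahead of "now" gets a negative age, is clamped to 3
seconds, and the 5 second old attributes miss. That is the kind of thing the
printf() output would reveal.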