From owner-freebsd-stable@freebsd.org Sun Oct 15 21:25:21 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EBC68E4B4B4 for ; Sun, 15 Oct 2017 21:25:21 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from NAM03-BY2-obe.outbound.protection.outlook.com (mail-by2nam03on0060.outbound.protection.outlook.com [104.47.42.60]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8997B2484 for ; Sun, 15 Oct 2017 21:25:20 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from YTOPR0101MB2172.CANPRD01.PROD.OUTLOOK.COM (52.132.46.161) by YTOPR0101MB1451.CANPRD01.PROD.OUTLOOK.COM (52.132.47.31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.77.7; Sun, 15 Oct 2017 21:25:18 +0000 Received: from YTOPR0101MB2172.CANPRD01.PROD.OUTLOOK.COM ([fe80::50ab:b8bf:bbb9:81f4]) by YTOPR0101MB2172.CANPRD01.PROD.OUTLOOK.COM ([fe80::50ab:b8bf:bbb9:81f4%13]) with mapi id 15.20.0077.022; Sun, 15 Oct 2017 21:25:18 +0000 From: Rick Macklem To: Fabian Freyer , "freebsd-stable@freebsd.org" CC: "lists@searchy.net" , "zeha@physik.tu-berlin.de" , "steff@physik.tu-berlin.de" Subject: Re: kernel process [nfscl] high cpu Thread-Topic: kernel process [nfscl] high cpu Thread-Index: AQHTRfZ3DNkUvR06Q0enbGk7u4laxqLlaNid Date: Sun, 15 Oct 2017 21:25:18 +0000 Message-ID: References: <1031959302.30289198.1430594914473.JavaMail.root@uoguelph.ca> <5603AE3D.5090407@searchy.net> <1887696626.8730412.1443097925392.JavaMail.zimbra@uoguelph.ca> <5603FC85.8070105@searchy.net> <924567664.10016007.1443129423733.JavaMail.zimbra@uoguelph.ca>, <39012cc6-a527-eb8e-3a6f-7e24cd9d98d8@physik.tu-berlin.de> In-Reply-To: <39012cc6-a527-eb8e-3a6f-7e24cd9d98d8@physik.tu-berlin.de> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=rmacklem@uoguelph.ca; x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; YTOPR0101MB1451; 6:csf354/YU2qXZSo2wgA/QJysq18CbtBlhwdFuqyfxlCJIzbqEohcD/0Bet4LDIcb7x4Q6vHMjWBRW0mbcJG7eYTgpmLUnB/wuVsJyoIIUzW7Rm0RnDMagHdJto3cLsIflCDn8QAMHWaoC9VbtLrvqHOCzieCZrTyK5bllzky5HI3OJ9M7bDBOvdww71+JMMZyu5Nn+oY4zHQ4fJ89mT9y+jZNuvXb2DnmrRu/D2KmwSxq23AwIy0xLDL3o/Innuu+HRL57pDgT6pegsh33oU8zxGK4MnJ9b994Qxs3geY8Krrt6tIzSfUTnELCk9zMztJcUPHR6I1c0Oq7AdJuau+A==; 5:rlazAeCjwrIk3BaNrd4KNzGKRzNrJqSjwrQuErCpJiCTdSnNJul7deqbO+fzpCQXlDeRzGi04A7D55GKx2sz29WIQ/pR+Z0ua0Y85Q9MsRX6EPU84s1cDUP9d1nkfxf9ysz7kr6Jx+scwIh+3fgxw+2XIDTnPI0D77VlSYzJOAg=; 24:Q6Usov6eBT51oGWHKUaxB7MLdSImQObdsz0O8conl+eTo9pfnD8y7gfL7MYdPx6gMvS6gL2+eunPLyDudKP2IT9M6W/7R4WJ1Hp+YTAvFWA=; 7:kXzJF/Fbt3zF0/IxEYKVKzJHl8E8rDx5qcRYitfJsKzhdWNiVvxW6rpmpPFzFDkYREDg7v/jqdN7XJtI/OLArg0tDJTICu4+R7lpe4XnoJAv8z+T9emd1iexfHepBuzyj9xWafXiGlu0bZkHz0OpdxG5rIK2i9XOILPk52uwQylULtYRQdRj2SPGvbr+o4hAI+jUuLjjQDbmJyg42q13VtxECHIU8z8z9rMlhe++sqE= x-ms-exchange-antispam-srfa-diagnostics: SSOS; x-ms-office365-filtering-correlation-id: 6d50d38c-7d52-40a1-b119-08d514133c20 x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001)(2017030254152)(2017052603199)(201703131423075)(201703031133081)(201702281549075); SRVR:YTOPR0101MB1451; x-ms-traffictypediagnostic: YTOPR0101MB1451: x-exchange-antispam-report-test: UriScan:(158342451672863)(33945880838080)(150554046322364)(75325880899374)(155532106045638)(21532816269658); x-microsoft-antispam-prvs: x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(6040450)(2401047)(8121501046)(5005006)(100000703101)(100105400095)(3002001)(10201501046)(93006095)(93001095)(6041248)(20161123560025)(20161123558100)(201703131423075)(201702281529075)(201702281528075)(201703061421075)(201703061406153)(20161123555025)(20161123562025)(20161123564025)(6072148)(201708071742011)(100000704101)(100105200095)(100000705101)(100105500095); SRVR:YTOPR0101MB1451; BCL:0; PCL:0; RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095); SRVR:YTOPR0101MB1451; x-forefront-prvs: 046164D5C4 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(376002)(346002)(24454002)(199003)(189002)(377454003)(93886005)(305945005)(81166006)(81156014)(8676002)(786003)(7696004)(2900100001)(55016002)(4326008)(110136005)(54906003)(8936002)(316002)(2950100002)(25786009)(14454004)(9686003)(6306002)(74316002)(53936002)(74482002)(2501003)(102836003)(6506006)(6246003)(5660300001)(33656002)(5250100002)(97736004)(6436002)(3280700002)(966005)(478600001)(53546010)(68736007)(76176999)(54356999)(101416001)(229853002)(189998001)(86362001)(106356001)(50986999)(105586002)(2906002)(3660700001); DIR:OUT; SFP:1101; SCL:1; SRVR:YTOPR0101MB1451; H:YTOPR0101MB2172.CANPRD01.PROD.OUTLOOK.COM; FPR:; SPF:None; PTR:InfoNoRecords; A:1; MX:3; LANG:en; received-spf: None (protection.outlook.com: uoguelph.ca does not designate permitted sender hosts) spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: uoguelph.ca X-MS-Exchange-CrossTenant-originalarrivaltime: 15 Oct 2017 21:25:18.7409 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: be62a12b-2cad-49a1-a5fa-85f4f3156a7d X-MS-Exchange-Transport-CrossTenantHeadersStamped: YTOPR0101MB1451 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 15 Oct 2017 21:25:22 -0000 Well, a couple of comments: 1 - I have no idea if NFSv4 mounts (or any NFS mount for that matter) will work correctly within jails. (I don't do jails and know nothing about them. Or, at least very little about them.) 2 - I do know that the "nfsuserd" daemon is badly broken when jails are in use. - There are two things you can do about this. 1 - Set vfs.nfsd.enable_stringtouid=3D1 on the server, plus do not run the nfsuserd daemon on either the client or server. (This will require a small change to /etc/rc.d/nfsd to avoid it being starte= d at boot. See the /etc/rc.d/nfsd script in head/current. The chang= e wasn't MFC'd because it was considered a POLA violation.) --> This causes the user/group strings to be just the numbers on the wire and nfsuserd is no longer needed. (The Linux client will do this by default.) If it exists on the client, also set vfs.nfs.enable_uidtostring=3D= 1. (I think this is in stable/11, but not 11.1.) OR 2 - Carry the patches in head/current as r320698, r320757 and r320758 over to the 11.1 sources and build the kernel plus nfsuserd from these patched sources. (These patches make the nfsuserd daemon use an AF_LOCAL socket, which allows it to work within a jail.) As noted at the beginning, I know that nfsuserd breaks when jails are in use, but I do not know what other NFSv4 related things break within jails, so fixing the nfsuserd situation may not resolve your problem= s. rick ________________________________________ From: Fabian Freyer Sent: Sunday, October 15, 2017 4:45:00 PM To: freebsd-stable@freebsd.org Cc: Rick Macklem; lists@searchy.net; zeha@physik.tu-berlin.de; steff@physik= .tu-berlin.de Subject: Re: kernel process [nfscl] high cpu Hi, (I'm not on this list, please CC me in future replies. My apologies to those who get this message twice, I had a typo in the To: header, and have to re-send it to the list.) sorry for reviving such an old thread, but I've run into this problem lately as well, on a 11.1-RELEASE-p1 jailhost mounting NFSv4.0 mounts into jails. On 24.09.2015 23:17, Rick Macklem wrote: > Frank de Bot wrote: >> Rick Macklem wrote: >>> Frank de Bot wrote: >>>> Rick Macklem wrote: >>>>> Frank de Bot wrote: >>>>>> Hi, >>>>>> >>>>>> On a 10.1-RELEASE-p9 server I have several NFS mounts used for a >>>>>> jail. >>>>>> Because it's a server only to test, there is a low load. But the >>>>>> [nfscl] >>>>>> process is hogging a CPU after a while. This happens pretty fast, >>>>>> within >>>>>> 1 or 2 days. I'm noticing the high CPU of the process when I want to >>>>>> do >>>>>> some test after a little while (those 1 or 2 days). Here's my ps ax | grep nfscl: # ps ax | grep nfscl 11111 - DL 932:08.74 [nfscl] 11572 - DL 442:27.42 [nfscl] 30396 - DL 933:44.13 [nfscl] 35902 - DL 442:08.70 [nfscl] 40881 - DL 938:56.04 [nfscl] 43276 - DL 932:38.88 [nfscl] 49178 - DL 934:24.77 [nfscl] 56314 - DL 935:21.55 [nfscl] 60085 - DL 936:37.11 [nfscl] 71788 - DL 933:10.96 [nfscl] 82001 - DL 934:45.76 [nfscl] 86222 - DL 931:42.94 [nfscl] 92353 - DL 1186:53.38 [nfscl] 21105 20 S+ 0:00.00 grep nfscl And this is on a 12-core with Hyperthreading: # uptime 7:28PM up 11 days, 4:50, 4 users, load averages: 25.49, 21.91, 20.25 Most of this load is being generated by the nfscl threads. >>>>>> My jail.conf look like: >>>>>> >>>>>> exec.start =3D "/bin/sh /etc/rc"; >>>>>> exec.stop =3D "/bin/sh /etc/rc.shutdown"; >>>>>> exec.clean; >>>>>> mount.devfs; >>>>>> exec.consolelog =3D "/var/log/jail.$name.log"; >>>>>> #mount.fstab =3D "/usr/local/etc/jail.fstab.$name"; >>>>>> >>>>>> test01 { >>>>>> host.hostname =3D "test01_hosting"; >>>>>> ip4.addr =3D somepublicaddress; >>>>>> ip4.addr +=3D someprivateaddress; >>>>>> >>>>>> mount =3D "10.13.37.2:/tank/hostingbase /opt/jails/test= 01 >>>>>> nfs nfsv4,minorversion=3D1,pnfs,ro,noatime 0 0"; >>>>>> mount +=3D "10.13.37.2:/tank/hosting/test >>>>>> /opt/jails/test01/opt nfs nfsv4,minorversion=3D1,pnfs,noat= ime >>>>>> 0 0"; >>>>>> >>>>>> path =3D "/opt/jails/test01"; >>>>>> } >>>>>> >>>>>> Last test was with NFS 4.1, I also worked with NFS 4.(0) with the >>>>>> same >>>>>> result. In the readonly nfs share there are symbolic links point to >>>>>> the >>>>>> read-write share for logging, storing .run files, etc. When I monito= r >>>>>> my >>>>>> network interface with tcpdump, there is little nfs traffic, only >>>>>> when I >>>>>> do try to access the shares there is activity. >>>>>> >>>>>> What is causing nfscl to run around in circles, hogging the CPU (it >>>>>> makes the system slow to respond too) or how can I found out what's >>>>>> the >>>>>> cause? >>>>>> >>>>> Well, the nfscl does server->client RPCs referred to as callbacks. I >>>>> have no idea what the implications of running it in a jail is, but I'= d >>>>> guess that these server->client RPCs get blocked somehow, etc... >>>>> (The NFSv4.0 mechanism requires a separate IP address that the server >>>>> can connect to on the client. For NFSv4.1, it should use the same >>>>> TCP connection as is used for the client->server RPCs. The latter >>>>> seems like it should work, but there is probably some glitch.) >>>>> >>>>> ** Just run without the nfscl daemon (it is only needed for delegatio= ns >>>>> or >>>>> pNFS). >>>> >>>> How can I disable the nfscl daemon? >>>> >>> Well, the daemon for the callbacks is called nfscbd. >>> You should check via "ps ax", to see if you have it running. >>> (For NFSv4.0 you probably don't want it running, but for NFSv4.1 you >>> do need it. pNFS won't work at all without it, but unless you have a >>> server that supports pNFS, it won't work anyhow. Unless your server is >>> a clustered Netapp Filer, you should probably not have the "pnfs" opti= on.) >>> >>> To run the "nfscbd" daemon you can set: >>> nfscbd_enable=3D"TRUE" >>> in your /etc/rc.conf will start it on boot. >>> Alternately, just type "nfscbd" as root. >>> >>> The "nfscl" thread is always started when an NFSv4 mount is done. It do= es >>> an assortment of housekeeping things, including a Renew op to make sure= the >>> lease doesn't expire. If for some reason the jail blocks these Renew RP= Cs, >>> it will try to do them over and over and ... because having the lease >>> expire is bad news for NFSv4. How could you tell? >>> Well, capturing packets between the client and server, then looking at = them >>> in wireshark is probably the only way. (Or maybe a large count for Rene= w >>> in the output from "nfsstat -e".) >>> >>> "nfscbd" is optional for NFSv4.0. Without it, you simply don't do >>> callbacks/delegations. >>> For NFSv4.1 it is pretty much required, but doesn't need a separate >>> server->client TCP >>> connection. >>> --> I'd enable it for NFSv4.1, but disable it for NFSv4.0 at least as a >>> starting point. >>> >>> And as I said before, none of this is tested within jails, so I have no >>> idea >>> what effect the jails have. Someone who understands jails might have so= me >>> insight >>> w.r.t. this? >>> >>> rick >>> >> >> Since last time I haven't tried to use pnfs and just sticked with >> nfsv4.0. nfscbd is not running. The server is now running 10.2. The >> number of renews is not very high (56k, getattr is for example 283M) >> View with wireshark, renew calls look good ,the nfs status is ok. >> >> Is there a way to know what [nfscl] is active with? >> > Not that I can think of. When I do "ps axHl" I see it in DL state and not > doing much of anything. (You could try setting "sysctl vfs.nfs.debuglevel= =3D4", > but I don't think you'll see anything syslog'd that is useful?) > This is what I'd expect for an NFSv4.0 mount without the nfscbd running. nfscbd is not running. I've done a bit of digging with kgdb, by attatching to a thread of one of the nfscl's and grabbed a backtrace: --8<---- snip (kgdb) info threads [...] 678 Thread 101151 (PID=3D82001: nfscl) 0xffffffff80a9780a in sched_switch () [...] (kgdb) thread 678 [Switching to thread 678 (Thread 101151)]#0 0xffffffff80a9780a in sched_switch () (kgdb) bt #0 0xffffffff80a9780a in sched_switch () #1 0xffffffff80a75a35 in mi_switch () #2 0xffffffff80abaa82 in sleepq_timedwait () #3 0xffffffff80a75435 in _sleep () #4 0xffffffff8095eb7d in nfscl_renewthread () #5 0xffffffff80985204 in start_nfscl () #6 0xffffffff80a2f815 in fork_exit () #7 0xffffffff80ec3b7e in fork_trampoline () #8 0x0000000000000000 in ?? () --8<---- snap So as you said, the nfscl thread is mostly hanging in a sleep. However, every approx. 30-50 seconds (this corresponds quite well with the timing of the renew calls on the wire) the CPU usage spikes to 100% on all 24 threads. dtrace(1m)'ing during those spikes show a huge number (order of 10^5) calls to nfscl_procdoesntexist and nfscl_cleanup_common: Calls nfscl_postop_attr 12 nfscl_loadattrcache 17 nfscl_request 22 nfscl_deleggetmodtime 26 nfscl_reqstart 26 nfscl_mustflush 30 nfscl_nodeleg 34 nfscl_cleanup_common 12005 nfscl_procdoesntexist 12315 Times nfscl_loadattrcache 41949 nfscl_deleggetmodtime 63014 nfscl_postop_attr 65046 nfscl_reqstart 77851 nfscl_procdoesntexist 30187728 nfscl_request 423855128 nfscl_nodeleg 841033772 nfscl_mustflush 4838119753 nfscl_cleanup_common 6436207841 For reference, here's the dtrace(1m) script I'm using: --8<---- snip #!/usr/sbin/dtrace -Cs #pragma D option quiet fbt::nfscl_*:entry { @entries[probefunc] =3D count(); traceme[probefunc] =3D 1; ts[probefunc] =3D timestamp; } fbt::nfscl_*:return /traceme[probefunc]/ { @times[probefunc] =3D sum(timestamp - ts[probefunc]); } tick-5sec { printa(@times); clear(@times); printa(@entries); clear(@entries); } --8<---- snap For completeness, I guess I *should* synchronize this with the for(;;) loop in nfscl_renewthread and count the actual number of process terminations, but OTOH, I am extremely sure (just by observing the number of running processes and PIDs) that it's nowhere near those counts. Digging a little deeper, I see pfind(9), or more closely pfind_locked is used in nfscl_procdoesntexist. What I do not understand yet is how pfind(9) interacts with jails. Maybe someone(TM) could shed some light on this? > Basically, when the nfscbd isn't running the server shouldn't issue any > delegations, because it shouldn't see a callback path (server->client TCP > connection). Also, if you are using a FreeBSD NFS server, it won't issue > delegations unless you've enabled that, which isn't the default. > > Check to see your Delegs in "nfsstat -e" is 0. It is. > If it is, then all the nfscl should be doing is waking up once per second > and doing very little except a Renew RPC once every 30-60sec. (A fraction= of > what the server's lease duration is.) A tcpdump shows at least one renew call/reply pair per minute. > The only thing I can think of that might cause it to run a lot would be s= ome > weirdness related to the TOD clock. It msleep()s for hz and also checks f= or > time_uptime (which should be in resolution of seconds) !=3D the previous = time. > (If the msleep()s were waking up too fequently, then it would loop around > doing not much of anything, over and over and over again...) > > >> I do understand nfs + jails could have issues, but I like to understand >> them. >> Fabian [1] https://lists.freebsd.org/pipermail/freebsd-hackers/2017-June/thread.html#5= 1200