From: Fabian Freyer <fabian.freyer@physik.tu-berlin.de>
Subject: Re: kernel process [nfscl] high cpu
To: freebsd-stable@freebsd.org
Cc: rmacklem@uoguelph.ca, lists@searchy.net, zeha@physik.tu-berlin.de,
    steff@physik.tu-berlin.de
Date: Sun, 15 Oct 2017 22:45:00 +0200
Message-ID: <39012cc6-a527-eb8e-3a6f-7e24cd9d98d8@physik.tu-berlin.de>
In-Reply-To: <924567664.10016007.1443129423733.JavaMail.zimbra@uoguelph.ca>

Hi,

(I'm not on this list; please CC me on future replies. My apologies to
those who get this message twice: I had a typo in the To: header and
had to re-send it to the list.)

Sorry for reviving such an old thread, but I've recently run into this
problem as well, on an 11.1-RELEASE-p1 jail host mounting NFSv4.0
shares into jails.

On 24.09.2015 23:17, Rick Macklem wrote:
> Frank de Bot wrote:
>> Rick Macklem wrote:
>>> Frank de Bot wrote:
>>>> Rick Macklem wrote:
>>>>> Frank de Bot wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On a 10.1-RELEASE-p9 server I have several NFS mounts used for a
>>>>>> jail. Because it's a test-only server, the load is low, but the
>>>>>> [nfscl] process starts hogging a CPU after a while. This happens
>>>>>> pretty fast, within 1 or 2 days; I notice the high CPU usage
>>>>>> when I come back to run some tests after those 1 or 2 days.
Here's my ps ax | grep nfscl:

# ps ax | grep nfscl
11111  -  DL    932:08.74 [nfscl]
11572  -  DL    442:27.42 [nfscl]
30396  -  DL    933:44.13 [nfscl]
35902  -  DL    442:08.70 [nfscl]
40881  -  DL    938:56.04 [nfscl]
43276  -  DL    932:38.88 [nfscl]
49178  -  DL    934:24.77 [nfscl]
56314  -  DL    935:21.55 [nfscl]
60085  -  DL    936:37.11 [nfscl]
71788  -  DL    933:10.96 [nfscl]
82001  -  DL    934:45.76 [nfscl]
86222  -  DL    931:42.94 [nfscl]
92353  -  DL   1186:53.38 [nfscl]
21105 20  S+      0:00.00 grep nfscl

And this is on a 12-core machine with Hyperthreading:

# uptime
 7:28PM  up 11 days,  4:50, 4 users, load averages: 25.49, 21.91, 20.25

Most of this load is being generated by the nfscl threads.

>>>>>> My jail.conf looks like:
>>>>>>
>>>>>> exec.start = "/bin/sh /etc/rc";
>>>>>> exec.stop = "/bin/sh /etc/rc.shutdown";
>>>>>> exec.clean;
>>>>>> mount.devfs;
>>>>>> exec.consolelog = "/var/log/jail.$name.log";
>>>>>> #mount.fstab = "/usr/local/etc/jail.fstab.$name";
>>>>>>
>>>>>> test01 {
>>>>>>     host.hostname = "test01_hosting";
>>>>>>     ip4.addr = somepublicaddress;
>>>>>>     ip4.addr += someprivateaddress;
>>>>>>
>>>>>>     mount = "10.13.37.2:/tank/hostingbase /opt/jails/test01 nfs nfsv4,minorversion=1,pnfs,ro,noatime 0 0";
>>>>>>     mount += "10.13.37.2:/tank/hosting/test /opt/jails/test01/opt nfs nfsv4,minorversion=1,pnfs,noatime 0 0";
>>>>>>
>>>>>>     path = "/opt/jails/test01";
>>>>>> }
>>>>>>
>>>>>> The last test was with NFSv4.1; I also tried NFSv4.0, with the
>>>>>> same result. In the read-only NFS share there are symbolic links
>>>>>> pointing to the read-write share for logging, storing .run
>>>>>> files, etc. When I monitor my network interface with tcpdump,
>>>>>> there is little NFS traffic; there is only activity when I
>>>>>> actually access the shares.
>>>>>>
>>>>>> What is causing nfscl to run around in circles, hogging the CPU
>>>>>> (it makes the system slow to respond, too), and how can I find
>>>>>> out the cause?
>>>>>>
>>>>> Well, the nfscl does server->client RPCs referred to as
>>>>> callbacks. I have no idea what the implications of running it in
>>>>> a jail are, but I'd guess that these server->client RPCs get
>>>>> blocked somehow, etc... (The NFSv4.0 mechanism requires a
>>>>> separate IP address on the client that the server can connect to.
>>>>> For NFSv4.1, it should use the same TCP connection as is used for
>>>>> the client->server RPCs. The latter seems like it should work,
>>>>> but there is probably some glitch.)
>>>>>
>>>>> ** Just run without the nfscl daemon (it is only needed for
>>>>> delegations or pNFS).
>>>>
>>>> How can I disable the nfscl daemon?
>>>>
>>> Well, the daemon for the callbacks is called nfscbd.
>>> You should check via "ps ax" to see if you have it running.
>>> (For NFSv4.0 you probably don't want it running, but for NFSv4.1
>>> you do need it. pNFS won't work at all without it, but unless you
>>> have a server that supports pNFS, it won't work anyhow. Unless your
>>> server is a clustered NetApp filer, you should probably not use the
>>> "pnfs" option.)
>>>
>>> Setting:
>>>     nfscbd_enable="TRUE"
>>> in your /etc/rc.conf will start it on boot.
>>> Alternately, just run "nfscbd" as root.
>>>
>>> The "nfscl" thread is always started when an NFSv4 mount is done.
>>> It does an assortment of housekeeping things, including a Renew op
>>> to make sure the lease doesn't expire. If for some reason the jail
>>> blocks these Renew RPCs, it will try to do them over and over,
>>> because having the lease expire is bad news for NFSv4. How could
>>> you tell? Well, capturing packets between the client and server and
>>> then looking at them in wireshark is probably the only way. (Or
>>> maybe a large count for Renew in the output of "nfsstat -e".)
>>>
>>> "nfscbd" is optional for NFSv4.0. Without it, you simply don't do
>>> callbacks/delegations. For NFSv4.1 it is pretty much required, but
>>> it doesn't need a separate server->client TCP connection.
>>> --> I'd enable it for NFSv4.1, but disable it for NFSv4.0, at least
>>> as a starting point.
>>>
>>> And as I said before, none of this is tested within jails, so I
>>> have no idea what effect the jails have. Someone who understands
>>> jails might have some insight w.r.t. this?
>>>
>>> rick
>>>
>> Since last time I haven't tried to use pNFS and have just stuck with
>> NFSv4.0. nfscbd is not running. The server is now running 10.2. The
>> number of Renews is not very high (56k; Getattr, for example, is at
>> 283M). Viewed in wireshark, the Renew calls look good and the NFS
>> status is OK.
>>
>> Is there a way to know what [nfscl] is busy with?
>>
> Not that I can think of. When I do "ps axHl" I see it in DL state and
> not doing much of anything. (You could try setting "sysctl
> vfs.nfs.debuglevel=4", but I don't think you'll see anything
> syslog'd that is useful?)
> This is what I'd expect for an NFSv4.0 mount without the nfscbd
> running.

nfscbd is not running here either. I've done a bit of digging with
kgdb, attaching to a thread of one of the nfscl's, and grabbed a
backtrace:

--8<---- snip
(kgdb) info threads
[...]
  678 Thread 101151 (PID=82001: nfscl)  0xffffffff80a9780a in sched_switch ()
[...]
(kgdb) thread 678
[Switching to thread 678 (Thread 101151)]#0  0xffffffff80a9780a in sched_switch ()
(kgdb) bt
#0  0xffffffff80a9780a in sched_switch ()
#1  0xffffffff80a75a35 in mi_switch ()
#2  0xffffffff80abaa82 in sleepq_timedwait ()
#3  0xffffffff80a75435 in _sleep ()
#4  0xffffffff8095eb7d in nfscl_renewthread ()
#5  0xffffffff80985204 in start_nfscl ()
#6  0xffffffff80a2f815 in fork_exit ()
#7  0xffffffff80ec3b7e in fork_trampoline ()
#8  0x0000000000000000 in ?? ()
--8<---- snap

So, as you said, the nfscl thread is mostly hanging in a sleep.
However, roughly every 30-50 seconds (this corresponds quite well with
the timing of the Renew calls on the wire) the CPU usage spikes to
100% on all 24 threads.
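Putting Rick's description (quoted further down) together with that
backtrace, my mental model of the renew loop is roughly the sketch
below. To be clear, this is only a sketch, not the actual
sys/fs/nfsclient code: the sketch_* helpers and the stub struct are
made up, and I am guessing that the dead-process housekeeping runs on
the same cadence as the Renews, since the spikes line up with them:

--8<---- snip
#include <sys/param.h>
#include <sys/systm.h>		/* tsleep() */
#include <sys/kernel.h>		/* hz */
#include <sys/proc.h>		/* PVFS */
#include <sys/time.h>		/* time_uptime */

/* Stub stand-in for the real per-mount nfscl client state. */
struct sketch_clclient {
	time_t	renew_interval;	/* some fraction of the server's lease */
};

static void	sketch_do_renew_rpc(struct sketch_clclient *);
static void	sketch_reap_dead_owners(struct sketch_clclient *);

static void
sketch_renewthread(struct sketch_clclient *clp)
{
	time_t lastrenew = time_uptime;

	for (;;) {
		/* Wake up roughly once per second. */
		(void)tsleep(clp, PVFS, "nfscl", hz);

		if (time_uptime - lastrenew < clp->renew_interval)
			continue;
		lastrenew = time_uptime;

		/* The Renew call/reply pair visible in tcpdump. */
		sketch_do_renew_rpc(clp);

		/*
		 * Housekeeping: check every open/lock owner for a dead
		 * process and release its state. This is where
		 * nfscl_cleanup_common() and nfscl_procdoesntexist()
		 * come in, as the dtrace numbers below show.
		 */
		sketch_reap_dead_owners(clp);
	}
}
--8<---- snap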
dtrace(1)'ing during those spikes shows a huge number of calls (over
12000 per 5-second interval) to nfscl_procdoesntexist() and
nfscl_cleanup_common():

Calls (count per 5-second interval):
  nfscl_postop_attr                12
  nfscl_loadattrcache              17
  nfscl_request                    22
  nfscl_deleggetmodtime            26
  nfscl_reqstart                   26
  nfscl_mustflush                  30
  nfscl_nodeleg                    34
  nfscl_cleanup_common          12005
  nfscl_procdoesntexist         12315

Times (total ns per 5-second interval):
  nfscl_loadattrcache           41949
  nfscl_deleggetmodtime         63014
  nfscl_postop_attr             65046
  nfscl_reqstart                77851
  nfscl_procdoesntexist      30187728
  nfscl_request             423855128
  nfscl_nodeleg             841033772
  nfscl_mustflush          4838119753
  nfscl_cleanup_common     6436207841

For reference, here's the dtrace(1) script I'm using:

--8<---- snip
#!/usr/sbin/dtrace -Cs

#pragma D option quiet

/* Count calls and record a per-thread entry timestamp. */
fbt::nfscl_*:entry
{
	@entries[probefunc] = count();
	self->ts[probefunc] = timestamp;
}

/* Sum up the time (ns) spent in each function. */
fbt::nfscl_*:return
/self->ts[probefunc]/
{
	@times[probefunc] = sum(timestamp - self->ts[probefunc]);
	self->ts[probefunc] = 0;
}

/* Print and reset both aggregations every 5 seconds. */
tick-5sec
{
	printa(@times);
	clear(@times);
	printa(@entries);
	clear(@entries);
}
--8<---- snap

For completeness, I guess I *should* synchronize this with the for(;;)
loop in nfscl_renewthread() and count the actual number of process
terminations. On the other hand, I am quite certain (just from
watching the number of running processes and PIDs) that the real
number of terminations is nowhere near those counts.

Digging a little deeper, I see that pfind(9), or more precisely
pfind_locked(), is used in nfscl_procdoesntexist(). What I do not
understand yet is how pfind(9) interacts with jails. Maybe someone(TM)
could shed some light on this?

> Basically, when the nfscbd isn't running, the server shouldn't issue
> any delegations, because it shouldn't see a callback path (a
> server->client TCP connection). Also, if you are using a FreeBSD NFS
> server, it won't issue delegations unless you've enabled that, which
> isn't the default.
>
> Check that Delegs in "nfsstat -e" is 0.

It is.

> If it is, then all the nfscl should be doing is waking up once per
> second and doing very little except a Renew RPC once every 30-60
> seconds (a fraction of the server's lease duration).

A tcpdump shows at least one Renew call/reply pair per minute.

> The only thing I can think of that might cause it to run a lot would
> be some weirdness related to the TOD clock. It msleep()s for hz and
> also checks for time_uptime (which should have a resolution of
> seconds) != the previous time. (If the msleep()s were waking up too
> frequently, then it would loop around doing not much of anything,
> over and over and over again...)
>
>> I do understand nfs + jails could have issues, but I'd like to
>> understand them.

Fabian
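P.S. Here is how I currently picture the per-owner cleanup that the
dtrace numbers point at. This is a sketch of the pattern only, not the
actual nfscl code, and the sketch_* names are made up. As far as I can
tell, PIDs in FreeBSD are global across jails (jails restrict the
visibility of processes, they don't get their own PID namespace), so
pfind(9) should find jailed processes just fine; the cost is one
process-table lookup per open/lock owner on every pass, which would
explain why the spikes grow with the number of owners the jails
accumulate:

--8<---- snip
#include <sys/param.h>
#include <sys/proc.h>		/* struct proc, pfind(), PROC_UNLOCK() */
#include <sys/queue.h>

/* Stub stand-in for an NFSv4 open/lock owner. */
struct sketch_owner {
	LIST_ENTRY(sketch_owner) link;
	pid_t	pid;		/* process that created this owner */
};
LIST_HEAD(sketch_owner_list, sketch_owner);

static void	sketch_release_owner(struct sketch_owner *);

/*
 * Once per renew pass, walk all owners and drop state belonging to
 * processes that no longer exist. With many owners (e.g. lots of
 * short-lived processes in jails running on NFS mounts), this walk
 * is what I suspect burns the CPU during the spikes.
 */
static void
sketch_reap_dead_owners(struct sketch_owner_list *owners)
{
	struct sketch_owner *op, *nop;
	struct proc *p;

	LIST_FOREACH_SAFE(op, owners, link, nop) {
		p = pfind(op->pid);	/* global pidhash lookup */
		if (p != NULL) {
			PROC_UNLOCK(p);	/* pfind() returns it locked */
			continue;
		}
		/* The owner's process has exited; release its state. */
		LIST_REMOVE(op, link);
		sketch_release_owner(op);
	}
}
--8<---- snap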