Date: Mon, 6 Jan 2025 05:53:38 -0800
From: Rick Macklem <rick.macklem@gmail.com>
To: "Peter 'PMc' Much" <pmc@citylink.dinoex.sub.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: system stalled, no I/O but 100% CPU from nfs
Message-ID: <CAM5tNy5AzL9%2BWpjRV9N1Wzy94RpA2L93NqnYFjFvx38iAo1iyg@mail.gmail.com>
In-Reply-To: <Z3tdPjxTE6GZmzwW@disp.intra.daemon.contact>
References: <Z3tdPjxTE6GZmzwW@disp.intra.daemon.contact>

On Sun, Jan 5, 2025 at 8:45 PM Peter 'PMc' Much
<pmc@citylink.dinoex.sub.org> wrote:
>
> Cheers,
>
> This doesn't look good. It goes on for hours. What can be done about it?
> (13.4 client & server)
>
>
> 44 processes:  4 running, 39 sleeping, 1 waiting
> CPU:  0.4% user,  0.0% nice, 99.6% system,  0.0% interrupt,  0.0% idle
> Mem: 21M Active, 198M Inact, 1190M Wired, 278M Buf, 3356M Free
> ARC: 418M Total, 39M MFU, 327M MRU, 128K Anon, 7462K Header, 43M Other
>      332M Compressed, 804M Uncompressed, 2.42:1 Ratio
> Swap: 15G Total, 15G Free
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
>   417 root          4  52    0    12M  2148K RUN     20:55  99.12% nfscbd

Do you have delegations enabled on your server
(vfs.nfsd.issue_delegations not 0)?
(If you do not, I have no idea why the server would be doing callbacks,
which is what nfscbd handles.)

Also, "nfsstat -m" on the client shows you/us what your mount options are.

>     0 root         65 -16    -     0B  1040K swapin   0:17   0.64% kernel
> 11054 root          1  52    0    18M  7664K RUN      0:04   0.10% bsdtar
>    11 root         15 -56    -     0B   240K WAIT     0:15   0.05% intr
>    16 root          1 -16    -     0B    16K -        0:01   0.03% racctd
> 11062 root          1  20    0    14M  3804K RUN      0:00   0.03% top
>     7 root          3 -16    -     0B    48K psleep   0:00   0.01% pagedaemon
> 11056 root          1  20    0    21M    10M select   0:00   0.01% sshd
>     6 root          1 -16    -     0B    16K -        0:00   0.01% rand_harvest
>
>
>   Interface          Traffic           Peak            Total
>      vtnet0  in    5.380 KB/s      9.113 KB/s      781.439 MB
>              out   4.012 KB/s      8.002 KB/s      674.294 MB
>
>
> # nfsstat -zc > /dev/null ; sleep 1 ; nfsstat -c

Adding -E makes it show all RPC counts.
(Without -E you just get the "old Sun compatible" output.)

> Rpc Counts:
>   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>         1         2         5         0         0         0         0         0
>    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>         0         0         0         0         0         1         0         1
>     Mknod    Fsstat    Fsinfo  PathConf    Commit
>         0         0         0         0         0
> Rpc Info:
>  TimedOut   Invalid X Replies   Retries  Requests
>         0         0         0         0        11
> Cache Info:
>   Attr Hits  Attr Misses    Lkup Hits  Lkup Misses    BioR Hits  BioR Misses    BioW Hits  BioW Misses
>          11            1            2            5            0            0            0            0
>  BioRL Hits BioRL Misses    BioD Hits  BioD Misses    DirE Hits  DirE Misses    Accs Hits  Accs Misses
>           0            0            1            1            1            0            8            1
>
>

The above suggests that there is still some activity on the client,
but the info. is limited.

If the client is still in this state, you can collect more info via:
# tcpdump -s 0 -w out.pcap host <nfs-server>
run for a little while. The out.pcap file needs to be looked at in
wireshark (tcpdump is useless at decoding NFS). If there is nothing
secret in it, you can email it to me as an attachment, so I can take
a look.
# ps axHl
done repeatedly gets a lot more info about the NFS related threads.
(I'll admit I doubt the info is useful for this case?)
# nfsstat -E -c -z
repeatedly as above.

If you just want to get rid of the mount
# umount -N <mnt-path>
should work, although it can take a couple of minutes.

Either not running "nfscbd" on the client or disabling delegations by
setting vfs.nfsd.issue_delegations=0 on the server (assuming you have
them enabled) might/should avoid the problem.

rick
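
The client-side checks suggested above could be gathered in one pass with a
small sh wrapper. This is only a rough sketch under assumptions: the function
names, the DRY_RUN switch, the default output directory, the 60 second capture
length and the number of rounds are all made up for illustration and do not
come from the thread. Only the nfsstat, ps and tcpdump invocations are the
ones quoted in the message. It would need to be run as root on the client.

```shell
#!/bin/sh
# Sketch only. collect_nfs_diag, run, DRY_RUN, the output directory, the
# 60 second capture and the round count are hypothetical choices.

# Print each command; execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# usage: collect_nfs_diag <nfs-server> [out-dir] [rounds]
collect_nfs_diag() {
    server=$1
    out_dir=${2:-/tmp/nfs-diag}
    rounds=${3:-5}

    run mkdir -p "$out_dir"

    # Mount options as seen by the client.
    run nfsstat -m

    # Capture full packets for a little while; the pcap is meant to be
    # opened in wireshark afterwards. timeout(1) stops the capture.
    run timeout 60 tcpdump -s 0 -w "$out_dir/out.pcap" host "$server"

    # Repeated per-thread listing and full (-E) client RPC counts,
    # zeroed after each print so every round shows one interval.
    i=0
    while [ "$i" -lt "$rounds" ]; do
        run ps axHl
        run nfsstat -E -c -z
        run sleep 1
        i=$((i + 1))
    done
}
```

Something like "collect_nfs_diag <nfs-server> > diag.txt 2>&1" would then leave
the text output in diag.txt and the capture in the output directory. Setting
DRY_RUN=1 first only prints the commands, which may be worth doing before
running it on a client that is already struggling.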