Date: Tue, 7 Jan 2025 15:45:37 -0800
From: Rick Macklem <rick.macklem@gmail.com>
To: "Peter 'PMc' Much" <pmc@citylink.dinoex.sub.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: system stalled, no I/O but 100% CPU from nfs
Message-ID: <CAM5tNy5jfjMYj8hEyXO4sWqjiV=GM4T2tXEaf69nN5J02+vU=Q@mail.gmail.com>
In-Reply-To: <Z3wG3fEYjeE9f4nF@disp.intra.daemon.contact>
References: <Z3tdPjxTE6GZmzwW@disp.intra.daemon.contact> <CAM5tNy5AzL9+WpjRV9N1Wzy94RpA2L93NqnYFjFvx38iAo1iyg@mail.gmail.com> <Z3wG3fEYjeE9f4nF@disp.intra.daemon.contact>
On Mon, Jan 6, 2025 at 8:45 AM Peter 'PMc' Much
<pmc@citylink.dinoex.sub.org> wrote:
>
> On Mon, Jan 06, 2025 at 05:53:38AM -0800, Rick Macklem wrote:
> ! On Sun, Jan 5, 2025 at 8:45 PM Peter 'PMc' Much
> ! <pmc@citylink.dinoex.sub.org> wrote:
> ! >
> ! > This doesn't look good. It goes on for hours. What can be done about it?
> ! > (13.4 client & server)
> ! >
> ! >
> ! > 44 processes:  4 running, 39 sleeping, 1 waiting
> ! > CPU:  0.4% user,  0.0% nice, 99.6% system,  0.0% interrupt,  0.0% idle
> ! > Mem: 21M Active, 198M Inact, 1190M Wired, 278M Buf, 3356M Free
> ! > ARC: 418M Total, 39M MFU, 327M MRU, 128K Anon, 7462K Header, 43M Other
> ! >      332M Compressed, 804M Uncompressed, 2.42:1 Ratio
> ! > Swap: 15G Total, 15G Free
> ! >
> ! >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
> ! >   417 root          4  52    0    12M  2148K RUN     20:55  99.12% nfscbd
> ! Do you have delegations enabled on your server
> ! (vfs.nfsd.issue_delegations not 0)?
>
> Not knowingly:
>
> # sysctl vfs.nfsd.issue_delegations
> vfs.nfsd.issue_delegations: 0
>
> ! (If you do not, I have no idea why the server would be doing
> ! callbacks, which is what nfscbd
> ! handles.)
>
> Me neither. ;)
The CPU being associated with nfscbd might just be a glitch.
NFS uses kernel threads, and it is hard to know which process
they get associated with for these stats.

When you do:
# ps axHl
it will show the kernel threads.
If it happens again, it might turn out that the thread(s) racking up CPU
aren't actually doing callbacks.

rick

>
> The good news at this point is, it is a single event. At first I
> thought the whole cluster got slow (it is always too slow ;) ), but
> it was only this node - the others have no cpu consumption on
> nfscbd.
>
> The bad thing is, I cannot remember why I did switch that thing
> on.
>
> ! Also, "nfsstat -m" on the client shows you/us what your mount
> ! options are.
>
> It had to be destroyed, as effects got worse.
>
> What I figured is: it didn't issue any syscalls, and it didn't
> act on kill -9.
> Which means: most likely it found an infinite loop inside the
> kernel, aka a never-returning syscall.
>
> ! The above suggests that there is still some activity on the client, but the
> ! info. is limited.
>
> Yes, it got ever slower. The NFS mount is for /usr/ports, and I did
> fix some ports there. At some point a "make clean" would start to
> take minutes to complete, and there I noticed something is wrong.
> Finally it didn't even echo on the console (I had only one cpu
> available, and then when something is stuck within the kernel, all
> depends on preemption).
>
> ! If the client is still in this state, you can collect more info via:
> ! # tcpdump -s 0 -w out.pcap host <nfs-server>
> ! run for a little while.
>
> I had to destroy it. I tried to run dtrace to pinpoint exactly where
> that thing does execute, but it didn't start up. At that point I didn't
> consider it feasible to try further investigation.
> These are temporary building guests, they get destroyed after
> completion anyway.
>
> So, as apparently it was a single event, I might suggest we just
> remember that nfscbd /can do this/ (under yet unclear circumstances)
> and otherwise hope for the best.
>
> And probably I should get rid of that daemon altogether. I think I
> read something about these delegations, and it looked suitable for the
> use case, but I didn't realize that it would need to be activated
> on the server also.
> (The use case is, a snapshot + clone is created from the ports repo,
> then switched to a desired tag/branch, and that filetree is then
> used by a single guest, exclusively.)
>
>
> Thanks for Your help!
>
> cheerio,
> PMc
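A minimal sketch, not part of the original thread, of how the "ps axHl" advice could be narrowed down to the threads that are actually burning CPU. It assumes FreeBSD's ps(1) per-thread mode (-H) and the -o keywords pid, lwp, %cpu, comm and tdname; the helper name busy_threads and its 50% default threshold are hypothetical choices for illustration. Check ps(1) on your release before relying on the column list.

```shell
#!/bin/sh
# busy_threads [THRESHOLD]
#
# Hypothetical helper (not from the thread). Reads per-thread ps output
# whose first three columns are PID, LWP and %CPU, skips the header line,
# and prints the threads at or above THRESHOLD percent CPU (default 50),
# busiest first.
#
# Assumed usage on the FreeBSD client (column keywords per ps(1)):
#   LC_ALL=C ps -axH -o pid,lwp,%cpu,comm,tdname | busy_threads 50
# tdname is kept last because thread names may contain spaces.
busy_threads() {
    busy_threshold=${1:-50}
    # NR > 1 drops the header; $3 is the %CPU column.
    # LC_ALL=C keeps "." as the decimal point for awk and sort.
    LC_ALL=C awk -v t="$busy_threshold" 'NR > 1 && $3 + 0 >= t + 0' |
        LC_ALL=C sort -rn -k3,3
}
```

Keep in mind that ps reports %CPU as a decaying average over up to a minute, so a thread that only just started spinning may show a lower figure than it does in top.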