FreeBSD Mail Archives

Date:      Mon, 6 Jan 2025 17:37:49 +0100
From:      "Peter 'PMc' Much" <pmc@citylink.dinoex.sub.org>
To:        Rick Macklem <rick.macklem@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: system stalled, no I/O but 100% CPU from nfs
Message-ID:  <Z3wG3fEYjeE9f4nF@disp.intra.daemon.contact>
In-Reply-To: <CAM5tNy5AzL9%2BWpjRV9N1Wzy94RpA2L93NqnYFjFvx38iAo1iyg@mail.gmail.com>
References:  <Z3tdPjxTE6GZmzwW@disp.intra.daemon.contact> <CAM5tNy5AzL9%2BWpjRV9N1Wzy94RpA2L93NqnYFjFvx38iAo1iyg@mail.gmail.com>

On Mon, Jan 06, 2025 at 05:53:38AM -0800, Rick Macklem wrote:
! On Sun, Jan 5, 2025 at 8:45 PM Peter 'PMc' Much
! <pmc@citylink.dinoex.sub.org> wrote:

! >  This doesn't look good. It goes on for hours. What can be done about it?
! > (13.4 client & server)
! >
! >
! > 44 processes:  4 running, 39 sleeping, 1 waiting
! > CPU:  0.4% user,  0.0% nice, 99.6% system,  0.0% interrupt,  0.0% idle
! > Mem: 21M Active, 198M Inact, 1190M Wired, 278M Buf, 3356M Free
! > ARC: 418M Total, 39M MFU, 327M MRU, 128K Anon, 7462K Header, 43M Other
! >      332M Compressed, 804M Uncompressed, 2.42:1 Ratio
! > Swap: 15G Total, 15G Free
! >
! >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
! >   417 root          4  52    0    12M  2148K RUN     20:55  99.12% nfscbd
! Do you have delegations enabled on your server
! (vfs.nfsd.issue_delegations not 0)?

Not knowingly:

# sysctl vfs.nfsd.issue_delegations
vfs.nfsd.issue_delegations: 0

! (If you do not, I have no idea why the server would be doing
! callbacks, which is what nfscbd
! handles.)

Me neither. ;)

The good news at this point is, it is a single event. At first I
thought the whole cluster got slow (it is always too slow ;) ), but
it was only this node - the others have no cpu consumption on
nfscbd.

The bad thing is, I cannot remember why I did switch that thing
on.

! Also, "nfsstat -m" on the client shows you/us what your mount
! options are.

It had to be destroyed, as effects got worse.

What I figured out is: it didn't issue any syscalls, and it didn't
react to kill -9.
Which means: most likely it was stuck in an infinite loop inside the
kernel, i.e. a never-returning syscall.
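
For a process that ignores kill -9 like this, one way to see where it sits inside the kernel is to sample its kernel stacks with procstat -kk. The following is only a sketch: the helper name kstack_samples and its defaults are made up for illustration, and it assumes the machine is still responsive enough to start procstat at all (which was not the case here in the end).

```sh
# Hypothetical helper (not part of FreeBSD): print the kernel stacks of
# all threads of a process several times, one second apart. If the same
# frames show up in every sample, the thread is probably spinning there.
kstack_samples() {
    pid="$1"
    count="${2:-5}"    # number of samples, default 5 (arbitrary choice)

    # procstat(1) is FreeBSD-specific; bail out if it is not available.
    if ! command -v procstat >/dev/null 2>&1; then
        echo "procstat not found (FreeBSD only)" >&2
        return 1
    fi

    i=0
    while [ "$i" -lt "$count" ]; do
        procstat -kk "$pid"    # -kk: kernel stacks including function offsets
        sleep 1
        i=$((i + 1))
    done
}

# Example call for the nfscbd process from the top output (PID 417):
#   kstack_samples 417 5
```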

! The above suggests that there is still some activity on the client, but the
! info. is limited.

Yes, it got ever slower. The NFS mount is for /usr/ports, and I did
fix some ports there. At some point a "make clean" would start to
take minutes to complete, and there I noticed something is wrong.
Finally it didn't even echo on the console (I had only one cpu
available, and then when something is stuck within the kernel, all
depends on preemption).

! If the client is still in this state, you can collect more info via:
! # tcpdump -s 0 -w out.pcap host <nfs-server>
! run for a little while.

I had to destroy it. I tried to run dtrace to pinpoint exactly where
that thing was executing, but it didn't start up. At that point I didn't
consider it feasible to investigate further.
These are temporary building guests, they get destroyed after
completion anyway.

So, as apparently it was a single event, I might suggest we just
remember that nfscbd /can do this/ (under yet unclear circumstances)
and otherwise hope for the best.

And probably I should get rid of that daemon altogether. I think I
read something about these delegations, and it looked suitable for the
use case, but I didn't realize that it would need to be activated
on the server as well.
(The use case is: a snapshot + clone is created from the ports repo,
then switched to a desired tag/branch, and that file tree is then
used by a single guest, exclusively.)
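
Getting rid of the daemon on a healthy client could look roughly like the sketch below. It assumes nfscbd was started through rc.conf (nfscbd_enable), which is a guess about this setup; the sysctl is the one already shown above.

```sh
# Client: stop the callback daemon and keep it from starting at boot.
# (Assumes it was enabled via nfscbd_enable in rc.conf.)
service nfscbd stop
sysrc nfscbd_enable="NO"

# Server: delegations are only issued when this is non-zero;
# it reported 0 in the output above.
sysctl vfs.nfsd.issue_delegations
```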


Thanks for Your help!

cheerio,
PMc

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Z3wG3fEYjeE9f4nF>
