Date:      Sun, 3 Mar 2024 21:44:20 -0500
From:      Garrett Wollman <wollman@bimajority.org>
To:        Rick Macklem <rick.macklem@gmail.com>
Cc:        Garrett Wollman <wollman@bimajority.org>, stable@freebsd.org
Subject:   Re: 13-stable NFS server hang
Message-ID:  <26085.13700.20875.520319@hergotha.csail.mit.edu>
In-Reply-To: <CAM5tNy4BM3fwccjF53ROP-7NojsWMM2fUY2_RA-4GMWfc6Sn4g@mail.gmail.com>
References:  <26078.50375.679881.64018@hergotha.csail.mit.edu> <CAM5tNy7ZZ2bVLmYnOCWzrS9wq6yudoV5JKG5ObRU0=wLt1ofZw@mail.gmail.com> <26083.64612.717082.366639@hergotha.csail.mit.edu> <CAM5tNy4BM3fwccjF53ROP-7NojsWMM2fUY2_RA-4GMWfc6Sn4g@mail.gmail.com>

<<On Sun, 3 Mar 2024 13:17:30 -0800, Rick Macklem <rick.macklem@gmail.com> said:

>> [I wrote:]
>> (and so is dirty), this might take several seconds.  I've set
>> vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
>> and am watching to see if we have more freezes.
>> 
>> If this does the trick, then I can delay deploying a new kernel until
>> April, after my upcoming vacation.
> Interesting. Please let us know how it goes.

It's been about 22 hours since I flipped the sysctl and it hasn't
happened once yet.  Of course I don't know what the users are up to
right now, so I'll continue to monitor.

This is the script I ended up with to monitor for freezes:

nfsstat -dW | awk 'BEGIN { n = 0 } (n == 0) && ($12 == 0) { n = n + 1; system("date"); next } (n > 0) && ($12 == 0) { system("date; procstat -k 1184 1198; netstat -n -p tcp"); exit(0) } { n = 0 }'

This should (if I haven't botched it) trigger only if two consecutive
seconds show no forward progress.
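
For readers who want to check the logic, here is the same state machine spread out with comments. It is only a sketch: the function name watch_stall and the idea of passing the diagnostics command as an argument are illustrative and not part of the original script, and the PIDs 1184 and 1198 in the one-liner are specific to the server in question.

```shell
#!/bin/sh
# Sketch of the one-liner's logic; watch_stall is a hypothetical name.
# Reads whitespace-separated sample lines on stdin and runs the command
# string given as $1 on the second consecutive line whose 12th field is 0.
# (awk -v interprets backslash escapes, so keep the command string simple.)
watch_stall() {
    awk -v diag="$1" '
        $12 == 0 {
            stalled++
            if (stalled == 1) {
                system("date")      # first zero sample: just note the time
                next
            }
            system(diag)            # second zero sample in a row: dump state
            exit 0
        }
        { stalled = 0 }             # any other line resets the counter
    '
}

# Intended use, mirroring the original command (not run here):
#   nfsstat -dW | watch_stall 'date; procstat -k 1184 1198; netstat -n -p tcp'
```

A line with a nonzero 12th field between two zero samples resets the counter, so isolated idle seconds only print a timestamp and do not trigger the dump.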

-GAWollman
