Date: Sat, 26 Oct 2013 01:16:35 -0400
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: FreeBSD Net <freebsd-net@freebsd.org>, freebsd-fs <freebsd-fs@freebsd.org>
Subject: Or it could be ZFS memory starvation and 9k packets (was Re: istgt causes massive jumbo nmbclusters loss)
Message-ID: <CACpH0MfEy50Y5QOZCdn2co_JmY_QPfVRxYwK-73W0WYsHB-Fqw@mail.gmail.com>
At first I thought this was entirely the interaction of istgt and 9k packets, but after some observation (and a few more hangs) I'm reasonably sure it's a form of resource starvation related to ZFS and 9k packets.

To reliably trigger the hang, two things are needed: something must create a demand for 9k packets (istgt traffic, but also BitTorrent traffic; as you can see, the MTU is 9014), and some time must have passed since the system booted. ZFS is fairly busy (with both NFS and SMB guests), so it generally takes quite a bit of the 8G of memory for itself.

Now... the netstat -m output below shows 1399 9k bufs in total, with 376 of them available in the cache. When the network gets busy, I've seen totals of 4000 or even 5000 9k bufs, but never anywhere near the 77k max. After some time of lesser activity, the number of 9k buffers returns to this level.

When the problem occurs, the number of denied buffers shoots up at a rate of several hundred or even several thousand per second, but the system is not "out" of memory: top often shows 800 meg in the free column when this happens. While it's happening, when I'm logged into the console, none of the stats seem out of place, save the number of denied 9k buffer allocations, and the "cache" of 9k buffers will be less than 10 (but I've never seen it at 0).

On Tue, Oct 22, 2013 at 3:42 PM, Zaphod Beeblebrox <zbeeble@gmail.com> wrote:
> I have a server
>
> FreeBSD virtual.accountingreality.com 9.2-STABLE FreeBSD 9.2-STABLE #13 r256549M: Tue Oct 15 16:29:48 EDT 2013 root@virtual.accountingreality.com:/usr/obj/usr/src/sys/VRA amd64
>
> That has an em0 with jumbo packets enabled:
>
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9014
>
> It has (among other things): ZFS, NFS, iSCSI (via istgt) and Samba.
>
> Every day or two, it loses its ability to talk to the network. ifconfig down/up on em0 gives the message about not being able to allocate the receive buffers...
>
> With everything running, but with specifically iSCSI not used, everything seems good. When I start hitting istgt, I see the denied stat for 9k mbufs rise very rapidly (this amount only took a few seconds):
>
> [1:47:347]root@virtual:/usr/local/etc/iet> netstat -m
> 1313/877/2190 mbufs in use (current/cache/total)
> 20/584/604/523514 mbuf clusters in use (current/cache/total/max)
> 20/364 mbuf+clusters out of packet secondary zone in use (current/cache)
> 239/359/598/261756 4k (page size) jumbo clusters in use (current/cache/total/max)
> 1023/376/1399/77557 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/43626 16k jumbo clusters in use (current/cache/total/max)
> 10531K/6207K/16738K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/50199/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> ... the denied number rises... and somewhere in the millions or more the machine stops --- but even with the large number of denied 9k clusters, the "9k jumbo clusters in use" line will always indicate some available.
>
> ... so is this a tuning or a bug issue? I've tried ietd --- basically it doesn't want to work with a zfs zvol, it seems (refuses to use it).
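
For anyone who wants to watch for the same symptom, here is a rough sketch (not something I run in production) that pulls the 9k counters out of netstat -m text and turns two samples into a denials-per-second figure. The function names (parse_9k_stats, denied_rate, sample_netstat) are made up for illustration, and the regular expressions assume the 9.2-STABLE output format quoted above; other releases may word these lines differently.

```python
import re
import subprocess

# Two lines copied from the netstat -m output quoted above.
SAMPLE = """\
1023/376/1399/77557 9k jumbo clusters in use (current/cache/total/max)
0/50199/0 requests for jumbo clusters denied (4k/9k/16k)
"""


def parse_9k_stats(text):
    """Extract the 9k cluster counters and the 9k denied count from netstat -m text."""
    stats = {}
    in_use = re.search(
        r"^\s*(\d+)/(\d+)/(\d+)/(\d+) 9k jumbo clusters in use", text, re.M
    )
    if in_use:
        current, cache, total, maximum = map(int, in_use.groups())
        stats.update(current=current, cache=cache, total=total, max=maximum)
    denied = re.search(
        r"^\s*(\d+)/(\d+)/(\d+) requests for jumbo clusters denied", text, re.M
    )
    if denied:
        # Field order on this line is 4k/9k/16k, so the 9k count is the second.
        stats["denied_9k"] = int(denied.group(2))
    return stats


def denied_rate(before, after, seconds):
    """Denied 9k allocations per second between two parsed samples."""
    return (after["denied_9k"] - before["denied_9k"]) / seconds


def sample_netstat():
    """Run netstat -m on the local machine and parse it (assumes FreeBSD)."""
    out = subprocess.run(
        ["netstat", "-m"], capture_output=True, text=True, check=True
    ).stdout
    return parse_9k_stats(out)


if __name__ == "__main__":
    print(parse_9k_stats(SAMPLE))
```

Calling sample_netstat() twice a few seconds apart and passing both results to denied_rate() would show whether the denied counter is climbing at the hundreds-to-thousands per second described above, and the "cache" field shows whether the 9k cache has dropped below 10.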