From owner-freebsd-fs@FreeBSD.ORG Sat Oct 26 05:16:37 2013
Date: Sat, 26 Oct 2013 01:16:35 -0400
Subject: Or it could be ZFS memory starvation and 9k packets (was Re: istgt causes massive jumbo nmbclusters loss)
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: FreeBSD Net, freebsd-fs
List-Id: Filesystems
At first I thought this was entirely the interaction of istgt and 9k packets, but after some observation (and a few more hangs) I'm reasonably positive it's a form of resource starvation related to ZFS and 9k packets.

To reliably trigger the hang, I need to do something that generates a demand for 9k packets (istgt traffic, but also BitTorrent traffic --- as you can see, the MTU is 9014), and some time must have passed since the system booted. ZFS is fairly busy (with both NFS and SMB clients), so it generally takes quite a bit of the 8 GB of memory for itself.

Now... the netstat -m below shows 1399 9k bufs with 376 available. When the network gets busy, I've seen 4k or even 5k bufs in total --- never anywhere near the 77k max. After some time of lesser activity, the number of 9k buffers returns to this level.

When the problem occurs, the number of denied buffers shoots up at a rate of several hundred or even several thousand per second, but the system will not be "out" of memory: top will often show 800 MB in the free column while this happens. While it's happening, if I'm logged into the console, none of these stats seem out of place save the number of denied 9k buffer allocations, and the "cache" count of 9k buffers will be less than 10 (though I've never seen it at 0).

On Tue, Oct 22, 2013 at 3:42 PM, Zaphod Beeblebrox wrote:

> I have a server
>
> FreeBSD virtual.accountingreality.com 9.2-STABLE FreeBSD 9.2-STABLE #13
> r256549M: Tue Oct 15 16:29:48 EDT 2013
> root@virtual.accountingreality.com:/usr/obj/usr/src/sys/VRA amd64
>
> that has em0 with jumbo packets enabled:
>
> em0: flags=8843 metric 0 mtu 9014
>
> It has (among other things): ZFS, NFS, iSCSI (via istgt) and Samba.
>
> Every day or two, it loses its ability to talk to the network. ifconfig
> down/up on em0 gives the message about not being able to allocate the
> receive buffers...
>
> With everything running, but with iSCSI specifically not used, everything
> seems good. When I start hitting istgt, I see the denied stat for 9k mbufs
> rise very rapidly (this amount only took a few seconds):
>
> [1:47:347]root@virtual:/usr/local/etc/iet> netstat -m
> 1313/877/2190 mbufs in use (current/cache/total)
> 20/584/604/523514 mbuf clusters in use (current/cache/total/max)
> 20/364 mbuf+clusters out of packet secondary zone in use (current/cache)
> 239/359/598/261756 4k (page size) jumbo clusters in use (current/cache/total/max)
> 1023/376/1399/77557 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/43626 16k jumbo clusters in use (current/cache/total/max)
> 10531K/6207K/16738K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/50199/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> ... the denied number keeps rising, and somewhere in the millions or more the
> machine stops --- but even with the large number of denied 9k clusters, the
> "9k jumbo clusters in use" line always indicates some available.
>
> ... so is this a tuning issue or a bug? I've also tried ietd --- basically it
> doesn't seem to want to work with a ZFS zvol (it refuses to use it).
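[Editor's note: not part of the original thread. For anyone trying to reproduce the observation above, the denial rate can be sampled by re-running netstat -m and diffing the "requests for jumbo clusters denied" counter. A minimal sketch follows; the parsing is keyed to the exact output format quoted above, and the helper names are illustrative, not any official API:]

```python
import re
import subprocess
import time


def parse_denied_9k(netstat_m_output):
    """Extract the 9k jumbo cluster 'denied' counter from `netstat -m` output.

    Relies on the line format quoted in the message above, e.g.:
      0/50199/0 requests for jumbo clusters denied (4k/9k/16k)
    """
    m = re.search(r"(\d+)/(\d+)/(\d+) requests for jumbo clusters denied",
                  netstat_m_output)
    if m is None:
        raise ValueError("no 'jumbo clusters denied' line found")
    denied_4k, denied_9k, denied_16k = map(int, m.groups())
    return denied_9k


def sample_denied_9k():
    """Run `netstat -m` on a FreeBSD host and return the 9k denied counter."""
    out = subprocess.run(["netstat", "-m"], capture_output=True,
                         text=True, check=True).stdout
    return parse_denied_9k(out)


if __name__ == "__main__":
    # Print the per-second growth of the 9k denied counter; during the hang
    # described above it reportedly jumps by hundreds or thousands per second.
    prev = sample_denied_9k()
    while True:
        time.sleep(1)
        cur = sample_denied_9k()
        print(f"9k jumbo denials/sec: {cur - prev}")
        prev = cur
```

Watching this rate alongside top's free-memory column would show the pattern the poster describes: denials climbing steeply while memory is nominally still free.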