From: Christopher Forgeron
To: Rick Macklem, Jack Vogel, Markus Gebert
Cc: FreeBSD Net
Date: Fri, 21 Mar 2014 08:47:35 -0300
Subject: Re: 9.2 ixgbe tx queue hang

Hello all,

I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer away at
the NFS store overnight, but the problem is still there.

From what I read, I think the MJUM9BYTES removal is probably good cleanup (as
long as it doesn't trade performance on a lightly memory loaded system for
performance on a heavily memory loaded system). If I can stabilize my system,
I may attempt those benchmarks.

I think the fix will be obvious at boot for me. My older 9.2 machine has a
'clean' netstat, with no denied requests. Until I can boot and see a
'netstat -m' that looks similar to that, I'm going to have this problem.

Markus: Do your systems show denied mbufs at boot like mine do?

Turning off TSO works for me, but at a performance hit.

I'll compile Rick's patch (and extra debugging) this morning and let you know
soon.

On Thu, Mar 20, 2014 at 11:47 PM, Christopher Forgeron wrote:

> BTW - I think this will end up being a TSO issue, not the patch that Jack
> applied.
>
> When I boot Jack's patch (MJUM9BYTES removal) this is what netstat -m
> shows:
>
> 21489/2886/24375 mbufs in use (current/cache/total)
> 4080/626/4706/6127254 mbuf clusters in use (current/cache/total/max)
> 4080/587 mbuf+clusters out of packet secondary zone in use (current/cache)
> 16384/50/16434/3063627 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/907741 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
> 79068K/2173K/81241K bytes allocated to network (current/cache/total)
> 18831/545/4542 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> Here is an un-patched boot:
>
> 21550/7400/28950 mbufs in use (current/cache/total)
> 4080/3760/7840/6127254 mbuf clusters in use (current/cache/total/max)
> 4080/2769 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/42/42/3063627 4k (page size) jumbo clusters in use (current/cache/total/max)
> 16439/129/16568/907741 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
> 161498K/10699K/172197K bytes allocated to network (current/cache/total)
> 18345/155/4099 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 3/3723/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> See how removing the MJUM9BYTES is just pushing the problem from the 9k
> jumbo cluster into the 4k jumbo cluster?
>
> Compare this to my FreeBSD 9.2 STABLE machine from ~ Dec 2013 : Exact same
> hardware, revisions, zpool size, etc. Just it's running an older FreeBSD.
>
> # uname -a
> FreeBSD SAN1.XXXXX 9.2-STABLE FreeBSD 9.2-STABLE #0: Wed Dec 25 15:12:14 AST 2013 aatech@FreeBSD-Update Server:/usr/obj/usr/src/sys/GENERIC amd64
>
> root@SAN1:/san1 # uptime
> 7:44AM up 58 days, 38 mins, 4 users, load averages: 0.42, 0.80, 0.91
>
> root@SAN1:/san1 # netstat -m
> 37930/15755/53685 mbufs in use (current/cache/total)
> 4080/10996/15076/524288 mbuf clusters in use (current/cache/total/max)
> 4080/5775 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/692/692/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
> 32773/4257/37030/96000 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/508538 16k jumbo clusters in use (current/cache/total/max)
> 312599K/67011K/379611K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> Lastly, please note this link:
>
> http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033660.html
>
> It's so old that I assume the TSO leak that he speaks of has been patched,
> but perhaps not. More things to look into tomorrow.
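
For anyone who wants to compare the "denied" counters between two boots without eyeballing the output, here is a rough Python sketch. It is not something used in this thread. The function name denied_counters and the two sample strings are my own assumptions for illustration, with the sample lines copied from the patched and un-patched boots quoted above. It only looks at the "requests for ... denied" lines of netstat -m and tolerates a leading "> " quote prefix.

```python
import re

# Matches lines such as
#   18831/545/4542 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
#   15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
# An optional mail quote prefix ("> ") is skipped.
DENIED_RE = re.compile(
    r"^[>\s]*([\d/]+) requests for (mbufs|jumbo clusters) denied \(([^)]+)\)"
)


def denied_counters(netstat_output):
    """Return {"kind:label": count} for each 'denied' line in netstat -m text."""
    counters = {}
    for line in netstat_output.splitlines():
        match = DENIED_RE.match(line)
        if not match:
            continue  # e.g. "0 requests for sfbufs denied" has no breakdown
        values = [int(v) for v in match.group(1).split("/")]
        labels = match.group(3).split("/")
        for label, value in zip(labels, values):
            counters[f"{match.group(2)}:{label}"] = value
    return counters


# Sample lines copied from the two boots quoted in this message.
PATCHED = """\
18831/545/4542 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
"""

UNPATCHED = """\
18345/155/4099 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
3/3723/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
"""

if __name__ == "__main__":
    patched = denied_counters(PATCHED)
    unpatched = denied_counters(UNPATCHED)
    # Print each counter side by side so a shift between zones is easy to spot.
    for key in sorted(patched):
        print(f"{key:28s} patched={patched[key]:6d} unpatched={unpatched[key]:6d}")
```

Run against saved netstat -m output from each boot, the jumbo cluster rows make the shift from the 9k zone to the 4k zone easy to see.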