From: Christopher Forgeron
To: Rick Macklem
Cc: FreeBSD Net, Garrett Wollman, Jack Vogel, Markus Gebert
Date: Sat, 22 Mar 2014 08:55:59 -0300
Subject: Re: 9.2 ixgbe tx queue hang

Status Update: Hopeful, but not done.

So the 9.2-STABLE ixgbe with Rick's TSO patch has been running all night while iometer hammered away at it. It has over 8 hours of test time on it. It's still running, the CPU queues are not clogged, and everything is functional.

However, my ping_logger.py did record 23 incidents of "sendto: File too large" over the 8 hour run. That's really nothing compared to what I usually run into - normally I'd have 23 incidents within a 5 minute span.

During those 23 incidents (ping_logger.py triggers a cpuset ping), I see the same symptoms of clogging on a few CPU cores. That clogging does go away, which is a symptom Markus says he sometimes experiences.

So I would say the TSO patch makes things remarkably better, but something else is still up. Unfortunately, with the TSO patch in place it's now harder to trigger the error, so testing will be more difficult.

Could someone confirm for me where the "jumbo clusters denied" and "mbufs denied" counters in netstat -m come from? Would it be from an m_defrag call that fails?

I feel the netstat -m stats on boot are part of this issue - I was able to greatly reduce them during one of my test iterations. I'm going to see if I can repeat that with the TSO patch.

Getting this working on the 10-STABLE ixgbe: Mike has contributed some edits (in a slightly different thread) that I want to try on that driver. At the same time, a diff of 9.2 <-> 10.0 may give hints, as the 10.0 driver with the TSO patch has issues quickly and frequently... it's doing something that aggravates this condition.

Thanks for all the help, please keep the suggestions or tidbits of info coming.
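
Note on the incident counting: ping_logger.py itself is not included in this message, so the following is only a rough sketch of how such a logger could count "sendto: File too large" incidents. The function names, the choice of CPU 0, and the single-packet ping are assumptions made for illustration, not the actual script.

```python
import subprocess

# Text ping(8) prints when sendto() fails with EFBIG.
ERROR_TEXT = "sendto: File too large"


def count_incidents(lines):
    """Count lines of ping output that report the EFBIG send failure."""
    return sum(1 for line in lines if ERROR_TEXT in line)


def ping_once(host, cpu=0):
    """Send one ping pinned to a single CPU with cpuset(1).

    Hypothetical helper: assumes a FreeBSD host with cpuset and ping on PATH.
    Returns stdout and stderr as one list of lines.
    """
    result = subprocess.run(
        ["cpuset", "-l", str(cpu), "ping", "-c", "1", host],
        capture_output=True,
        text=True,
    )
    return (result.stdout + result.stderr).splitlines()
```

Calling count_incidents(ping_once(host, cpu)) in a loop, once per CPU core, would give a per-core incident tally that could be compared between test runs.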