From: Christopher Forgeron
To: Mark Schouten
Cc: FreeBSD Net
Date: Mon, 11 May 2015 05:37:07 -0300
Subject: Re: [Bug 199174] em tx and rx hang
I'd go a step further and say it's the _exact_ same problem. If you're using anything other than 4k clusters on a heavily loaded system, you'll probably have issues.

It's not just the MTU. Case in point: I set my MTU to 4000, but since my iSCSI block size is 8k, I noticed that I still had plenty of 9k jumbo clusters in use. I still crash within half a day to a day and a half of uptime. Usually it's 'ix0 is flapping', sometimes a kernel panic, or just dead ix interfaces that won't transmit.

I patched my ixgbe.c to only use 4k clusters, and now I can use an MTU of 9000 again without issue.

I want to take the time to dig up more of my info on this to present to the list, but I've lost a lot of time tracking this down and I'm still cleaning up as we speak.

The worst thing about the jumbo clusters bug is that it's very specific to a particular load. My systems were fine until I took on a new Exchange 2013 load that started popping all the FreeBSD SANs, and these were load-tested production machines that had been in service for months without issues.

In one of these threads, Garrett Wollman points out his ideas for a fix. I second the idea of a large ring buffer being created at boot for the network cards to use, and like him, I regretfully have no time to spare to help. Well, perhaps I can find some time for this, but I can only help, not lead.

Here's one of the last machines popping on me tonight before I could get to it with a patched kernel. This is an unusual error; usually 'ix0 flapping' is the most common.
    May 11 04:04:06 aa_fast_b kernel: panic: solaris assert: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c, line: 830
    May 11 04:04:06 aa_fast_b kernel: cpuid = 1
    May 11 04:04:06 aa_fast_b kernel: KDB: stack backtrace:
    May 11 04:04:06 aa_fast_b kernel: #0 0xffffffff80962fd0 at kdb_backtrace+0x60
    May 11 04:04:06 aa_fast_b kernel: #1 0xffffffff809280f5 at panic+0x155
    May 11 04:04:06 aa_fast_b kernel: #2 0xffffffff81bbe1fd at assfail+0x1d
    May 11 04:04:06 aa_fast_b kernel: #3 0xffffffff81983388 at dmu_write+0x98
    May 11 04:04:06 aa_fast_b kernel: #4 0xffffffff819c8ec5 at space_map_write+0x3c5
    May 11 04:04:06 aa_fast_b kernel: #5 0xffffffff819afb30 at metaslab_sync+0x4e0
    May 11 04:04:06 aa_fast_b kernel: #6 0xffffffff819cf69b at vdev_sync+0xcb
    May 11 04:04:06 aa_fast_b kernel: #7 0xffffffff819c0fdb at spa_sync+0x5db
    May 11 04:04:06 aa_fast_b kernel: #8 0xffffffff819ca3f6 at txg_sync_thread+0x3a6
    May 11 04:04:06 aa_fast_b kernel: #9 0xffffffff808f8b3a at fork_exit+0x9a
    May 11 04:04:06 aa_fast_b kernel: #10 0xffffffff80d0ac8e at fork_trampoline+0xe
    May 11 04:04:06 aa_fast_b kernel: Uptime: 1d12h7m45s
    May 11 04:04:06 aa_fast_b kernel: (da1:iscsi7:0:0:0): Synchronize cache failed
    May 11 04:04:06 aa_fast_b kernel: (da3:iscsi5:0:0:0): Synchronize cache failed
    May 11 04:04:06 aa_fast_b kernel: (da4:iscsi11:0:0:0): Synchronize cache failed
    May 11 04:04:06 aa_fast_b kernel: (da7:iscsi4:0:0:0): Synchronize cache failed
    May 11 04:04:06 aa_fast_b kernel: (da8:iscsi6:0:0:0): Synchronize cache failed
    May 11 04:04:06 aa_fast_b kernel: (da9:iscsi10:0:0:0): Synchronize cache failed
    May 11 04:04:06 aa_fast_b kernel: (da10:iscsi1:0:0:0): Synchronize cache failed

It's lots of fun... it really is. I'm glad I have a lot of redundancy and backups.
On Mon, May 11, 2015 at 5:13 AM, Mark Schouten wrote:

> Please note that these issues look very much like the issues I had, before
> I switched from an MTU of 9000 to 1500 ...
>
> Kind regards,
>
> --
> Kerio Operator in the Cloud? https://www.kerioindecloud.nl/
> Mark Schouten | Tuxis Internet Engineering
> KvK: 61527076 | http://www.tuxis.nl/
> T: 0318 200208 | info@tuxis.nl
>
> From:
> To:
> Sent: 8-5-2015 19:42
> Subject: [Bug 199174] em tx and rx hang
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199174
>
> --- Comment #15 from Sean Bruno ---
> (In reply to david.keller from comment #14)
> Nothing fancy here.
>
> Server runs "iperf -p 8000 -s" (8core amd box)
> Client under test runs this forever:
>
> #!/bin/sh
>
> FILE=test.out
>
> if [ -f ${FILE} ]; then
>     rm $FILE;
> fi
>
> while [ 1 ]; do
>     date;
>     iperf -p 8000 -c 192.168.100.1 -t 600 -P ${1} >> $FILE;
> done