From owner-freebsd-current@FreeBSD.ORG  Sun Nov 18 19:34:00 2007
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4F7C216A41B
	for <freebsd-current@freebsd.org>; Sun, 18 Nov 2007 19:34:00 +0000 (UTC)
	(envelope-from jfvogel@gmail.com)
Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.190])
	by mx1.freebsd.org (Postfix) with ESMTP id A0C5A13C46A
	for <freebsd-current@freebsd.org>; Sun, 18 Nov 2007 19:33:59 +0000 (UTC)
	(envelope-from jfvogel@gmail.com)
Received: by nf-out-0910.google.com with SMTP id b2so1215074nfb
	for <freebsd-current@freebsd.org>; Sun, 18 Nov 2007 11:33:39 -0800 (PST)
Received: by 10.86.65.11 with SMTP id n11mr4262613fga.1195414419210;
	Sun, 18 Nov 2007 11:33:39 -0800 (PST)
Received: by 10.86.100.19 with HTTP; Sun, 18 Nov 2007 11:33:39 -0800 (PST)
Message-ID: <2a41acea0711181133n5f63f932m714a4a6b790937c0@mail.gmail.com>
Date: Sun, 18 Nov 2007 11:33:39 -0800
From: "Jack Vogel" <jfvogel@gmail.com>
To: "Mike Andrews" <mandrews@bit0.com>
In-Reply-To: <20071118030305.N99375@mindcrime.int.bit0.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20071117003504.R31357@mindcrime.int.bit0.com>
	<b1fa29170711171308x62a6371dnbb939748c5c59ae2@mail.gmail.com>
	<20071117170537.F59492@mindcrime.int.bit0.com>
	<b1fa29170711171519r65473426s1b9f3d9666ff6a92@mail.gmail.com>
	<20071117182232.T59492@mindcrime.int.bit0.com>
	<b1fa29170711171619x24233a3cw4361e0f3ca395e4c@mail.gmail.com>
	<473F9552.50402@bit0.com>
	<b1fa29170711171804x36e4ae51ie03d01e4bc0220ac@mail.gmail.com>
	<473FBD1A.8010207@bit0.com>
	<20071118030305.N99375@mindcrime.int.bit0.com>
Cc: Denis Shaposhnikov <dsh@vlink.ru>, Kip Macy <kip.macy@gmail.com>,
	Mike Silbersack <silby@freebsd.org>,
	Andre Oppermann <andre@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: bizarre em + TSO + MSS issue in RELENG_7
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
X-List-Received-Date: Sun, 18 Nov 2007 19:34:00 -0000

On Nov 18, 2007 12:58 AM, Mike Andrews <mandrews@bit0.com> wrote:
>
> On Sat, 17 Nov 2007, Mike Andrews wrote:
>
> > Kip Macy wrote:
> >> On Nov 17, 2007 5:28 PM, Mike Andrews <mandrews@bit0.com> wrote:
> >>> Kip Macy wrote:
> >>>> On Nov 17, 2007 3:23 PM, Mike Andrews <mandrews@bit0.com> wrote:
> >>>>> On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>>
> >>>>>> On Nov 17, 2007 2:33 PM, Mike Andrews <mandrews@bit0.com> wrote:
> >>>>>>> On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>>>>
> >>>>>>>> On Nov 17, 2007 10:33 AM, Denis Shaposhnikov <dsh@vlink.ru> wrote:
> >>>>>>>>> On Sat, 17 Nov 2007 00:42:54 -0500 (EST)
> >>>>>>>>> Mike Andrews <mandrews@bit0.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Has anyone run into problems with MSS not being respected when
> >>>>>>>>>> using TSO, specifically on em cards?
> >>>>>>>>> Yes, I wrote about this problem at the beginning of 2007; see
> >>>>>>>>>
> >>>>>>>>>     http://tinyurl.com/3e5ak5
> >>>>>>>>>
> >>>>>>>> if_em.c:3502
> >>>>>>>>        /*
> >>>>>>>>         * Payload size per packet w/o any headers.
> >>>>>>>>         * Length of all headers up to payload.
> >>>>>>>>         */
> >>>>>>>>        TXD->tcp_seg_setup.fields.mss = htole16(mp->m_pkthdr.tso_segsz);
> >>>>>>>>        TXD->tcp_seg_setup.fields.hdr_len = hdr_len;
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Please print out the value of tso_segsz here. It appears to be set
> >>>>>>>> correctly. The only thing I can think of is that t_maxopd is not
> >>>>>>>> correct. As tso_segsz is correct here:
> >>>>>>> It repeatedly prints 1368 during a 1 meg file transfer over a connection
> >>>>>>> with a 1380 MSS.  Any other printfs I can add?  I'm working on a web page
> >>>>>>> with tcpdump / firewall log output illustrating the issue...
> >>>>>> Mike -
> >>>>>> Denis' tcpdump output doesn't show oversized segments, something else
> >>>>>> appears to be happening there. Can you post your tcpdump output
> >>>>>> somewhere?
> >>>>> URL sent off-list.
> >>>>        if (tso) {
> >>>>                m->m_pkthdr.csum_flags = CSUM_TSO;
> >>>>                m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
> >>>>        }
> >>>>
> >>>>
> >>>> Please print the value of maxopd and optlen under "if (tso)" in
> >>>> tcp_output. I think the calculated optlen may be too small.
> >>>
> >>> maxopt=1380 - optlen=12 = tso_segsz=1368
> >>>
> >>> Weird though, after this reboot, I had to re-copy a 4 meg file 5 times
> >>> to start getting the firewall to log any drops.  Transfer rate was
> >>> around 240KB/sec before the firewall started to drop, then it went down
> >>> to about 64KB/sec during the 5th copy, and stayed there for subsequent
> >>> copies.  The actual packet size the firewall said it was dropping was
> >>> varying all over the place still, yet the maxopt/optlen/tso_segsz values
> >>> stayed constant.  But it's interesting that it didn't start dropping
> >>> immediately after the reboot -- though the transfer rate was still
> >>> sub-optimal.
> >>
> >> Ok, next theory :D. You shouldn't be seeing "bad len" packets from
> >> tcpdump. I'm wondering if that means you're sending down more than
> >> 64k. Can you please print out the value of mp->m_pkthdr.len around the
> >> same place that you printed out tso_segsz? 64k is the generally
> >> accepted limit for TSO, I'm wondering if the card firmware does
> >> something weird if you give it more.
> >
> > OK.  In that last message, where I said it took 5 times to start reproducing
> > the problem... this time it took until I actually toggled TSO back off and
> > back on again, and then it started acting up again.  I don't know what the
> > actual trigger is... it's very weird.
> >
> > Initially, with TSO on and no drops yet (but still transferring
> > slowly)...
> >
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> > (etc, always 8306)
> >
> > After toggling off/on which caused the drops to start (and the speed to drop
> > even further):
> >
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=7507
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=3053
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1677
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=3037
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2264
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1656
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1902
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1888
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1640
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1871
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2461
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1849
> > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2092
> >
> > and so on, with more seemingly random lengths... but none of them ever over
> > 8306, much less 64K.
>
>
> Got a few more data points here.
>
> I can reproduce this on an i386 kernel, so it isn't amd64 specific.
>
> I can reproduce this on an 82541EI nic, so it isn't 82573 specific.
>
> I can't reproduce this on a Marvell Yukon II (msk) nic; it works fine
> whether TSO is on or off.
>
> I can't reproduce this on a bge nic because it doesn't support TSO :)
> That's the only other gigabit nic I've got easy access to.
>
> I can reproduce this with just a Cisco 877W IOS-based router and no Cisco
> PIX / ASA firewalls in the way, with the servers on the LAN interface with
> "ip tcp adjust-mss 1340" on it, and the downloading client on the Cisco's
> 802.11g interface.  This time, the client is a MacBook Pro running
> Leopard, and I'm running "tcpdump -i en1 -s 1500 -n -v length \> 1394" on
> the MacBook (not the server this time) to find oversize packets, which is
> actually handier because I can see how trashed they really get :)
>
> I can't reproduce this between two machines on the same subnet (though I
> can reproduce throughput problems alone).  I haven't tried lowering the
> system MSS on one end yet (is there a sysctl to lower the MSS for outbound
> connections without lowering the MTU as well?).  If I could do this it
> would greatly simplify testing for everyone as they wouldn't have to stick
> an MSS-clamping router in the middle.  It doesn't have to be Cisco.
>
> With this setup, copying to the Mac through the 877W from:
>
> msk-based server, TSO disabled: tcpdump reports no problems, file
> transfers are fast
>
> msk-based server, TSO enabled: tcpdump reports no problems, file
> transfers are fast
>
> em-based server, TSO disabled: tcpdump reports no problems, file
> transfers are fast
>
> em-based server, TSO enabled: tcpdump reports numerous oversize packets of
> varying sizes just as before, AND numerous packets with bad TCP checksums.
> The checksum problems aren't limited to only the large packets though.
> (That's probably what's causing the throughput problems.)  Toggling rxcsum
> and txcsum flags on the server made no difference.  What I haven't tried
> yet is hexdumping the packets to see what exactly is getting trashed.
>
> The problem still comes and goes; sometimes it'll work for a few minutes
> after boot, sometimes not; it might be dependent on what other traffic's
> going through the box.

Hmmm, OK, so the data points to something in the em TSO or encap
code. I will look into this tomorrow. So the necessary elements are systems
on two subnets, with an em interface doing the transmitting with TSO?

Jack