From owner-freebsd-current@FreeBSD.ORG Sun Nov 18 19:34:00 2007
Message-ID: <2a41acea0711181133n5f63f932m714a4a6b790937c0@mail.gmail.com>
Date: Sun, 18 Nov 2007 11:33:39 -0800
From: "Jack Vogel"
To: "Mike Andrews"
In-Reply-To: <20071118030305.N99375@mindcrime.int.bit0.com>
References: <20071117003504.R31357@mindcrime.int.bit0.com> <20071117170537.F59492@mindcrime.int.bit0.com> <20071117182232.T59492@mindcrime.int.bit0.com> <473F9552.50402@bit0.com> <473FBD1A.8010207@bit0.com> <20071118030305.N99375@mindcrime.int.bit0.com>
Cc: Denis Shaposhnikov, Kip Macy, Mike Silbersack, Andre Oppermann, freebsd-current@freebsd.org
Subject: Re: bizarre em + TSO + MSS issue in RELENG_7
List-Id: Discussions about the use of FreeBSD-current

On Nov 18, 2007 12:58 AM, Mike Andrews wrote:
>
> On Sat, 17 Nov 2007, Mike Andrews wrote:
>
> > Kip Macy wrote:
> >> On Nov 17, 2007 5:28 PM, Mike Andrews wrote:
> >>> Kip Macy wrote:
> >>>> On Nov 17, 2007 3:23 PM, Mike Andrews wrote:
> >>>>> On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>>
> >>>>>> On Nov 17, 2007 2:33 PM, Mike Andrews wrote:
> >>>>>>> On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>>>>
> >>>>>>>> On Nov 17, 2007 10:33 AM, Denis Shaposhnikov wrote:
> >>>>>>>>> On Sat, 17 Nov 2007 00:42:54 -0500 (EST)
> >>>>>>>>> Mike Andrews wrote:
> >>>>>>>>>
> >>>>>>>>>> Has anyone run into problems with MSS not being respected when
> >>>>>>>>>> using TSO, specifically on em cards?
> >>>>>>>>> Yes, I wrote about this problem on the beginning of 2007, see
> >>>>>>>>>
> >>>>>>>>> http://tinyurl.com/3e5ak5
> >>>>>>>>>
> >>>>>>>> if_em.c:3502
> >>>>>>>>         /*
> >>>>>>>>          * Payload size per packet w/o any headers.
> >>>>>>>>          * Length of all headers up to payload.
> >>>>>>>>          */
> >>>>>>>>         TXD->tcp_seg_setup.fields.mss =
> >>>>>>>>             htole16(mp->m_pkthdr.tso_segsz);
> >>>>>>>>         TXD->tcp_seg_setup.fields.hdr_len = hdr_len;
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Please print out the value of tso_segsz here. It appears to be being
> >>>>>>>> set correctly.
> >>>>>>>> The only thing I can think of is that t_maxopd is not
> >>>>>>>> correct. As tso_segsz is correct here:
> >>>>>>> It repeatedly prints 1368 during a 1 meg file transfer over a
> >>>>>>> connection with a 1380 MSS. Any other printf's I can add? I'm
> >>>>>>> working on a web page with tcpdump / firewall log output
> >>>>>>> illustrating the issue...
> >>>>>> Mike -
> >>>>>> Denis' tcpdump output doesn't show oversized segments, something else
> >>>>>> appears to be happening there. Can you post your tcpdump output
> >>>>>> somewhere?
> >>>>> URL sent off-list.
> >>>> if (tso) {
> >>>>         m->m_pkthdr.csum_flags = CSUM_TSO;
> >>>>         m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
> >>>> }
> >>>>
> >>>>
> >>>> Please print the value of maxopd and optlen under "if (tso)" in
> >>>> tcp_output. I think the calculated optlen may be too small.
> >>>
> >>> maxopt=1380 - optlen=12 = tso_segsz=1368
> >>>
> >>> Weird though, after this reboot, I had to re-copy a 4 meg file 5 times
> >>> to start getting the firewall to log any drops. Transfer rate was
> >>> around 240KB/sec before the firewall started to drop, then it went down
> >>> to about 64KB/sec during the 5th copy, and stayed there for subsequent
> >>> copies. The actual packet size the firewall said it was dropping was
> >>> varying all over the place still, yet the maxopt/optlen/tso_segsz values
> >>> stayed constant. But it's interesting that it didn't start dropping
> >>> immediately after the reboot -- though the transfer rate was still
> >>> sub-optimal.
> >>
> >> Ok, next theory :D. You shouldn't be seeing "bad len" packets from
> >> tcpdump. I'm wondering if that means you're sending down more than
> >> 64k. Can you please print out the value of mp->m_pkthdr.len around the
> >> same place that you printed out tso_segsz? 64k is the generally
> >> accepted limit for TSO, I'm wondering if the card firmware does
> >> something weird if you give it more.
> >
> > OK.
> > In that last message, where I said it took 5 times to start reproducing
> > the problem... this time it took until I actually toggled TSO back off and
> > back on again, and then it started acting up again. I don't know what the
> > actual trigger is... it's very weird.
> >
> > Initially, w/ TSO on and it wasn't dropping yet (but was still transferring
> > slow)...
> >
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> > (etc, always 8306)
> >
> > After toggling off/on which caused the drops to start (and the speed to drop
> > even further):
> >
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=7507
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=3053
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1677
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=3037
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2264
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1656
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1902
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1888
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1640
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1871
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2461
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1849
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2092
> >
> > and so on, with more seemingly random lengths... but none of them ever over
> > 8306, much less 64K.
> >
> Got a few more data points here.
>
> I can reproduce this on an i386 kernel, so it isn't amd64 specific.
>
> I can reproduce this on an 82541EI nic, so it isn't 82573 specific.
>
> I can't reproduce this on a Marvell Yukon II (msk) nic; it works fine
> whether TSO is on or off.
>
> I can't reproduce this on a bge nic because it doesn't support TSO :)
> That's the only other gigabit nic I've got easy access to.
>
> I can reproduce this with just a Cisco 877W IOS-based router and no Cisco
> PIX / ASA firewalls in the way, with the servers on the LAN interface with
> "ip tcp adjust-mss 1340" on it, and the downloading client on the Cisco's
> 802.11G interface. This time, the client is a Macbook Pro running
> Leopard, and I'm running "tcpdump -i en1 -s 1500 -n -v length \> 1394" on
> the Macbook (not the server this time) to find oversize packets, which is
> actually handier because I can see how trashed they really get :)
>
> I can't reproduce this between two machines on the same subnet (though I
> can reproduce throughput problems alone). I haven't tried lowering the
> system MSS on one end yet (is there a sysctl to lower the MSS for outbound
> connections without lowering the MTU as well?). If I could do this it
> would greatly simplify testing for everyone as they wouldn't have to stick
> an MSS-clamping router in the middle. It doesn't have to be Cisco.
>
> With this setup, copying to the Mac through the 877W from:
>
> msk-based server, TSO disabled: tcpdump reports no problems, file
> transfers are fast
>
> msk-based server, TSO enabled: tcpdump reports no problems, file
> transfers are fast
>
> em-based server, TSO disabled: tcpdump reports no problems, file
> transfers are fast
>
> em-based server, TSO enabled: tcpdump reports numerous oversize packets of
> varying sizes just as before, AND numerous packets with bad TCP checksums.
> The checksum problems aren't limited to only the large packets though.
> (That's probably what's causing the throughput problems.) Toggling rxcsum
> and txcsum flags on the server made no difference. What I haven't tried
> yet is hexdumping the packets to see what exactly is getting trashed.
>
> The problem still comes and goes; sometimes it'll work for a few minutes
> after boot, sometimes not; it might be dependent on what other traffic's
> going through the box.

Hmmm, OK so the data is pointing to something in the em TSO or encap code.
I will look into this tomorrow.

So the necessary elements are systems on two subnets and em doing the
transmitting with TSO?

Jack
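
For anyone following along, the numbers quoted in this thread can be
sanity-checked with a little arithmetic. The sketch below is only an
illustration: the function names are made up for this note and are not
part of if_em.c or tcp_output.c. It assumes hdr_len covers the Ethernet,
IP and TCP headers (14 + 20 + 32 = 66 with the 12 bytes of TCP options
reported above) and that the hardware cuts the payload into chunks of
tso_segsz bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for checking the debug output in this thread.
 * They are NOT functions from the em driver or the TCP stack.
 */

/* Per-segment payload, mirroring "tso_segsz = t_maxopd - optlen". */
static uint32_t
tso_payload_per_segment(uint32_t maxopd, uint32_t optlen)
{
	return (maxopd - optlen);
}

/* Number of frames one TSO request should be split into on the wire. */
static uint32_t
tso_segment_count(uint32_t pkt_len, uint32_t hdr_len, uint32_t segsz)
{
	uint32_t payload;

	assert(segsz > 0);		/* a zero segment size is meaningless */
	if (pkt_len <= hdr_len)
		return (0);		/* headers only, nothing to segment */
	payload = pkt_len - hdr_len;
	return ((payload + segsz - 1) / segsz);	/* round up */
}

/* Largest frame (headers plus payload) that should ever be emitted. */
static uint32_t
tso_max_frame_len(uint32_t pkt_len, uint32_t hdr_len, uint32_t segsz)
{
	uint32_t payload;

	if (pkt_len <= hdr_len)
		return (pkt_len);
	payload = pkt_len - hdr_len;
	return (hdr_len + (payload < segsz ? payload : segsz));
}
```

With Mike's figures, tso_payload_per_segment(1380, 12) gives 1368, and an
8306-byte request with hdr_len=66 should leave the card as 7 frames, none
longer than 66 + 1368 = 1434 bytes. By the same reasoning, with the MSS
clamped to 1340 the largest legitimate frame is 1340 + 40 + 14 = 1394
bytes, which lines up with the "length \> 1394" tcpdump filter, so
anything that filter catches is presumably larger than the segmentation
should have produced.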