From owner-freebsd-current@FreeBSD.ORG Sun Nov 18 06:00:26 2007
Message-ID: <2a41acea0711172200n160ff8f2rb2d0b81dfab236ea@mail.gmail.com>
Date: Sat, 17 Nov 2007 22:00:16 -0800
From: "Jack Vogel"
To: pyunyh@gmail.com
In-Reply-To: <20071118054409.GA1044@cdnetworks.co.kr>
References: <20071117003504.R31357@mindcrime.int.bit0.com>
	<20071117170537.F59492@mindcrime.int.bit0.com>
	<20071117182232.T59492@mindcrime.int.bit0.com>
	<473F9552.50402@bit0.com> <473FBD1A.8010207@bit0.com>
	<20071118054409.GA1044@cdnetworks.co.kr>
Cc: Mike Silbersack, Andre Oppermann, Kip Macy, Denis Shaposhnikov,
	freebsd-current@freebsd.org, Mike Andrews
Subject: Re: bizarre em + TSO + MSS issue in RELENG_7
List-Id: Discussions about the use of FreeBSD-current

On Nov 17, 2007 9:44 PM, Pyun YongHyeon wrote:
>
> On Sat, Nov 17, 2007 at 11:18:34PM -0500, Mike Andrews wrote:
> > Kip Macy wrote:
> > >On Nov 17, 2007 5:28 PM, Mike Andrews wrote:
> > >>Kip Macy wrote:
> > >>>On Nov 17, 2007 3:23 PM, Mike Andrews wrote:
> > >>>>On Sat, 17 Nov 2007, Kip Macy wrote:
> > >>>>
> > >>>>>On Nov 17, 2007 2:33 PM, Mike Andrews wrote:
> > >>>>>>On Sat, 17 Nov 2007, Kip Macy wrote:
> > >>>>>>
> > >>>>>>>On Nov 17, 2007 10:33 AM, Denis Shaposhnikov wrote:
> > >>>>>>>>On Sat, 17 Nov 2007 00:42:54 -0500 (EST)
> > >>>>>>>>Mike Andrews wrote:
> > >>>>>>>>
> > >>>>>>>>>Has anyone run into problems with MSS not being respected when
> > >>>>>>>>>using TSO, specifically on em cards?
> > >>>>>>>>Yes, I wrote about this problem on the beginning of 2007, see
> > >>>>>>>>
> > >>>>>>>> http://tinyurl.com/3e5ak5
> > >>>>>>>>
> > >>>>>>>if_em.c:3502
> > >>>>>>>        /*
> > >>>>>>>         * Payload size per packet w/o any headers.
> > >>>>>>>         * Length of all headers up to payload.
> > >>>>>>>         */
> > >>>>>>>        TXD->tcp_seg_setup.fields.mss =
> > >>>>>>>            htole16(mp->m_pkthdr.tso_segsz);
> > >>>>>>>        TXD->tcp_seg_setup.fields.hdr_len = hdr_len;
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>Please print out the value of tso_segsz here. It appears to be being
> > >>>>>>>set correctly.
> > >>>>>>>The only thing I can think of is that t_maxopd is not
> > >>>>>>>correct. As tso_segsz is correct here:
> > >>>>>>It repeatedly prints 1368 during a 1 meg file transfer over a
> > >>>>>>connection with a 1380 MSS. Any other printf's I can add? I'm
> > >>>>>>working on a web page with tcpdump / firewall log output
> > >>>>>>illustrating the issue...
> > >>>>>Mike -
> > >>>>>Denis' tcpdump output doesn't show oversized segments, something else
> > >>>>>appears to be happening there. Can you post your tcpdump output
> > >>>>>somewhere?
> > >>>>URL sent off-list.
> > >>>        if (tso) {
> > >>>                m->m_pkthdr.csum_flags = CSUM_TSO;
> > >>>                m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
> > >>>        }
> > >>>
> > >>>
> > >>>Please print the value of maxopd and optlen under "if (tso)" in
> > >>>tcp_output. I think the calculated optlen may be too small.
> > >>
> > >>maxopt=1380 - optlen=12 = tso_segsz=1368
> > >>
> > >>Weird though, after this reboot, I had to re-copy a 4 meg file 5 times
> > >>to start getting the firewall to log any drops. Transfer rate was
> > >>around 240KB/sec before the firewall started to drop, then it went down
> > >>to about 64KB/sec during the 5th copy, and stayed there for subsequent
> > >>copies. The actual packet size the firewall said it was dropping was
> > >>varying all over the place still, yet the maxopt/optlen/tso_segsz values
> > >>stayed constant. But it's interesting that it didn't start dropping
> > >>immediately after the reboot -- though the transfer rate was still
> > >>sub-optimal.
> > >
> > >Ok, next theory :D. You shouldn't be seeing "bad len" packets from
> > >tcpdump. I'm wondering if that means you're sending down more than
> > >64k. Can you please print out the value of mp->m_pkthdr.len around the
> > >same place that you printed out tso_segsz? 64k is the generally
> > >accepted limit for TSO, I'm wondering if the card firmware does
> > >something weird if you give it more.
> >
> > OK.
> > In that last message, where I said it took 5 times to start
> > reproducing the problem... this time it took until I actually toggled
> > TSO back off and back on again, and then it started acting up again. I
> > don't know what the actual trigger is... it's very weird.
> >
> > Initially, w/ TSO on and it wasn't dropping yet (but was still
> > transferring slow)...
> >
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> > (etc, always 8306)
> >
> > After toggling off/on which caused the drops to start (and the speed to
> > drop even further):
> >
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=7507
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=3053
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1677
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=3037
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2264
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1656
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1902
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1888
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1640
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1871
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2461
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1849
> > BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2092
> >
> > and so on, with more seemingly random lengths... but none of them ever
> > over 8306, much less 64K.
>
> It seems that em_tso_setup() doesn't clear txd_upper/txd_lower in the
> failure path, so an uninitialized value could be used in subsequent
> Tx descriptor setup.
> How about clearing those variables? (Patch attached)
>
> It seems that em(4) uses EM_TSO_SIZE (64K) to create the DMA tag.
> A packet can have a 64K payload under TSO, so the maximum size of the
> mbuf chain would be 64K + sizeof(link layer). So I guess EM_TSO_SIZE
> should be increased to hold sizeof(link layer).
> It has been a long time since I looked into em(4), so I'm not sure.

Huh? They are set to 0 on entry, and not touched again before you go into
the setup routine; your change has no effect.

Jack
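
For anyone following the numbers in the quoted debug output, here is a
rough sketch of the arithmetic involved. It is not code from if_em.c or
tcp_output.c; the helper names tso_segment_count() and
tso_max_frame_len() are made up for illustration, and it assumes the
hardware simply cuts the payload into MSS-sized pieces and prepends the
same hdr_len bytes of headers to each one.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the em(4) driver): how many wire
 * segments a TSO request of pkt_len bytes should produce, assuming
 * the first hdr_len bytes are headers and each segment carries at
 * most mss bytes of TCP payload.
 */
static unsigned int
tso_segment_count(unsigned int pkt_len, unsigned int hdr_len,
    unsigned int mss)
{
	unsigned int payload;

	assert(mss > 0);		/* an MSS of 0 makes no sense here */
	if (pkt_len <= hdr_len)
		return (0);		/* headers only, nothing to segment */
	payload = pkt_len - hdr_len;
	return ((payload + mss - 1) / mss);	/* round up */
}

/*
 * Hypothetical helper: the largest frame that should appear on the
 * wire under the same assumption (headers plus one full MSS).
 */
static unsigned int
tso_max_frame_len(unsigned int hdr_len, unsigned int mss)
{
	return (hdr_len + mss);
}
```

With the values Mike printed (t_maxopd 1380, optlen 12, so tso_segsz
1368, and hdr_len 66), a request with m_pkthdr.len 8306 carries 8240
bytes of payload and should go out as 7 segments, none longer than
66 + 1368 = 1434 bytes. If the firewall logs anything larger than that,
the segment size is not being honored somewhere below tcp_output.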