From owner-freebsd-current@FreeBSD.ORG Sun Nov 18 05:46:11 2007
Date: Sun, 18 Nov 2007 14:44:09 +0900
From: Pyun YongHyeon
To: Mike Andrews
Cc: Denis Shaposhnikov, Kip Macy, Mike Silbersack, Andre Oppermann, freebsd-current@freebsd.org
Subject: Re: bizarre em + TSO + MSS issue in RELENG_7
Message-ID: <20071118054409.GA1044@cdnetworks.co.kr>
In-Reply-To: <473FBD1A.8010207@bit0.com>
References: <20071117003504.R31357@mindcrime.int.bit0.com>
 <20071117213316.499be43b@vlink.ru>
 <20071117170537.F59492@mindcrime.int.bit0.com>
 <20071117182232.T59492@mindcrime.int.bit0.com>
 <473F9552.50402@bit0.com> <473FBD1A.8010207@bit0.com>
Reply-To: pyunyh@gmail.com
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="BOKacYhQ+x31HxR3"
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sat, Nov 17, 2007 at 11:18:34PM -0500, Mike Andrews wrote:
> Kip Macy wrote:
> >On Nov 17, 2007 5:28 PM, Mike Andrews wrote:
> >>Kip Macy wrote:
> >>>On Nov 17, 2007 3:23 PM, Mike Andrews wrote:
> >>>>On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>
> >>>>>On Nov 17, 2007 2:33 PM, Mike Andrews wrote:
> >>>>>>On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>>>
> >>>>>>>On Nov 17, 2007 10:33 AM, Denis Shaposhnikov wrote:
> >>>>>>>>On Sat, 17 Nov 2007 00:42:54 -0500 (EST)
> >>>>>>>>Mike Andrews wrote:
> >>>>>>>>
> >>>>>>>>>Has anyone run into problems with MSS not being respected when
> >>>>>>>>>using
> >>>>>>>>>TSO, specifically on em cards?
> >>>>>>>>Yes, I wrote about this problem on the beginning of 2007, see
> >>>>>>>>
> >>>>>>>> http://tinyurl.com/3e5ak5
> >>>>>>>>
> >>>>>>>if_em.c:3502
> >>>>>>>	/*
> >>>>>>>	 * Payload size per packet w/o any headers.
> >>>>>>>	 * Length of all headers up to payload.
> >>>>>>>	 */
> >>>>>>>	TXD->tcp_seg_setup.fields.mss =
> >>>>>>>	    htole16(mp->m_pkthdr.tso_segsz);
> >>>>>>>	TXD->tcp_seg_setup.fields.hdr_len = hdr_len;
> >>>>>>>
> >>>>>>>
> >>>>>>>Please print out the value of tso_segsz here. It appears to be being
> >>>>>>>set correctly. The only thing I can think of is that t_maxopd is not
> >>>>>>>correct. As tso_segsz is correct here:
> >>>>>>It repeatedly prints 1368 during a 1 meg file transfer over a
> >>>>>>connection
> >>>>>>with a 1380 MSS. Any other printf's I can add? I'm working on a web
> >>>>>>page
> >>>>>>with tcpdump / firewall log output illustrating the issue...
> >>>>>Mike -
> >>>>>Denis' tcpdump output doesn't show oversized segments, something else
> >>>>>appears to be happening there. Can you post your tcpdump output
> >>>>>somewhere?
> >>>>URL sent off-list.
> >>>	if (tso) {
> >>>		m->m_pkthdr.csum_flags = CSUM_TSO;
> >>>		m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
> >>>	}
> >>>
> >>>
> >>>Please print the value of maxopd and optlen under "if (tso)" in
> >>>tcp_output. I think the calculated optlen may be too small.
> >>
> >>maxopt=1380 - optlen=12 = tso_segsz=1368
> >>
> >>Weird though, after this reboot, I had to re-copy a 4 meg file 5 times
> >>to start getting the firewall to log any drops. Transfer rate was
> >>around 240KB/sec before the firewall started to drop, then it went down
> >>to about 64KB/sec during the 5th copy, and stayed there for subsequent
> >>copies. The actual packet size the firewall said it was dropping was
> >>varying all over the place still, yet the maxopt/optlen/tso_segsz values
> >>stayed constant.
> >>But it's interesting that it didn't start dropping
> >>immediately after the reboot -- though the transfer rate was still
> >>sub-optimal.
> >
> >Ok, next theory :D. You shouldn't be seeing "bad len" packets from
> >tcpdump. I'm wondering if that means you're sending down more than
> >64k. Can you please print out the value of mp->m_pkthdr.len around the
> >same place that you printed out tso_segsz? 64k is the generally
> >accepted limit for TSO, I'm wondering if the card firmware does
> >something weird if you give it more.
>
> OK. In that last message, where I said it took 5 times to start
> reproducing the problem... this time it took until I actually toggled
> TSO back off and back on again, and then it started acting up again. I
> don't know what the actual trigger is... it's very weird.
>
> Initially, w/ TSO on and it wasn't dropping yet (but was still
> transferring slow)...
>
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> (etc, always 8306)
>
> After toggling off/on which caused the drops to start (and the speed to
> drop even further):
>
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=7507
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=3053
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1677
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=3037
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2264
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1656
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1902
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1888
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1640
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1871
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2461
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1849
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2092
>
> and so on, with more seemingly random lengths... but none of them ever
> over 8306, much less 64K.

It seems that em_tso_setup() doesn't clear txd_upper/txd_lower in its
failure path, so uninitialized values could be used in the subsequent
Tx descriptor setup. How about clearing those variables before the
call? (Patch attached.)

It also seems that em(4) uses EM_TSO_SIZE (64K) to create its DMA tag.
Under TSO a packet can carry a 64K payload, so the maximum size of the
mbuf chain would be 64K + sizeof(link-layer header). So I guess
EM_TSO_SIZE should be increased to also hold the link-layer header.
It has been a long time since I looked into em(4), so I'm not sure.

-- 
Regards,
Pyun YongHyeon

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="em.tso.patch"

Index: if_em.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.184
diff -u -r1.184 if_em.c
--- if_em.c	10 Sep 2007 21:50:40 -0000	1.184
+++ if_em.c	18 Nov 2007 05:42:35 -0000
@@ -1791,6 +1791,7 @@
 	m_head = *m_headp;
 
 	/* Do hardware assists */
+	txd_upper = txd_lower = 0;
 	if (em_tso_setup(adapter, m_head, &txd_upper, &txd_lower))
 		/* we need to make a final sentinel transmit desc */
 		tso_desc = TRUE;

--BOKacYhQ+x31HxR3--
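The failure mode described for em_tso_setup() can be shown in miniature. This is a hypothetical, self-contained C sketch and not code from if_em.c. The functions fake_tso_setup() and build_tx_words() are invented stand-ins for em_tso_setup() and its caller in the em(4) transmit path, and the descriptor bit values are arbitrary. The stand-in returns false from its failure path without writing its out-parameters. The caller clears txd_upper and txd_lower first, as the attached patch does, so a failed setup yields zeros instead of indeterminate stack values. Reading those indeterminate values would be undefined behaviour in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for em_tso_setup(). On success it fills in both
 * descriptor words and returns true; on its failure path it returns false
 * early WITHOUT touching *upper or *lower.
 */
static bool fake_tso_setup(bool can_offload, uint32_t *upper, uint32_t *lower)
{
    assert(upper != NULL && lower != NULL);
    if (!can_offload)
        return false;          /* failure path: out-parameters not written */
    *upper = 0x00001234u;      /* illustrative bit patterns only */
    *lower = 0x00005678u;
    return true;
}

/*
 * Hypothetical caller modelled on the patched hunk: both words are cleared
 * before the call, so a failed setup hands zeros (not leftover stack
 * contents) to whatever builds the Tx descriptor next.
 */
static bool build_tx_words(bool can_offload, uint32_t *upper_out,
    uint32_t *lower_out)
{
    uint32_t txd_upper, txd_lower;
    bool tso_desc = false;

    txd_upper = txd_lower = 0;         /* the one-line fix in em.tso.patch */
    if (fake_tso_setup(can_offload, &txd_upper, &txd_lower))
        tso_desc = true;               /* would need a sentinel descriptor */

    *upper_out = txd_upper;
    *lower_out = txd_lower;
    return tso_desc;
}
```

Called with can_offload set to false, build_tx_words() returns false and stores 0 in both words. Without the clearing line, the same call would copy uninitialized locals into the outputs.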
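The numbers quoted in the thread can be cross-checked with a little arithmetic. The C sketch below is illustrative only, and the helper names are hypothetical. It rests on these assumptions:

- IPv4 without IP options.
- A TCP option length of 12 bytes, as Mike's optlen output shows.
- A 14-byte Ethernet header, or 18 bytes with an 802.1Q tag.
- A TSO request capped by the 65535-byte IPv4 total-length field.
- "64K" taken as 65536 bytes for the DMA tag.

Under those assumptions three things follow:

- hdr_len = 14 + 20 + 32 = 66, which matches the debug output.
- An 8306-byte chain should be cut into 7 segments of at most 1368 payload bytes.
- The largest possible chain (65549, or 65553 with a VLAN tag) is bigger than a 64K tag, which is the concern raised about EM_TSO_SIZE.

```c
#include <assert.h>

/* Sizes in bytes; standard header lengths, IPv4 without options assumed. */
enum {
    ETH_HDR_LEN  = 14,    /* untagged Ethernet II header */
    VLAN_TAG_LEN = 4,     /* extra bytes added by an 802.1Q tag */
    IPV4_HDR_LEN = 20,    /* IPv4 header without options */
    TCP_BASE_LEN = 20,    /* TCP header without options */
    IPV4_MAX_LEN = 65535, /* largest 16-bit IPv4 total-length value */
    TSO_TAG_64K  = 65536  /* "64K" DMA tag size; assumed value */
};

/* tso_segsz = t_maxopd - optlen, as in the quoted tcp_output() snippet. */
static unsigned tso_segsz(unsigned maxopd, unsigned optlen)
{
    assert(optlen < maxopd);
    return maxopd - optlen;
}

/* Header bytes replicated in front of every segment (Ethernet+IP+TCP). */
static unsigned tso_hdr_len(unsigned optlen)
{
    return ETH_HDR_LEN + IPV4_HDR_LEN + TCP_BASE_LEN + optlen;
}

/* Number of wire segments one TSO mbuf chain should be cut into. */
static unsigned tso_nsegs(unsigned pkthdr_len, unsigned hdr_len,
    unsigned segsz)
{
    assert(pkthdr_len > hdr_len && segsz > 0);
    unsigned payload = pkthdr_len - hdr_len;
    return (payload + segsz - 1) / segsz;   /* round up */
}

/* Largest mbuf chain a TSO send could need, link-layer header included. */
static unsigned tso_max_chain(int vlan_tagged)
{
    return IPV4_MAX_LEN + ETH_HDR_LEN + (vlan_tagged ? VLAN_TAG_LEN : 0);
}
```

With the values from the thread, tso_segsz(1380, 12) gives 1368. Every 8306-byte chain then carries 8240 payload bytes: six full 1368-byte segments plus a 32-byte tail.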