From: Rick Macklem
To: John Baldwin
Cc: "Russell L. Carter", freebsd-net@freebsd.org
Date: Thu, 10 Jul 2014 18:31:43 -0400 (EDT)
Subject: Re: NFS client READ performance on -current

John Baldwin wrote:
> On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
> > Russell L. Carter wrote:
> > > On 07/02/14 19:09, Rick Macklem wrote:
> > > > Could you please post the dmesg stuff for the network interface,
> > > > so I can tell what driver is being used? I'll take a look at it,
> > > > in case it needs to be changed to use m_defrag().
> > >
> > > em0: port 0xd020-0xd03f mem 0xfe4a0000-0xfe4bffff,0xfe480000-0xfe49ffff
> > > irq 44 at device 0.0 on pci2
> > > em0: Using an MSI interrupt
> > > em0: Ethernet address: 00:15:17:bc:29:ba
> > > 001.000007 [2323] netmap_attach success for em0 tx 1/1024 rx 1/1024
> > > queues/slots
> > >
> > > This is one of those dual nic cards, so there is em1 as well...
> >
> > Well, I took a quick look at the driver and it does use m_defrag(), but
> > I think the "retry:" label it jumps to after doing so might be in the
> > wrong place.
> >
> > The attached untested patch might fix this.
> >
> > Is it convenient to build a kernel with this patch applied and then try
> > it with TSO enabled?
> >
> > rick
> > ps: It does have the transmit segment limit set to 32. I have no idea if
> > this is a hardware limitation.
>
> I think the retry is not in the wrong place, but the overhead of all those
> pullups is apparently quite severe.

The m_defrag() call after the first failure will just barely squeeze the
just-under-64K TSO segment into 32 mbuf clusters. Then I think any
m_pullup() done during the retry will allocate an mbuf (at a glance it
seems to always do this when the old mbuf is a cluster) and prepend that
to the list.
--> Now the list is > 32 mbufs again and the bus_dmamap_load_mbuf_sg()
will fail again on the retry, this time fatally, I think?

I can't see any reason to re-do all the stuff using m_pullup(), and
Russell reported that moving the "retry:" fixed his problem, from what
I understood.

> It would be interesting to test the following in addition to your
> change to see if it improves performance further:
>
> Index: if_em.c
> ===================================================================
> --- if_em.c	(revision 268495)
> +++ if_em.c	(working copy)
> @@ -1959,7 +1959,9 @@ retry:
>  	if (error == EFBIG && remap) {
>  		struct mbuf *m;
>
> -		m = m_defrag(*m_headp, M_NOWAIT);
> +		m = m_collapse(*m_headp, M_NOWAIT, EM_MAX_SCATTER);
> +		if (m == NULL)
> +			m = m_defrag(*m_headp, M_NOWAIT);

Since a just-under-64K TSO segment barely fits in 32 mbuf clusters, I'm
at least 99% sure the m_collapse() will fail, but it can't hurt to try
it. (If the limit were 33 or 34, I think m_collapse() would have a
reasonable chance of success.)

Right now the NFS and krpc code creates 2 small mbufs in front of the
read/write data clusters, and I think the TCP layer adds another one.
Even if this was modified to put it all in one cluster, I don't think
m_collapse() would succeed, since it only copies the data up and
deletes an mbuf from the chain if it will all fit in the preceding one.
Since the read/write data clusters are full (except the last one), they
can't fit in the M_TRAILINGSPACE() of the preceding one unless it is
empty, from my reading of m_collapse().

rick

>  		if (m == NULL) {
>  			adapter->mbuf_alloc_failed++;
>  			m_freem(*m_headp);
>
> --
> John Baldwin