From: Rick Macklem
To: John Baldwin
Cc: "Russell L. Carter", freebsd-net@freebsd.org
Date: Thu, 10 Jul 2014 18:31:43 -0400 (EDT)
Subject: Re: NFS client READ performance on -current

John Baldwin wrote:
> On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
> > Russell L. Carter wrote:
> > > On 07/02/14 19:09, Rick Macklem wrote:
> > > > Could you please post the dmesg stuff for the network interface,
> > > > so I can tell what driver is being used? I'll take a look at it,
> > > > in case it needs to be changed to use m_defrag().
> > >
> > > em0: port 0xd020-0xd03f mem 0xfe4a0000-0xfe4bffff,0xfe480000-0xfe49ffff
> > > irq 44 at device 0.0 on pci2
> > > em0: Using an MSI interrupt
> > > em0: Ethernet address: 00:15:17:bc:29:ba
> > > 001.000007 [2323] netmap_attach success for em0 tx 1/1024 rx 1/1024
> > > queues/slots
> > >
> > > This is one of those dual nic cards, so there is em1 as well...
> >
> > Well, I took a quick look at the driver and it does use m_defrag(), but
> > I think the "retry:" label it jumps to after doing so might be in the
> > wrong place.
> >
> > The attached untested patch might fix this.
> >
> > Is it convenient to build a kernel with this patch applied and then try
> > it with TSO enabled?
> >
> > rick
> > ps: It does have the transmit segment limit set to 32. I have no idea if
> > this is a hardware limitation.
>
> I think the retry is not in the wrong place, but the overhead of all those
> pullups is apparently quite severe.

The m_defrag() call after the first failure will just barely squeeze the
just-under-64K TSO segment into 32 mbuf clusters. Then I think any
m_pullup() done during the retry will allocate an mbuf (at a glance it
seems to always do this when the old mbuf is a cluster) and prepend that
to the list.
--> Now the list is > 32 mbufs again and the bus_dmamap_load_mbuf_sg()
will fail again on the retry, this time fatally, I think?

I can't see any reason to re-do all the stuff using m_pullup(), and
Russell reported that moving the "retry:" fixed his problem, from what
I understood.

> It would be interesting to test the following in addition to your
> change to see if it improves performance further:
>
> Index: if_em.c
> ===================================================================
> --- if_em.c	(revision 268495)
> +++ if_em.c	(working copy)
> @@ -1959,7 +1959,9 @@ retry:
>  	if (error == EFBIG && remap) {
>  		struct mbuf *m;
>
> -		m = m_defrag(*m_headp, M_NOWAIT);
> +		m = m_collapse(*m_headp, M_NOWAIT, EM_MAX_SCATTER);
> +		if (m == NULL)
> +			m = m_defrag(*m_headp, M_NOWAIT);

Since a just-under-64K TSO segment barely fits in 32 mbuf clusters, I'm
at least 99% sure the m_collapse() will fail, but it can't hurt to try
it. (If the limit were 33 or 34, I think m_collapse() would have a
reasonable chance of success.)

Right now the NFS and krpc code creates 2 small mbufs in front of the
read/write data clusters, and I think the TCP layer adds another one.
Even if this was modified to put it all in one cluster, I don't think
m_collapse() would succeed, since it only copies the data up and
deletes an mbuf from the chain if it will all fit in the preceding one.
Since the read/write data clusters are full (except the last one), they
can't fit in the M_TRAILINGSPACE() of the preceding one unless it is
empty, from my reading of m_collapse().

rick

>  		if (m == NULL) {
>  			adapter->mbuf_alloc_failed++;
>  			m_freem(*m_headp);
>
> --
> John Baldwin