From owner-freebsd-fs@FreeBSD.ORG  Tue Apr  1 01:53:52 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
MIME-Version: 1.0
In-Reply-To: <2056019527.3811582.1396316605342.JavaMail.root@uoguelph.ca>
References: <CAOfEmZhUtUhX_OOGV6R4ogTJPTL0cEPGDv3WgPM2M3hiPs9mxQ@mail.gmail.com>
 <2056019527.3811582.1396316605342.JavaMail.root@uoguelph.ca>
Date: Tue, 1 Apr 2014 09:53:49 +0800
Message-ID: <CAOfEmZikpVQo84Tq-4Lp5EXHYkXk3XmF69XPkod1-PtQKdEuAw@mail.gmail.com>
Subject: Re: RFC: How to fix the NFS/iSCSI vs TSO problem
From: Marcelo Araujo <araujobsdport@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>,
 Alexander Motin <mav@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
Reply-To: araujo@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Apr 2014 01:53:52 -0000

2014-04-01 9:43 GMT+08:00 Rick Macklem <rmacklem@uoguelph.ca>:

> Marcelo Araujo wrote:
> >
> > Hello Rick,
> >
> >
> > We have run a couple more benchmarks here with additional options,
> > such as '64 threads' and readahead=8.
> >
> I can't remember, but if you haven't already done so, another thing to
> try is setting these sysctls on the server:
> sysctl vfs.nfsd.tcphighwater=100000
> sysctl vfs.nfsd.tcpcachetimeo=300
>
> These should reduce the server's CPU overhead (how important these
> settings are depends on how current your kernel is).
>

I haven't done that; I don't have these sysctls on my system.
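
A quick way to see which of the suggested tunables a given kernel exposes is to probe each name with `sysctl -n` and check the exit status. The sketch below assumes a FreeBSD host with `sysctl` in the PATH; the helper name `available_sysctls` and its injectable `run` parameter are hypothetical and exist only to keep the example testable.

```python
import subprocess

# Names suggested earlier in this thread; whether they exist depends on the kernel.
CANDIDATES = ["vfs.nfsd.tcphighwater", "vfs.nfsd.tcpcachetimeo"]


def available_sysctls(names, run=subprocess.run):
    """Return {name: value or None}; None means the OID could not be read."""
    result = {}
    for name in names:
        try:
            proc = run(["sysctl", "-n", name], capture_output=True, text=True)
        except FileNotFoundError:
            # No sysctl binary on this host at all.
            result[name] = None
            continue
        # A non-zero exit status is treated as "not available on this kernel".
        result[name] = proc.stdout.strip() if proc.returncode == 0 else None
    return result
```

Calling `available_sysctls(CANDIDATES)` on the server returns a value of `None` for any tunable the running kernel does not provide.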


>
> >
> > We have now added nfsstat and netstat -m output to the table.
> > Attached is the full benchmark, and I can say that this patch really
> > improved the read speed.
> >
> I noticed a significant reduction in CPU usage on the server (about 20%).
> An interesting question would be "Is this CPU reduction a result of
> avoiding the m_defrag() calls in the ix driver?".
>

I believe it is because the patch avoids m_defrag(), but I didn't dig into
it to check whether m_defrag() is really the cause.


> Unfortunately, the only way I can think of answering this is doing the
> benchmarks on hardware without the 32 mbuf chain limitation, but I
> doubt that you can do that?
>

No, I don't have any hardware without the 32 mbuf limitation.


>
> Put another way, it would be interesting to compare "with vs without"
> the patch on machines where the network interface can handle 35 mbufs
> in the transmit chain, so there aren't m_defrag() calls being done for
> the non-patched case.
>
> Anyhow, have fun with it, rick
>

Maybe Christopher can do this benchmark as well in his environment.
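
For reference, the 32-versus-35 mbuf figures discussed above follow from simple arithmetic, sketched here. It assumes a 64 KiB NFS read reply, MCLBYTES = 2048 and MJUMPAGESIZE = 4096 (the usual values on i386/amd64), and three extra mbufs for the RPC and TCP/IP headers. The header count is an assumption chosen to match the 35 mentioned in this thread, not a measured value, and the function names are illustrative.

```python
MCLBYTES = 2048       # standard mbuf cluster size
MJUMPAGESIZE = 4096   # page-sized jumbo cluster (PAGE_SIZE on i386/amd64)
MAX_TX_SEGS = 32      # transmit chain limit discussed in this thread
HEADER_MBUFS = 3      # assumed: extra mbufs for RPC + TCP/IP headers


def chain_length(payload_bytes, cluster_size, header_mbufs=HEADER_MBUFS):
    """Number of mbufs needed to carry the payload plus header mbufs."""
    clusters = -(-payload_bytes // cluster_size)  # ceiling division
    return clusters + header_mbufs


def needs_defrag(payload_bytes, cluster_size, limit=MAX_TX_SEGS):
    """True if the chain exceeds the limit and would have to be compacted."""
    return chain_length(payload_bytes, cluster_size) > limit


if __name__ == "__main__":
    for label, size in (("MCLBYTES", MCLBYTES), ("MJUMPAGESIZE", MJUMPAGESIZE)):
        n = chain_length(64 * 1024, size)
        print(f"{label}: {n} mbufs, defrag needed: {needs_defrag(64 * 1024, size)}")
```

Under these assumptions, 2 KiB clusters give a chain longer than the 32-segment limit, while page-sized clusters roughly halve the chain and stay well under it.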


>
> >
> > I understand your concern about adding one more sysctl; however, maybe we
> > can do something like ZFS does: if it detects that the system is amd64 and
> > has more than X of RAM, it enables some options by default. Alternatively,
> > a warning could be displayed describing the new sysctl option.
> >
> >
> > Of course, other people's opinions will be very welcome.
> >
> >
> > Best Regards,
> >
> >
> >
> > 2014-03-29 6:44 GMT+08:00 Rick Macklem < rmacklem@uoguelph.ca > :
> >
> >
> >
> >
> > Marcelo Araujo wrote:
> > > 2014-03-28 5:37 GMT+08:00 Rick Macklem < rmacklem@uoguelph.ca >:
> > >
> > > > Christopher Forgeron wrote:
> > > > > I'm quite sure the problem is on 9.2-RELEASE, not 9.1-RELEASE or
> > > > > earlier, as a 9.2-STABLE from last year I have doesn't exhibit the
> > > > > problem. New code in if.c at line 660 looks to be what is starting
> > > > > this, which makes me wonder how TSO was being handled before 9.2.
> > > > >
> > > > > I also like Rick's NFS patch for cluster size. I notice an
> > > > > improvement, but don't have solid numbers yet. I'm still stress
> > > > > testing it as we speak.
> > > > >
> > > > Unfortunately, this causes problems for small i386 systems, so I
> > > > am reluctant to commit it to head. Maybe a variant that is only
> > > > enabled for amd64 systems with lots of memory would be ok?
> > > >
> > > >
> > > Rick,
> > >
> > > Maybe creating a SYSCTL so the end user can enable/disable it would be
> > > more reasonable. Also, of course, it is safer if only 64-bit CPUs can
> > > enable this SYSCTL. Any other option seems not OK: it will be hard to
> > > judge what is lots of memory and what is not, since that depends on what
> > > is running on the system.
> > >
> > I guess adding it so it can be optionally enabled via a sysctl isn't
> > a bad idea. I think the largest risk here is "how do you tell people
> > what the risk of enabling this is"?
> >
> > There are already a bunch of sysctls related to NFS that few people
> > know how to use. (I recall that Alexander has argued that folk don't
> > want to worry about these tunables and I tend to agree.)
> >
> > If I do a variant of the patch that uses m_getjcl(..M_WAITOK..), then
> > at least the "breakage" is thread(s) sleeping on "btallo", which is
> > fairly easy to check for, although rather obscure.
> > (Btw, I've never reproduced this for a patch that changes the code to
> > always use MJUMPAGESIZE mbuf clusters.
> > I can only reproduce it intermittently when the patch mixes allocation
> > of MCLBYTES clusters and MJUMPAGESIZE clusters.)
> >
> > I've been poking at it to try and figure out how to get
> > m_getjcl(..M_NOWAIT..) to return NULL instead of looping when it runs
> > out of boundary tags (to see if that can result in a stable
> > implementation of the patch), but haven't had much luck yet.
> >
> > Bottom line:
> > I just don't like committing a patch that can break the system in such
> > an obscure way, even if it is enabled via a sysctl.
> >
> > Others have an opinion on this?
> >
> > Thanks, rick
> >
> >
> > > The SYSCTL will be great, and in case you don't have time to do it, I
> > > can give you a hand.
> > >
> > > I'm going to do more benchmarks today and will send another report, but
> > > in our product here, I'm inclined to use this patch, because a 10~20%
> > > speedup in reads is a lot for me. :-)
> > >
> > > Thank you so much and best regards,
> > > --
> > > Marcelo Araujo
> > > araujo@FreeBSD.org
> >
> >
> > >
> >
> >
> >
> >
> > --
> > Marcelo Araujo
> > araujo@FreeBSD.org
>



-- 
Marcelo Araujo
araujo@FreeBSD.org