From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 01:53:52 2014
In-Reply-To: <2056019527.3811582.1396316605342.JavaMail.root@uoguelph.ca>
References: <2056019527.3811582.1396316605342.JavaMail.root@uoguelph.ca>
Date: Tue, 1 Apr 2014 09:53:49 +0800
Subject: Re: RFC: How to fix the NFS/iSCSI vs TSO problem
From: Marcelo Araujo
To: Rick Macklem
Content-Type: text/plain; charset=ISO-8859-1
Cc: FreeBSD Filesystems,
 Alexander Motin
Reply-To: araujo@FreeBSD.org

2014-04-01 9:43 GMT+08:00 Rick Macklem:

> Marcelo Araujo wrote:
> >
> > Hello Rick,
> >
> > We have run a couple more benchmarks here with additional options,
> > such as '64 threads' and readahead=8.
> >
> I can't remember, but if you haven't already done so, another thing to
> try are these sysctls on the server:
> sysctl vfs.nfsd.tcphighwater=100000
> sysctl vfs.nfsd.tcpcachetimeo=300
>
> These should reduce the server's CPU overhead (how important these
> settings are depends on how current your kernel is).
>

I haven't done it; I don't have these sysctls on my system.

> >
> > Now we have added nfsstat and netstat -m to the table.
> > Attached is the full benchmark, and I can say this patch really
> > improved the read speed.
> >
> I noticed a significant reduction in CPU usage on the server (about 20%).
> An interesting question would be "Is this CPU reduction a result of
> avoiding the m_defrag() calls in the ix driver?".
>

I do believe it is because it avoids m_defrag(), but I didn't dig into
it to check whether it really is m_defrag().

> Unfortunately, the only way I can think of answering this is doing the
> benchmarks on hardware without the 32 mbuf chain limitation, but I
> doubt that you can do that?
>

No, I don't have any hardware without the 32 mbuf limitation.

> Put another way, it would be interesting to compare "with vs without"
> the patch on machines where the network interface can handle 35 mbufs
> in the transmit chain, so there aren't m_defrag() calls being done for
> the non-patched case.
>
> Anyhow, have fun with it, rick
>

Maybe Christopher can do this benchmark as well in his environment.
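
For readers following the 32 vs 35 mbuf figures above, here is a rough
sketch of the arithmetic. It is only an illustration under assumptions
that the thread does not state: MCLBYTES is taken as 2048 bytes,
MJUMPAGESIZE as 4096 bytes (a 4 KiB page), the payload as a 64 KiB read
reply, and 3 extra mbufs are assumed for headers so that the MCLBYTES
case matches the 35 mbuf figure. The function names are hypothetical
and are not kernel APIs.

```python
import math

# Assumed constants (typical values on amd64/i386; not stated in the thread).
MCLBYTES = 2048        # standard mbuf cluster size in bytes
MJUMPAGESIZE = 4096    # page-sized jumbo cluster, assuming 4 KiB pages
TX_SEGMENT_LIMIT = 32  # the "32 mbuf chain limitation" mentioned in the thread


def chain_length(payload_bytes, cluster_bytes, header_mbufs):
    """Number of mbufs in the chain: data clusters plus header mbufs."""
    return math.ceil(payload_bytes / cluster_bytes) + header_mbufs


def needs_defrag(payload_bytes, cluster_bytes, header_mbufs,
                 limit=TX_SEGMENT_LIMIT):
    """True when the chain is longer than the limit the NIC can map."""
    return chain_length(payload_bytes, cluster_bytes, header_mbufs) > limit


# Hypothetical workload: 64 KiB of data plus 3 header mbufs (an assumption).
payload, headers = 64 * 1024, 3
for name, size in (("MCLBYTES", MCLBYTES), ("MJUMPAGESIZE", MJUMPAGESIZE)):
    print(name, chain_length(payload, size, headers),
          needs_defrag(payload, size, headers))
```

Under these assumptions the 2 KiB cluster case exceeds the limit while
the page-sized cluster case stays well under it, which is consistent
with the idea that the patch avoids the m_defrag() path. It does not
prove that m_defrag() accounts for the CPU reduction.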
> >
> > I understand your concern about adding one more sysctl. However,
> > maybe we can do something like ZFS does: if it detects that the
> > system is amd64 and has more than X of RAM, it enables some options
> > by default. Or a kind of warning can be displayed pointing to the
> > new sysctl option.
> >
> > Of course, other people's opinions will be very welcome.
> >
> > Best Regards,
> >
> > 2014-03-29 6:44 GMT+08:00 Rick Macklem <rmacklem@uoguelph.ca>:
> >
> > Marcelo Araujo wrote:
> > > 2014-03-28 5:37 GMT+08:00 Rick Macklem <rmacklem@uoguelph.ca>:
> > >
> > > > Christopher Forgeron wrote:
> > > > > I'm quite sure the problem is on 9.2-RELEASE, not 9.1-RELEASE
> > > > > or earlier, as a 9.2-STABLE from last year I have doesn't
> > > > > exhibit the problem. New code in if.c at line 660 looks to be
> > > > > what is starting this, which makes me wonder how TSO was being
> > > > > handled before 9.2.
> > > > >
> > > > > I also like Rick's NFS patch for cluster size. I notice an
> > > > > improvement, but don't have solid numbers yet. I'm still
> > > > > stress testing it as we speak.
> > > > >
> > > > Unfortunately, this causes problems for small i386 systems, so I
> > > > am reluctant to commit it to head. Maybe a variant that is only
> > > > enabled for amd64 systems with lots of memory would be ok?
> > > >
> > > Rick,
> > >
> > > Maybe creating a SYSCTL so the end user can enable/disable it
> > > would be more reasonable. Also, of course, it is safer if only
> > > 64-bit CPUs can enable this SYSCTL. Any other option seems not OK:
> > > it will be hard to judge what is lots of memory and what is not,
> > > as it depends on what is running on the system.
> > >
> > I guess adding it so it can be optionally enabled via a sysctl isn't
> > a bad idea. I think the largest risk here is "how do you tell people
> > what the risk of enabling this is"?
> >
> > There are already a bunch of sysctls related to NFS that few people
> > know how to use. (I recall that Alexander has argued that folk don't
> > want to worry about these tunables and I tend to agree.)
> >
> > If I do a variant of the patch that uses m_getjcl(..M_WAITOK..), then
> > at least the "breakage" is thread(s) sleeping on "btallo", which is
> > fairly easy to check for, although rather obscure.
> > (Btw, I've never reproduced this for a patch that changes the code to
> > always use MJUMPAGESIZE mbuf clusters.
> > I can only reproduce it intermittently when the patch mixes
> > allocation of MCLBYTES clusters and MJUMPAGESIZE clusters.)
> >
> > I've been poking at it to try and figure out how to get
> > m_getjcl(..M_NOWAIT..) to return NULL instead of looping when it
> > runs out of boundary tags (to see if that can result in a stable
> > implementation of the patch), but haven't had much luck yet.
> >
> > Bottom line:
> > I just don't like committing a patch that can break the system in
> > such an obscure way, even if it is enabled via a sysctl.
> >
> > Others have an opinion on this?
> >
> > Thanks, rick
> >
> > > The SYSCTL will be great, and in case you don't have time to do
> > > it, I can give you a hand.
> > >
> > > I'm gonna do more benchmarks today and will send another report,
> > > but in our product here, I'm inclined to use this patch, because a
> > > 10~20% speedup in reads is a lot for me. :-)
> > >
> > > Thank you so much and best regards,
> > > --
> > > Marcelo Araujo
> > > araujo@FreeBSD.org
> > >
> >
> > --
> > Marcelo Araujo
> > araujo@FreeBSD.org
>

-- 
Marcelo Araujo
araujo@FreeBSD.org