Date: Mon, 31 Mar 2014 22:17:14 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: araujo@FreeBSD.org
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>, Alexander Motin <mav@freebsd.org>
Subject: Re: RFC: How to fix the NFS/iSCSI vs TSO problem
Message-ID: <1331037009.3823520.1396318634433.JavaMail.root@uoguelph.ca>
In-Reply-To: <CAOfEmZikpVQo84Tq-4Lp5EXHYkXk3XmF69XPkod1-PtQKdEuAw@mail.gmail.com>
Marcelo Araujo wrote:
> 2014-04-01 9:43 GMT+08:00 Rick Macklem <rmacklem@uoguelph.ca>:
> > Marcelo Araujo wrote:
> > > Hello Rick,
> > >
> > > We have made a couple more benchmarks here with additional options,
> > > such as '64 threads' and readahead=8.
> > >
> > I can't remember, but if you haven't already done so, another thing to
> > try is these sysctls on the server:
> >     sysctl vfs.nfsd.tcphighwater=100000
> >     sysctl vfs.nfsd.tcpcachetimeo=300
> >
> > These should reduce the server's CPU overhead (how important these
> > settings are depends on how current your kernel is).
> >
> I haven't done it; I don't have these sysctls on my system.
Ok, yes, I now notice you are using a 9.1 system.

> > > Now, we added nfsstat and netstat -m to the table.
> > > Attached is the full benchmark, and I can say this patch really
> > > improved the read speed.
> > >
> > I noticed a significant reduction in CPU usage on the server (about
> > 20%). An interesting question would be "Is this CPU reduction a result
> > of avoiding the m_defrag() calls in the ix driver?".
> >
> I do believe it is because it avoids m_defrag(), but I didn't try to dig
> into it to check whether it really is m_defrag().
For FreeBSD 9.1, you could make this small change to
sys/netinet/tcp_output.c so that it will avoid the m_defrag() calls in
the "ix" driver (lines #750-751). Replace:

	if (len > IP_MAXPACKET - hdrlen) {
		len = IP_MAXPACKET - hdrlen;

with:

	if (len > 29 * MCLBYTES - hdrlen) {
		len = 29 * MCLBYTES - hdrlen;

I think this will keep the TSO segments at 32 mbufs in the chain. It
would be interesting to see if a system with this patch still
demonstrated the 20% reduction in CPU when the 4kmcl.patch is applied
to it.

> > Unfortunately, the only way I can think of answering this is doing the
> > benchmarks on hardware without the 32 mbuf chain limitation, but I
> > doubt that you can do that?
> >
> No, I don't have any hardware without the 32 mbuf limitation.
>
Your previous post mentioned this network interface:

	NIC - 10G Intel X540

which is based on the 82599 chipset. This one has the 32 mbuf
limitation. (See above w.r.t. testing without m_defrag() calls.)

Again, have fun with it, rick

> > Put another way, it would be interesting to compare "with vs without"
> > the patch on machines where the network interface can handle 35 mbufs
> > in the transmit chain, so there aren't m_defrag() calls being done
> > for the non-patched case.
> >
> > Anyhow, have fun with it, rick
> >
> Maybe Christopher can do this benchmark as well in his environment.

> > > I understand your concern about adding one more sysctl; however,
> > > maybe we can do something like ZFS does: if it detects that the
> > > system is amd64 and has more than X of RAM, it enables some options
> > > by default, or a kind of warning can be displayed showing the new
> > > sysctl option.
> > >
> > > Of course, other people's opinions will be very welcome.
> > >
> > > Best Regards,
> > >
> > > 2014-03-29 6:44 GMT+08:00 Rick Macklem <rmacklem@uoguelph.ca>:
> > > > Marcelo Araujo wrote:
> > > > 2014-03-28 5:37 GMT+08:00 Rick Macklem <rmacklem@uoguelph.ca>:
> > > > > Christopher Forgeron wrote:
> > > > > > I'm quite sure the problem is on 9.2-RELEASE, not 9.1-RELEASE
> > > > > > or earlier, as a 9.2-STABLE from last year I have doesn't
> > > > > > exhibit the problem. New code in if.c at line 660 looks to be
> > > > > > what is starting this, which makes me wonder how TSO was
> > > > > > being handled before 9.2.
> > > > > >
> > > > > > I also like Rick's NFS patch for cluster size. I notice an
> > > > > > improvement, but don't have solid numbers yet. I'm still
> > > > > > stress testing it as we speak.
> > > > > Unfortunately, this causes problems for small i386 systems, so
> > > > > I am reluctant to commit it to head. Maybe a variant that is
> > > > > only enabled for amd64 systems with lots of memory would be ok?
> > > > >
> > > > Rick,
> > > >
> > > > Maybe creating a SYSCTL so the end user can enable/disable it
> > > > would be more reasonable. Also, of course, it is safe so far only
> > > > if 64-bit CPUs can enable this SYSCTL. Any other option seems not
> > > > OK; it will be hard to judge what is lots of memory and what is
> > > > not, since it will depend on what is running on the system.
> > > >
> > > I guess adding it so it can be optionally enabled via a sysctl
> > > isn't a bad idea. I think the largest risk here is "how do you tell
> > > people what the risk of enabling this is"?
> > >
> > > There are already a bunch of sysctls related to NFS that few people
> > > know how to use. (I recall that Alexander has argued that folk
> > > don't want to worry about these tunables, and I tend to agree.)
> > >
> > > If I do a variant of the patch that uses m_getjcl(..M_WAITOK..),
> > > then at least the "breakage" is thread(s) sleeping on "btallo",
> > > which is fairly easy to check for, although rather obscure.
> > > (Btw, I've never reproduced this for a patch that changes the code
> > > to always use MJUMPAGESIZE mbuf clusters. I can only reproduce it
> > > intermittently when the patch mixes allocation of MCLBYTES clusters
> > > and MJUMPAGESIZE clusters.)
> > >
> > > I've been poking at it to try and figure out how to get
> > > m_getjcl(..M_NOWAIT..) to return NULL instead of looping when it
> > > runs out of boundary tags (to see if that can result in a stable
> > > implementation of the patch), but haven't had much luck yet.
> > > Bottom line:
> > > I just don't like committing a patch that can break the system in
> > > such an obscure way, even if it is enabled via a sysctl.
> > >
> > > Others have an opinion on this?
> > >
> > > Thanks, rick
> > >
> > > > The SYSCTL will be great, and in case you don't have time to do
> > > > it, I can give you a hand.
> > > >
> > > > I'm gonna do more benchmarks today and will send another report,
> > > > but in our product here I'm inclined to use this patch, because a
> > > > 10~20% speed up in read for me is a lot. :-)
> > > >
> > > > Thank you so much and best regards,
> > > > --
> > > > Marcelo Araujo
> > > > araujo@FreeBSD.org
> > > >
> > > > _______________________________________________
> > > > freebsd-net@freebsd.org mailing list
> > > > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > > > To unsubscribe, send any mail to
> > > > "freebsd-net-unsubscribe@freebsd.org"
> > >
> > > --
> > > Marcelo Araujo
> > > araujo@FreeBSD.org
> >
> --
> Marcelo Araujo
> araujo@FreeBSD.org
>