From: "Jaime Bozza"
To: "'Matthew Dillon'"
Subject: RE: RE: Abominable NFSv3 read performance / FreeBSD server / Solaris client
Date: Thu, 25 Jul 2002 09:26:39 -0500

Matt,

First of all, I want to thank you for all the help in explaining the dumps and so forth. I have a much better understanding of the interaction of NFS between the two systems. You've been more than patient with my stumbling and learning.

In comparing the freebsd-dump and the solaris-dump (new ones in which I've been able to increase the advertised window, but not beyond around 32K, even with much higher buffers), one difference I noticed is that the FreeBSD client seems to be advertising a sliding window. (RFC1323, I assume?) Even if I set tcp_wscale_always on the Solaris client, it still advertises the same window size every time.

I've fiddled with both the tcp and the nfs tunables, and it seems that the Solaris system I'm testing with just can't handle that much data in its buffers. (I was able to cut the time in half using the default rsize of 32768, but the system still couldn't handle the blocks as quickly as it does with a smaller rsize.)

I also installed tcpdump on the Solaris system so I could look at Solaris-to-Solaris dumps and compare. From that, I noticed the Solaris server advertises a much smaller window (around 24K) no matter what, even with the client advertising something higher. (I tried setting xmit_hiwat in the startup scripts and restarting the Solaris server to ensure the setting was changed before the nfs daemons came online.)

I may still not be getting the settings correct, but I'm at a loss as to what I'm missing.

Regardless, thanks again for the help. I have enough data to make the connections perform similarly, even if what happens behind the scenes isn't anything alike.

Jaime

-----Original Message-----
From: Matthew Dillon [mailto:dillon@apollo.backplane.com]
Sent: Wednesday, July 24, 2002 7:03 PM
To: Jaime Bozza
Cc: stable@FreeBSD.ORG
Subject: Re: RE: Abominable NFSv3 read performance / FreeBSD server / Solaris client

:Ok, put me in my corner. I *knew* there was something wrong with the
:tcpdump, but I sat there looking at it and just thinking it was
:different because of the OS. A big DUH from me.
:
:Ok, attached are three dumps (using your params below) from the FreeBSD
:server side.
:All are TCP (I had to force TCP on the FreeBSD client
:since it defaults to UDP - Solaris doesn't really give you the choice)
:Even though it may not be relevant, I gave two dumps from Solaris, one
:with 8K rsize and one with 32K rsize. (Since 32K is where the massive
:increase in time occurs)
:
:Just to test your point, a UDP connection with a Solaris client showed a
:similar tcpdump (to the FreeBSD UDP dump) and the speed was also
:similar, so I think the network itself is fine.
:
:10.1.2.10 = FreeBSD Server
:10.1.2.9 = Solaris Client
:10.1.2.50 = FreeBSD Client
:
:Jaime

Well, looking at the solaris8k-dump, the Solaris client is way behind on its acks. It's acking the 16K point after the FreeBSD server has pushed out 33KB, so the FreeBSD server is probably hitting the Solaris client's TCP window limit. The FreeBSD server is then not restarting the transmit as quickly as it could, but I think the basic problem is that Solaris is advertising too small a window.

    16:48:42.376155 10.1.2.10.2049 > 10.1.2.9.3887210299: reply ERR 1460 (DF)
    16:48:42.376609 10.1.2.9.1016 > 10.1.2.10.2049: . ack 47461 win 24820 (DF)
    16:48:42.376835 10.1.2.9.1016 > 10.1.2.10.2049: . ack 50381 win 24820 (DF)
    16:48:42.377041 10.1.2.9.1016 > 10.1.2.10.2049: . ack 53301 win 24820 (DF)
    16:48:42.456437 10.1.2.9.290121090 > 10.1.2.10.2049: 172 read fh 957,375898/2134869 8192 bytes @ 0x000016000 (DF)

Above, Solaris is queueing the next read command.

    16:48:42.456486 10.1.2.10.2049 > 10.1.2.9.1381004381: reply ERR 588 (DF)

Above, FreeBSD is *finishing* sending the data from the previous read command. Normally FreeBSD would have burst this data up top, just after the 3887210299 point, but it didn't, probably because it ran out of window space. And below, FreeBSD is starting to send the data for the most recent read command.

    16:48:42.456656 10.1.2.10.2049 > 10.1.2.9.290121090: reply ok 1460 read (DF)
    16:48:42.456668 10.1.2.10.2049 > 10.1.2.9.1859070703: reply ERR 1460 (DF)

Note: When you are trying to read the tcpdump output, just ignore the 'reply ERR' stuff. It's just tcpdump trying to interpret data blocks in the stream as commands when they're really just data blocks.

What you need to do is get Solaris to advertise a much larger window. Perhaps you've tried this already and it did not seem to work, but perhaps that is because you didn't reset the TCP connection (killing and restarting the nfsds on the FreeBSD server should suffice to reset the Solaris client's TCP connections). (At the same time, make sure you keep FreeBSD's transmit buffers bumped up to at least 65535, but the main issue appears to be Solaris's advertised window.)

If you do a dump on FreeBSD and Solaris does not advertise larger windows (it's advertising 24820 most of the time in the dumps you've provided to date), then you have not managed to get Solaris to advertise a larger window.

-Matt
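
The check described above (take a dump on the FreeBSD server and see whether the Solaris client ever advertises more than 24820) can be sketched as a small Python script. This is only an illustration, not something from the thread: the names ACK_LINE, advertised_windows and summarize are made up here, the regular expression assumes the classic tcpdump text format shown in the excerpts, and the 65535 target is an assumption borrowed from the transmit-buffer figure mentioned above rather than a stated goal for the Solaris window.

```python
import re

# Matches pure-ACK lines in the tcpdump text format quoted in this thread, e.g.
# "16:48:42.376609 10.1.2.9.1016 > 10.1.2.10.2049: . ack 47461 win 24820 (DF)"
ACK_LINE = re.compile(
    r"^(?P<time>\S+)\s+(?P<src>\S+)\s+>\s+(?P<dst>\S+):\s+\.\s+"
    r"ack\s+(?P<ack>\d+)\s+win\s+(?P<win>\d+)"
)

# Lines copied from the solaris8k-dump excerpt in the message.
SAMPLE = [
    "16:48:42.376155 10.1.2.10.2049 > 10.1.2.9.3887210299: reply ERR 1460 (DF)",
    "16:48:42.376609 10.1.2.9.1016 > 10.1.2.10.2049: . ack 47461 win 24820 (DF)",
    "16:48:42.376835 10.1.2.9.1016 > 10.1.2.10.2049: . ack 50381 win 24820 (DF)",
    "16:48:42.377041 10.1.2.9.1016 > 10.1.2.10.2049: . ack 53301 win 24820 (DF)",
]


def advertised_windows(lines, client_ip):
    """Return (time, ack, win) tuples for pure ACKs sent by client_ip."""
    result = []
    for line in lines:
        m = ACK_LINE.match(line.strip())
        # The source field is "ip.port", so require the dot after the IP
        # to avoid matching e.g. 10.1.2.90 when looking for 10.1.2.9.
        if m and m.group("src").startswith(client_ip + "."):
            result.append((m.group("time"), int(m.group("ack")), int(m.group("win"))))
    return result


def summarize(lines, client_ip, target=65535):
    """Summarize the windows advertised by client_ip; target is an assumed goal."""
    wins = [win for _, _, win in advertised_windows(lines, client_ip)]
    if not wins:
        return None
    return {
        "count": len(wins),
        "min": min(wins),
        "max": max(wins),
        "below_target": max(wins) < target,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE, "10.1.2.9"))
```

Run against a full dump (for example, the lines of a saved tcpdump text file), a result with "below_target" still true after changing the Solaris tunables and resetting the TCP connection would suggest the larger window has not taken effect.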