From: Sergey Babkin
Reply-To: babkin@freebsd.org
Date: Thu, 29 Nov 2001 23:14:53 -0500
To: jc@irbs.com
Cc: freebsd-hackers@freebsd.org
Subject: Re: FreeBSD performing worse than Linux?
Message-ID: <3C0707BD.B8FA31F8@bellatlantic.net>
References: <20011128153817.T61580@monorchid.lemis.com> <15364.38174.938500.946169@caddis.yogotech.com> <20011129004234.A16101@exuma.irbs.com>

John Capo wrote:
>
> Now this thread comes along and I realize there is something wrong
> so I did a little testing.
>
> find / -print on one of my servers in a ssh session will fill the
> pipe to my office, 256K frame, and run nicely then get into the
> starting and stopping mode after a good amount of data has been
> sent. find / -print | dd obs=1 will screw up within a few seconds
> and stay that way. Netstat in another ssh session shows data ready
> to go:
>
> tcp4 0 15928 server.22 client.4427 ESTABLISHED
>
> This is a fragment from a dump on the server side while running
> find / -print | dd obs=1
>
> 21:41:46.328381 client.4427 > server.22: . ack 11249 win 17328 (DF) [tos 0x10]
> 21:41:46.335863 client.4427 > server.22: . ack 11345 win 17328 (DF) [tos 0x10]
> 21:41:46.342216 client.4427 > server.22: . ack 11441 win 17328 (DF) [tos 0x10]
> 21:41:46.396051 client.4427 > server.22: . ack 11489 win 17376 (DF) [tos 0x10]
> 21:41:46.418208 client.4427 > server.22: . ack 11489 win 17376 (DF) [tos 0x10]
> 21:41:47.460903 server.22 > client.4427: . 11489:12937(1448) ack 144 win 17376 (DF) [tos 0x10]
> 21:41:47.569133 client.4427 > server.22: . ack 12937 win 15928 (DF) [tos 0x10]

I would say that something weird is going on on the server side.
Apparently the server was sending data fast enough to exhaust the
client's window (we don't see this part in the log; it happened
somewhere in the preceding packets). Then, as these packets reach
the client, the client sends the ACKs for them. The server should
continue sending data to fill up the window, but it does not. It
spends a whole extra second in a coma and only then sends the next
packet, 11489:12937. So the bug should be somewhere around the code
that resumes transmission after the window has filled up.

Also, the duplicate ACK for 11489 suggests that the server may have
sent the packet 11441:11489 twice for some weird reason (if you have
the full log saved, you can check whether that was really so), which
may also indicate a bug. In fact, the origin of the bug may be
corruption of some field in the protocol control block that screws
up the TCP state badly enough both to send that packet twice and to
have difficulty restarting afterwards.

> 21:41:49.001039 client.4427 > server.22: P 144:192(48) ack 12937 win 17376 (DF) [tos 0x10]
> 21:41:49.001073 server.22 > client.4427: . 28049:29497(1448) ack 192 win 17328 (DF) [tos 0x10]
> 21:41:49.001085 server.22 > client.4427: P 29497:30313(816) ack 192 win 17328 (DF) [tos 0x10]
> 21:41:49.109131 client.4427 > server.22: . ack 12937 win 17376 (DF) [tos 0x10]

And here a _very_ pathological thing has happened: the server simply
forgot to send the data between sequence numbers 12937 and 28049.
Since the dump was taken on the server side, this suggests that
something very bad has happened to the TCP state on the server.
Possibly the current sequence number in the protocol control block
got overwritten by something.

-SB
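
For anyone who wants to scan a full dump for the three symptoms
discussed above (a duplicate ACK, a long pause before the server
resumes sending, and a jump in the server's sequence numbers), here
is a minimal sketch in Python. It assumes the tcpdump text format
quoted in this thread. The names LINE_RE, SAMPLE and analyze, and
the 0.5-second stall threshold, are made up for illustration; this
is not part of any FreeBSD tool and it does not diagnose the cause.

```python
import re

# Matches the tcpdump text format quoted in this thread, e.g.
#   21:41:47.460903 server.22 > client.4427: . 11489:12937(1448) ack 144 win 17376
# The data range and the ack field are both optional.
LINE_RE = re.compile(
    r"(?P<h>\d+):(?P<m>\d+):(?P<s>\d+\.\d+) "
    r"(?P<src>\S+) > (?P<dst>\S+): \S+"
    r"(?: (?P<start>\d+):(?P<end>\d+)\(\d+\))?"
    r"(?: ack (?P<ack>\d+))?"
)

# The two fragments quoted in the message, without the "> " prefix.
SAMPLE = """\
21:41:46.328381 client.4427 > server.22: . ack 11249 win 17328 (DF) [tos 0x10]
21:41:46.335863 client.4427 > server.22: . ack 11345 win 17328 (DF) [tos 0x10]
21:41:46.342216 client.4427 > server.22: . ack 11441 win 17328 (DF) [tos 0x10]
21:41:46.396051 client.4427 > server.22: . ack 11489 win 17376 (DF) [tos 0x10]
21:41:46.418208 client.4427 > server.22: . ack 11489 win 17376 (DF) [tos 0x10]
21:41:47.460903 server.22 > client.4427: . 11489:12937(1448) ack 144 win 17376 (DF) [tos 0x10]
21:41:47.569133 client.4427 > server.22: . ack 12937 win 15928 (DF) [tos 0x10]
21:41:49.001039 client.4427 > server.22: P 144:192(48) ack 12937 win 17376 (DF) [tos 0x10]
21:41:49.001073 server.22 > client.4427: . 28049:29497(1448) ack 192 win 17328 (DF) [tos 0x10]
21:41:49.001085 server.22 > client.4427: P 29497:30313(816) ack 192 win 17328 (DF) [tos 0x10]
21:41:49.109131 client.4427 > server.22: . ack 12937 win 17376 (DF) [tos 0x10]
"""


def analyze(lines, server="server.22", stall_threshold=0.5):
    """Return a list of tuples describing suspicious events in the dump.

    ("dup_ack", ack)           pure client ACK repeating the previous ACK
    ("stall", seconds, start)  server data sent after a long silent pause
    ("gap", expected, start)   server data starting past the expected seq
    """
    findings = []
    prev_time = None   # timestamp of the previous packet, either direction
    next_seq = None    # highest sequence number the server has sent up to
    last_ack = None    # most recent ACK number seen from the client
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a line in the assumed format
        # Seconds since midnight; a dump that crosses midnight is not handled.
        now = int(m["h"]) * 3600 + int(m["m"]) * 60 + float(m["s"])
        has_data = m["start"] is not None
        if m["src"] == server and has_data:
            start, end = int(m["start"]), int(m["end"])
            if prev_time is not None and now - prev_time > stall_threshold:
                findings.append(("stall", now - prev_time, start))
            if next_seq is not None and start > next_seq:
                findings.append(("gap", next_seq, start))
            next_seq = end if next_seq is None else max(next_seq, end)
        elif m["src"] != server and m["ack"] is not None:
            ack = int(m["ack"])
            if not has_data and ack == last_ack:
                findings.append(("dup_ack", ack))
            last_ack = ack
        prev_time = now
    return findings


if __name__ == "__main__":
    for finding in analyze(SAMPLE.splitlines()):
        print(finding)
```

On the quoted fragment this flags the duplicate ACK for 11489, the
pause of a little over one second before 11489:12937 goes out, and
the jump from 12937 to 28049. It also flags the final repeated ACK
for 12937, which is what a receiver is expected to send when it gets
data beyond a hole. The stall check is deliberately crude: it only
measures the silence before a server data segment and knows nothing
about the window.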