Date: Thu, 29 Nov 2001 23:14:53 -0500
From: Sergey Babkin <babkin@bellatlantic.net>
To: jc@irbs.com
Cc: freebsd-hackers@freebsd.org
Subject: Re: FreeBSD performing worse than Linux?
Message-ID: <3C0707BD.B8FA31F8@bellatlantic.net>
References: <20011128153817.T61580@monorchid.lemis.com> <15364.38174.938500.946169@caddis.yogotech.com> <20011129004234.A16101@exuma.irbs.com>

John Capo wrote:
>
> Now this thread comes along and I realize there is something wrong
> so I did a little testing.
>
> find / -print on one of my servers in a ssh session will fill the
> pipe to my office, 256K frame, and run nicely then get into the
> starting and stopping mode after a good amount of data has been
> sent. find / -print | dd obs=1 will screw up within a few seconds
> and stay that way. Netstat in another ssh session shows data ready
> to go:
>
> tcp4 0 15928 server.22 client.4427 ESTABLISHED
>
> This is a fragment from a dump on the server side while running
> find / -print | dd obs=1
>
> 21:41:46.328381 client.4427 > server.22: . ack 11249 win 17328 <nop,nop,timestamp 53827689 105528699> (DF) [tos 0x10]
> 21:41:46.335863 client.4427 > server.22: . ack 11345 win 17328 <nop,nop,timestamp 53827689 105528699> (DF) [tos 0x10]
> 21:41:46.342216 client.4427 > server.22: . ack 11441 win 17328 <nop,nop,timestamp 53827690 105528699> (DF) [tos 0x10]
> 21:41:46.396051 client.4427 > server.22: . ack 11489 win 17376 <nop,nop,timestamp 53827696 105528699> (DF) [tos 0x10]
> 21:41:46.418208 client.4427 > server.22: . ack 11489 win 17376 <nop,nop,timestamp 53827698 105528699> (DF) [tos 0x10]
> 21:41:47.460903 server.22 > client.4427: . 11489:12937(1448) ack 144 win 17376 <nop,nop,timestamp 105528895 53827698> (DF) [tos 0x10]
> 21:41:47.569133 client.4427 > server.22: . ack 12937 win 15928 <nop,nop,timestamp 53827813 105528895> (DF) [tos 0x10]

I would say that something weird is happening on the server side.
Apparently the server was sending the data fast enough to exhaust
the client's window (this part we don't see in the log, it happened
somewhere in the preceding packets). Then, as these packets reach
the client, the client sends the ACKs for them. The server should
continue sending data to fill up the window, but it does not.
It spends a whole extra second in a coma and only then sends the
next packet, 11489:12937.
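
If the full dump is saved, the one-second coma can be located mechanically rather than by eye. What follows is only a rough sketch under stated assumptions: it assumes the old tcpdump text format quoted above, that the capture does not cross midnight, and that a silence of more than half a second before a server packet counts as a stall. The names `parse`, `find_stalls`, `TRACE` and the 0.5 s threshold are made up for illustration.

```python
import re
from datetime import datetime

# Assumed line shape (old tcpdump text output, as quoted above):
# "HH:MM:SS.usec src > dst: flags ..."
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) (\S+) > (\S+): ")

def parse(line):
    """Return (timestamp, src, dst) or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
    return ts, m.group(2), m.group(3)

def find_stalls(lines, sender, threshold=0.5):
    """Yield (gap_seconds, line) for every packet from `sender` that
    follows more than `threshold` seconds of silence on the wire.
    The threshold is an arbitrary illustrative value."""
    prev_ts = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, src, _dst = parsed
        if prev_ts is not None and src == sender:
            gap = (ts - prev_ts).total_seconds()
            if gap > threshold:
                yield gap, line
        prev_ts = ts

# Four lines copied from the dump above.
TRACE = [
    "21:41:46.396051 client.4427 > server.22: . ack 11489 win 17376 <nop,nop,timestamp 53827696 105528699> (DF) [tos 0x10]",
    "21:41:46.418208 client.4427 > server.22: . ack 11489 win 17376 <nop,nop,timestamp 53827698 105528699> (DF) [tos 0x10]",
    "21:41:47.460903 server.22 > client.4427: . 11489:12937(1448) ack 144 win 17376 <nop,nop,timestamp 105528895 53827698> (DF) [tos 0x10]",
    "21:41:47.569133 client.4427 > server.22: . ack 12937 win 15928 <nop,nop,timestamp 53827813 105528895> (DF) [tos 0x10]",
]

if __name__ == "__main__":
    for gap, line in find_stalls(TRACE, "server.22"):
        print(f"{gap:.3f}s of silence before: {line}")
```

On these four lines it flags only the 11489:12937 segment, which follows about 1.04 s of silence after the client's last ACK.
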
So the bug should be somewhere around the code that resumes
transmission after filling up the window. Also, the duplicate ACK
for 11489 suggests that maybe the server has sent the packet
11441:11489 twice for some weird reason (if you have the full log
saved, you can check whether it really was so), which may also
indicate a bug. In fact, the origin of the bug may be corruption of
some field in the protocol control block that screws up the TCP
state so that it both sends that packet twice and has difficulty
restarting afterwards.

> 21:41:49.001039 client.4427 > server.22: P 144:192(48) ack 12937 win 17376 <nop,nop,timestamp 53827954 105528895> (DF) [tos 0x10]
> 21:41:49.001073 server.22 > client.4427: . 28049:29497(1448) ack 192 win 17328 <nop,nop,timestamp 105529049 53827954> (DF) [tos 0x10]
> 21:41:49.001085 server.22 > client.4427: P 29497:30313(816) ack 192 win 17328 <nop,nop,timestamp 105529049 53827954> (DF) [tos 0x10]
> 21:41:49.109131 client.4427 > server.22: . ack 12937 win 17376 <nop,nop,timestamp 53827967 105528895> (DF) [tos 0x10]

And here a _very_ pathological thing has happened: the server just
forgot to send the data between sequence numbers 12937 and 28049.
Since the dump was taken on the server side, this suggests that
something very bad has happened to the TCP state on the server side.
Possibly the value of the current sequence number in the protocol
control block got overwritten by something.

-SB
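
Both symptoms discussed above, the duplicate ACK and the skipped sequence range, can also be checked mechanically over a saved dump. This is a sketch only, with the same assumption about the old tcpdump text format, and with the further assumption that the quoted lines are contiguous in the capture. The duplicate-ACK test is a simplification: it treats two consecutive payload-free ACKs with the same acknowledgment number and the same window as duplicates. `sequence_gaps`, `duplicate_acks` and `TRACE` are illustrative names.

```python
import re

# Data segment: "time src > dst: FLAGS first:last(len) ..."
SEG_RE = re.compile(r"^\S+ (\S+) > \S+: \S+ (\d+):(\d+)\((\d+)\)")
# Payload-free ACK: "time src > dst: . ack N win W ..."
ACK_RE = re.compile(r"^\S+ (\S+) > \S+: \. ack (\d+) win (\d+)")

def sequence_gaps(lines, sender):
    """Return (expected, actual_first) pairs where `sender` started a
    segment beyond the highest sequence number it had sent so far."""
    expected = None
    gaps = []
    for line in lines:
        m = SEG_RE.match(line.strip())
        if m is None or m.group(1) != sender:
            continue
        first, last = int(m.group(2)), int(m.group(3))
        if expected is not None and first > expected:
            gaps.append((expected, first))
        expected = last if expected is None else max(expected, last)
    return gaps

def duplicate_acks(lines, receiver):
    """Return ack numbers that `receiver` repeated in consecutive
    payload-free ACKs with an unchanged window (simplified test)."""
    prev = None
    dups = []
    for line in lines:
        m = ACK_RE.match(line.strip())
        if m is None or m.group(1) != receiver:
            continue
        cur = (int(m.group(2)), int(m.group(3)))
        if cur == prev:
            dups.append(cur[0])
        prev = cur
    return dups

# Lines copied from the two dump fragments quoted in this message.
TRACE = [
    "21:41:46.396051 client.4427 > server.22: . ack 11489 win 17376 <nop,nop,timestamp 53827696 105528699> (DF) [tos 0x10]",
    "21:41:46.418208 client.4427 > server.22: . ack 11489 win 17376 <nop,nop,timestamp 53827698 105528699> (DF) [tos 0x10]",
    "21:41:47.460903 server.22 > client.4427: . 11489:12937(1448) ack 144 win 17376 <nop,nop,timestamp 105528895 53827698> (DF) [tos 0x10]",
    "21:41:47.569133 client.4427 > server.22: . ack 12937 win 15928 <nop,nop,timestamp 53827813 105528895> (DF) [tos 0x10]",
    "21:41:49.001039 client.4427 > server.22: P 144:192(48) ack 12937 win 17376 <nop,nop,timestamp 53827954 105528895> (DF) [tos 0x10]",
    "21:41:49.001073 server.22 > client.4427: . 28049:29497(1448) ack 192 win 17328 <nop,nop,timestamp 105529049 53827954> (DF) [tos 0x10]",
    "21:41:49.001085 server.22 > client.4427: P 29497:30313(816) ack 192 win 17328 <nop,nop,timestamp 105529049 53827954> (DF) [tos 0x10]",
    "21:41:49.109131 client.4427 > server.22: . ack 12937 win 17376 <nop,nop,timestamp 53827967 105528895> (DF) [tos 0x10]",
]

if __name__ == "__main__":
    for expected, first in sequence_gaps(TRACE, "server.22"):
        print(f"server skipped {first - expected} bytes: {expected}..{first}")
    print("duplicate ACKs from client:", duplicate_acks(TRACE, "client.4427"))
```

On the quoted lines this reports one gap, 12937 to 28049 (15112 bytes), and one duplicate ACK, for 11489. The two ACKs for 12937 are not counted because the advertised window differs between them.
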