Date: Wed, 16 Jan 2002 15:29:08 -0700
From: Chad David
To: Terry Lambert
Cc: Chad David, current@freebsd.org
Subject: Re: socket shutdown delay?
Message-ID: <20020116152908.A1476@colnta.acns.ab.ca>

On Wed, Jan 16, 2002 at 01:39:54PM -0800, Terry Lambert wrote:
> Chad David wrote:
> > Has anyone noticed (or fixed) a bug in -current where socket connections
> > on the local machine do not shutdown properly?  During stress testing
> > I'm seeing thousands (2316 right now) of these:
> >
> > tcp4  0  0  192.168.1.2.8080  192.168.1.2.2215  FIN_WAIT_2
> > tcp4  0  0  192.168.1.2.2215  192.168.1.2.8080  LAST_ACK
> >
> > Both the client and the server are dead, but the connections stay in this
> > state.
> >
> > I tested with the server on -current and the client on another box, and
> > all of the server sockets end up in TIME_WAIT.  Is there something delaying
> > the last ack on local connections?
>
> A connection goes into FIN_WAIT_2 when it has received the ACK
> of the FIN, but not received a FIN (or sent an ACK) itself, thus
> permitting it to enter TIME_WAIT state for 2MSL before proceeding
> to the CLOSED state, as a result of a server initiated close.
>
> A connection goes into LAST_ACK when it has sent a FIN and not
> received the ACK of the FIN before proceeding to the CLOSED
> state, as a result of a client initiated close.

I've got TCP/IP Illustrated V1 right beside me, so I basically knew what was
happening, just not why.  Like I said in the original email, connections from
another machine end up in TIME_WAIT right away; it is only local connections.

>
> Since it's showing IP addresses, you appear to be using real
> network connections, rather than loopback connections.

In this case, yes.  Connections to 127.0.0.1 result in the same thing.

>
> There are basically several ways to cause this:
>
> 1) You have something on your network, like a dummynet,
>    that is deterministically dropping the ACK to
>    the client when the server goes from FIN_WAIT_1,
>    so that the server goes to CLOSING instead of going
>    to FIN_WAIT_2 (client closes first), or the FIN in
>    the other direction so that the server doesn't go
>    to TIME_WAIT from FIN_WAIT_2 (server closes first).

Nothing like that on the box.

>
> 2) You have intentionally disabled KEEPALIVE, so that
>    a close results in an RST instead of a normal
>    shutdown of the TCP connection (I can't tell if
>    you are doing a real call to "shutdown(2)", or if
>    you are just relying on the OS resource tracking
>    behaviour that is implicit to "close(2)" (but only
>    if you don't set KEEPALIVE, and have disabled the
>    sysctl default of always doing KEEPALIVE on every
>    connection)).
>    In this case, it's possible that the
>    RST was lost on the wire, and since RSTs are not
>    retransmitted, you have shot yourself in the foot.
>
>    Note: You often see this type of foolish foot
>          shooting when running MAST, WAST, or
>          webbench, which try to factor out response
>          speed and measure connection speed, so that
>          they benchmark the server, not the FS or
>          other OS latencies in the document delivery
>          path (which is why these tools suck as real
>          world benchmarks go).  You could also cause
>          this (unlikely) with a bad firewall rule.

I haven't changed any sysctls, and other than SO_REUSEADDR, the default
sockopts are being used.  I also do not call shutdown() on either end, and
both the client and server processes have exited and the connections still
do not clear up (in time they do, around 10 minutes).

>
> 3) You've exhausted your mbufs before you've exhausted
>    the number of simultaneous connections you are
>    permitted, because you have incorrectly tuned your
>    kernel, and therefore all your connections are sitting
>    in a starvation deadlock, waiting for packets that can
>    never be sent because there are no mbufs available.

The client eventually fails with EADDRNOTAVAIL.  Here are the mbuf stats
before and after.
Before test:
------------------------------------------------------------------------
colnta->netstat -m
mbuf usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    51/144 (in use/in pool)
        CPU #1 list:    51/144 (in use/in pool)
        Total:          102/288 (in use/in pool)
        Maximum number allowed on each CPU list: 512
        Maximum possible: 67584
        Allocated mbuf types:
          102 mbufs allocated to data
        0% of mbuf map consumed
mbuf cluster usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    50/86 (in use/in pool)
        CPU #1 list:    51/88 (in use/in pool)
        Total:          101/174 (in use/in pool)
        Maximum number allowed on each CPU list: 128
        Maximum possible: 33792
        0% of cluster map consumed
420 KBytes of wired memory reserved (54% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

After test:
------------------------------------------------------------------------
colnta->netstat -m
mbuf usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    59/144 (in use/in pool)
        CPU #1 list:    43/144 (in use/in pool)
        Total:          102/288 (in use/in pool)
        Maximum number allowed on each CPU list: 512
        Maximum possible: 67584
        Allocated mbuf types:
          102 mbufs allocated to data
        0% of mbuf map consumed
mbuf cluster usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    58/86 (in use/in pool)
        CPU #1 list:    43/88 (in use/in pool)
        Total:          101/174 (in use/in pool)
        Maximum number allowed on each CPU list: 128
        Maximum possible: 33792
        0% of cluster map consumed
420 KBytes of wired memory reserved (54% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

and

colnta->netstat -an | grep FIN_WAIT_2 | wc
    2814   16884  219492

and a few minutes later:

colnta->netstat -an | grep FIN_WAIT_2 | wc
    1434    8604  111852

The box currently has 630MB free memory, and is 98.8% idle.  I'm not sure
what other information would be useful.

>
> 4) You've got local hacks that you aren't telling us
>    about (shame on you!).

Nope.  Stock -current, none of my patches applied.
>
> 5) You have found an introduced bug in -current.
>
>    Note: I personally think this one is unlikely.

Me too, but I can't think of any reason why the machine wouldn't send the
last ack.  I must be starving something...  I'll go over my code again and
see if I can find a bug.

>
> 6) Maybe something I haven't thought of...
>
>    Note: I personally think this one is unlikely,
>          too... ;^)

Well, if you don't know, where does that leave me? :)

>
> See RFC 793 (or Stevens) for details on the state machine for
> both ends of the connection, and you will see how your machine
> got into this mess in the first place.

I've been reading it...  Thanks.

-- 
Chad David        davidc@acns.ab.ca
www.FreeBSD.org   davidc@freebsd.org
ACNS Inc.         Calgary, Alberta Canada
Fourthly, The constant breeders, beside the gain of eight
shillings sterling per annum by the sale of their children,
will be rid of the charge of maintaining them after the
first year. - Jonathan Swift