Date: Wed, 25 Jun 2008 20:35:30 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Ali Niknam <freebsd-net@transip.nl>
Cc: net@freebsd.org
Subject: Re: FreeBSD 7.0: sockets stuck in CLOSED state...
Message-ID: <20080625195523.N29013@fledge.watson.org>
In-Reply-To: <486283B0.3060805@transip.nl>
References: <486283B0.3060805@transip.nl>
On Wed, 25 Jun 2008, Ali Niknam wrote:

> Recently i've been upgrading some of my machines from FreeBSD 6.x amd64
> to FreeBSD 7.0 amd64.
>
> After upgrading I noticed a weird error/bug. It seems that after several
> thousand TCP connections some seem to hang in 'CLOSED' state.

Sounds like there's a bug somewhere. Before we start trying to track it
down, I'll tell you a little more about how this works so that we can
interpret the output you're seeing.

In FreeBSD, as with all UNIX/Berkeley sockets systems, each socket is
actually represented by a set of data structures representing different
layers of abstraction. At the top level is struct file, representing a
file descriptor. Next down is struct socket, representing a socket. Then
the protocol code has struct inpcb, representing a generic IP connection,
and struct tcpcb (or struct tcptw once we enter TIMEWAIT), representing a
TCP connection. Confusingly, these data structures don't always exist all
at once. For example, if you close the file descriptor, freeing struct
file, the socket and protocol state may persist for some time until the
TCP connection closes (all data has been sent, or various other close
modes).

One important difference between FreeBSD 6.x and FreeBSD 7.x is that, in
FreeBSD 7.x, we've reduced the degree to which these data structures exist
in isolation. If you look at the mailing list threads discussing the
change, you'll see it described as "strengthening invariants". The most
important part of the change was making it an invariant that so->so_pcb,
the pointer from the socket to the protocol layer state, always remains
stable and valid. This had a number of benefits: because the pointer is
always stable, it no longer requires locks to follow, lowering overhead
and improving parallelism. It also simplifies the code by removing lots of
error handling, and improves code stability by avoiding the inevitable
bugs associated with complex error handling. If you look at bug reports
over the years, we've had quite a few panics reported (and fixed) caused
by the disappearance of protocol layer state, such as when a connection is
reset while still in use by a process; these are now all believed to be
eliminated. So the code is faster, cleaner, and more stable.
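To make that layering concrete, here is a greatly simplified sketch of the
pointer chain from a file descriptor down to the TCP state. These are
illustrative declarations only, not the actual FreeBSD structure
definitions, which carry many more fields, locks, and reference counts:

/*
 * Greatly simplified sketch of the layering described above -- NOT the
 * real FreeBSD definitions, just the shape of the pointer chain.
 */
struct file {                   /* file descriptor layer                 */
        void    *f_data;        /* -> struct socket                      */
        /* ... ops vector, flags, reference count, ... */
};

struct socket {                 /* socket layer                          */
        void    *so_pcb;        /* -> struct inpcb; the 7.x invariant is
                                   that this stays valid and stable for
                                   the lifetime of the socket            */
        /* ... send and receive buffers, state, ... */
};

struct inpcb {                  /* generic IP connection state           */
        void    *inp_ppcb;      /* -> struct tcpcb, or struct tcptw once
                                   the connection enters TIMEWAIT        */
        /* ... local/foreign addresses and ports, ... */
};

struct tcpcb {                  /* TCP connection state                  */
        int      t_state;       /* ESTABLISHED, CLOSED, ...              */
        /* ... sequence numbers, timers, congestion state, ... */
};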
But there are a few interesting side effects. One is that we retain state
at the TCP layer for longer than we used to. Specifically, if a TCP
connection closes, the inpcb remains allocated until the file descriptor
is closed (i.e., the application notices the connection has closed and
invokes close() on the file descriptor). This has a few impacts: one is
that TCP connections now appear in netstat in the CLOSED state for longer
than before, and another is that open sockets associated with CLOSED TCP
connections now count against the global resource limit on the number of
simultaneous TCP connections.

I say "longer than before", but I should be clear that, in practice,
assuming all is working properly, there's no measurable behavioral change
*except* for improved performance, cleanliness, and stability. This is
because applications generally open a socket, run a protocol, and when the
protocol wraps up, they then close() the file descriptor in order to close
the connection.

So, with that introduction, we're interested in resolving:

(1) Is this an application bug (leaking file descriptors) that only
    manifests in 7.x due to changes in kernel state management, leading to
    the sockets being visible in netstat and counting against the resource
    limit?

(2) Is this a *new* bug in TCP in 7.x, perhaps a result of the
    state-related changes I've described?

(3) Is this an *old* bug in TCP that is only now manifesting because of
    the changes in kernel state management?

The first is the easiest to resolve, as all we need to do is see whether
the number of file descriptors for the application goes upwards in an
improbable manner. You can use fstat, procstat, sockstat, or various other
tools (such as lsof) to see whether the process is leaking file
descriptors. You can also instrument your application to keep track of the
file descriptor numbers being returned, to see whether that number only
goes up over time and gets really big.

If it turns out that your application *is* properly closing sockets, then
we need to decide whether perhaps we're looking at a race in close and
state management. In particular, I'll need the output of "netstat -na",
"vmstat -z", and "vmstat -m" from the machine once it's in its rather
wedged-up state. It would be most helpful if you could actually shut down
to single-user mode, killing all user processes, then wait ten minutes,
and capture the output of the above commands to files that you can then
e-mail to me.

Without accusing you of having buggy code, I should say that I think
there's a reasonable chance that what you're seeing is an interaction
between an existing leak of resources in the application and the way the
kernel state management has changed. The output from netstat pretty
precisely matches what you'd expect: lots of TCP connections in the CLOSED
state, reflecting a series of connections built by an application but then
not properly discarded. Likewise, when the application is killed, all of
the connections go away -- most likely because the file descriptors are
all closed, allowing them to be garbage collected and the connection state
freed. If it is this sort of bug, then most likely you're missing a call
to close() in a work loop somewhere, and in some exceptional case you fall
out of the loop without calling close().

If it turns out that you can get to single-user mode, wait ten minutes to
make sure all the connections wind down, and there are still connections
visible in netstat, then we may indeed be looking at a kernel bug, and the
debugging information from netstat and vmstat will allow us to start to
investigate.

Robert N M Watson
Computer Laboratory
University of Cambridge
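As a concrete illustration of the in-application instrumentation suggested
above -- checking whether the process's descriptor count only ever grows --
a small helper like the following could be called periodically from the
work loop. This is a generic sketch, not code from either message, and the
function names are invented:

/*
 * Count the descriptors this process currently has open by probing each
 * possible fd with fcntl(F_GETFD).  Illustrative only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
count_open_fds(void)
{
        int fd, n = 0, max = getdtablesize();

        for (fd = 0; fd < max; fd++)
                if (fcntl(fd, F_GETFD) != -1)
                        n++;
        return (n);
}

/*
 * Call from the work loop or a periodic timer; if the count only climbs
 * while connections are supposedly being closed, the application is
 * leaking descriptors.
 */
static void
log_fd_usage(void)
{
        fprintf(stderr, "open fds: %d of %d\n", count_open_fds(),
            getdtablesize());
}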
> netstat -n gives:
> ...
> tcp4       0      0 1.2.3.4.*             4.5.6.7.42149          CLOSED
> tcp4      39      0 1.2.3.4.*             4.5.6.7.54103          CLOSED
> tcp4      35      0 1.2.3.4.*             4.5.6.7.41718          CLOSED
> tcp4      38      0 1.2.3.4.*             4.5.6.7.55618          CLOSED
> tcp4      41      0 1.2.3.4.*             4.5.6.7.44230          CLOSED
> tcp4      39      0 1.2.3.4.*             4.5.6.7.49439          CLOSED
> ...
>
> These never go away; they gradually increase and increase until the
> application starts giving errors (probably because some socket or
> filedescriptor limit is reached). When the application is killed these
> entries disappear.
>
> The application in question is a self written DNS server, multithreaded,
> and running fine for years without any troubles on both BSD 5.x as well
> as 6.x. Also 32bits as well as 64bits on 6.x.
>
> Ofcourse that doesn't mean that the application is error free, however,
> after doing extensive testing I really can not find anything wrong with
> the application itself, so I'm thinking maybe there's a change somewhere
> that causes this? I know that tcp/network has been completely redone...
>
> What basically happens in the application is this:
> - one main tcp thread runs an infinite while loop waiting for new
>   connections to arrive
> - as soon as one arrives a new thread is spawned that handles the newly
>   created stream
> - it reads some bytes, writes some bytes, then closes it
> - thread exits
>
> What appears to happen is this: after the new thread is spawned it tries
> to read 2 bytes (DNS tcp length information). It gets back 0 bytes (EOF)
> and therefore closes the sockets and calls pthread_exit. However in
> netstat that same stream oftenly appears to have bytes 'stuck' in the in
> queue...
>
> I really can't see how this can cause hanging sockets in 'CLOSED' state.
> Even if the incoming queue isnt read entirely a call to close should
> close it. Also I really can't find any documentation in netstat, or
> elsewhere, about the 'CLOSED' state...
>
> Any help would greatly be appreciated!
>
> Kind Regards,
>
> Ali Niknam
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
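For reference, a minimal sketch of the per-connection handler flow
described in the quoted message, written so that close() is reached on
every exit path. All names are hypothetical; this is not Ali's actual
code, only an illustration of the "missing close() in an exceptional case"
point from the reply above:

/*
 * Per-connection thread: read the two-byte DNS/TCP length prefix, then
 * the query, answer it, and close the descriptor on every path.
 */
#include <sys/types.h>
#include <sys/socket.h>

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

void *
handle_tcp_query(void *arg)
{
        int fd = *(int *)arg;
        uint8_t lenbuf[2];
        ssize_t n;

        free(arg);

        /* Two-byte length prefix that precedes a DNS message over TCP. */
        n = recv(fd, lenbuf, sizeof(lenbuf), MSG_WAITALL);
        if (n != sizeof(lenbuf)) {
                /*
                 * EOF, short read, or error: the descriptor must still be
                 * closed here, or the CLOSED connection lingers in netstat
                 * and counts against the connection limit.
                 */
                close(fd);
                return (NULL);
        }

        /* ... read the query, build and send the response ... */

        close(fd);      /* the normal path also closes the descriptor */
        return (NULL);
}

The accept loop would heap-allocate the descriptor and hand it to
pthread_create(); the essential property is simply that no return path
leaves fd open.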