Date: Mon, 11 Oct 2004 04:13:15 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
To: Marc UBM Bocklet <ubm@u-boot-man.de>
Cc: current@freebsd.org
Subject: Re: [BETA7-panic] sodealloc(): so_count 1
Message-ID: <Pine.NEB.3.96L.1041011040958.31040C-100000@fledge.watson.org>
In-Reply-To: <20041010124058.58a03dba.ubm@u-boot-man.de>

On Sun, 10 Oct 2004, Marc UBM Bocklet wrote:

> No, but I can revert the local patches, configure a dump device and try
> getting one tomorrow or the day after that.

Marc,

After a couple of days of experimenting and chatting, Brian and I have
developed what we hope is a less intrusive but fully functional fix for
this problem. I ran it through a barrage of tests yesterday, although I
couldn't reproduce the problem originally, and the system still appears to
run :-). I've committed the patch to CVS HEAD (6.x) and will merge it to
5.x in a few days, assuming that your testing of the change doesn't reveal
that it fails to fix the problem. I have included a copy of the committed
patch (minus the $FreeBSD$ change) below. If you could give this a spin,
it would be much appreciated.

Thanks for your (and Vlad's) patience as we worked this out! (And many
thanks to Brian for doing so much of the work to fix it.)

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research

Index: uipc_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.212
retrieving revision 1.213
diff -u -r1.212 -r1.213
--- uipc_socket.c	5 Sep 2004 14:33:21 -0000	1.212
+++ uipc_socket.c	11 Oct 2004 08:11:26 -0000	1.213
@@ -316,22 +316,34 @@
 	return (0);
 }
 
+/*
+ * Attempt to free a socket.  This should really be sotryfree().
+ *
+ * We free the socket if the protocol is no longer interested in the socket,
+ * there's no file descriptor reference, and the refcount is 0.  While the
+ * calling macro sotryfree() tests the refcount, sofree() has to test it
+ * again as it's possible to race with an accept()ing thread if the socket is
+ * in an listen queue of a listen socket, as being in the listen queue
+ * doesn't elevate the reference count.  sofree() acquires the accept mutex
+ * early for this test in order to avoid that race.
+ */
 void
 sofree(so)
 	struct socket *so;
 {
 	struct socket *head;
 
-	KASSERT(so->so_count == 0, ("socket %p so_count not 0", so));
-	SOCK_LOCK_ASSERT(so);
+	SOCK_UNLOCK(so);
+	ACCEPT_LOCK();
+	SOCK_LOCK(so);
 
-	if (so->so_pcb != NULL || (so->so_state & SS_NOFDREF) == 0) {
+	if (so->so_pcb != NULL || (so->so_state & SS_NOFDREF) == 0 ||
+	    so->so_count != 0) {
 		SOCK_UNLOCK(so);
+		ACCEPT_UNLOCK();
 		return;
 	}
 
-	SOCK_UNLOCK(so);
-	ACCEPT_LOCK();
 	head = so->so_head;
 	if (head != NULL) {
 		KASSERT((so->so_qstate & SQ_COMP) != 0 ||
@@ -353,6 +365,7 @@
 		 * the listening socket is closed.
 		 */
 		if ((so->so_qstate & SQ_COMP) != 0) {
+			SOCK_UNLOCK(so);
 			ACCEPT_UNLOCK();
 			return;
 		}
@@ -365,6 +378,7 @@
 		    (so->so_qstate & SQ_INCOMP) == 0,
 		    ("sofree: so_head == NULL, but still SQ_COMP(%d) or SQ_INCOMP(%d)",
 		    so->so_qstate & SQ_COMP, so->so_qstate & SQ_INCOMP));
+	SOCK_UNLOCK(so);
 	ACCEPT_UNLOCK();
 	SOCKBUF_LOCK(&so->so_snd);
 	so->so_snd.sb_flags |= SB_NOINTR;
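
For readers who want to see the locking pattern in isolation, the following is a minimal user-space sketch using POSIX threads. It is not kernel code and not part of the commit: struct sock, accept_lock, sock_init(), sock_try_free() and sock_accept_ref() are hypothetical names that only loosely mirror so_count, the accept mutex and sofree(). It assumes the same lock order the patch uses (global accept lock first, then the per-socket lock) and shows why the reference count is tested again once both locks are held.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct socket; the fields are illustrative only. */
struct sock {
	pthread_mutex_t lock;	/* per-socket lock, like SOCK_LOCK() */
	int refcount;		/* like so_count */
	bool has_pcb;		/* like so_pcb != NULL */
	bool has_fdref;		/* like SS_NOFDREF being clear */
	bool freed;		/* set where the real code would deallocate */
};

/* Hypothetical global lock standing in for the accept mutex. */
static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

static void
sock_init(struct sock *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->refcount = 0;
	s->has_pcb = false;
	s->has_fdref = false;
	s->freed = false;
}

/*
 * Called with s->lock held; returns with it released.  The socket lock is
 * dropped and re-taken so that accept_lock is acquired first, and every
 * condition, including the refcount, is tested with both locks held.
 */
static bool
sock_try_free(struct sock *s)
{
	pthread_mutex_unlock(&s->lock);
	pthread_mutex_lock(&accept_lock);
	pthread_mutex_lock(&s->lock);

	if (s->has_pcb || s->has_fdref || s->refcount != 0) {
		/* Someone still cares about the socket: back out. */
		pthread_mutex_unlock(&s->lock);
		pthread_mutex_unlock(&accept_lock);
		return (false);
	}

	s->freed = true;
	pthread_mutex_unlock(&s->lock);
	pthread_mutex_unlock(&accept_lock);
	return (true);
}

/*
 * Models an accept()ing thread taking a reference to a queued socket.
 * It uses the same lock order, so it cannot interleave with the test above.
 */
static bool
sock_accept_ref(struct sock *s)
{
	bool ok;

	pthread_mutex_lock(&accept_lock);
	pthread_mutex_lock(&s->lock);
	ok = !s->freed;
	if (ok)
		s->refcount++;
	pthread_mutex_unlock(&s->lock);
	pthread_mutex_unlock(&accept_lock);
	return (ok);
}
```

In this sketch, a reference taken through sock_accept_ref() before sock_try_free() reaches its test makes the free attempt back out, while a free that wins the race leaves later sock_accept_ref() calls returning false. The freed flag is a simplification; the real code removes the socket from the listen queue instead.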