Date: Sat, 6 Jun 2009 08:25:11 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Barney Cordoba <barney_cordoba@yahoo.com>
Cc: net@freebsd.org
Subject: Re: panic in sbflush
Message-ID: <alpine.BSF.2.00.0906060819020.41475@fledge.watson.org>
In-Reply-To: <11451.10207.qm@web63902.mail.re1.yahoo.com>
References: <11451.10207.qm@web63902.mail.re1.yahoo.com>
On Fri, 5 Jun 2009, Barney Cordoba wrote:

> I'm getting a panic in sbflush where mbcnt is 0 and sb_mb is not empty. Any
> clues as to what might cause this? It's happening during a load test.

sbflush() panics are typically symptoms of bugs elsewhere in the network stack or kernel, often race conditions. In essence, sbflush() is called when a socket is closed and packets have to be drained from the receive socket buffer. During that draining, we sanity check that the cached length of the data in the socket buffer (sb_cc) matches the actual length of data in the buffer. If sb_cc, sb_mb, or sb_mbcnt is non-zero at the end of the function, we panic.

Most of the time, it's a driver race condition where an mbuf has been injected into the stack using ifp->if_input(), but the driver has then modified the mbuf after injection (perhaps by setting a length, clearing a pointer, etc). We had a spate of them after we moved to direct dispatch because the timing changed, leading to packets being processed before the return of if_input() rather than "some time later".

Once in a while it's a bug in TCP or socket buffer handling, or in some intermediate encapsulation/decapsulation layer along similar lines to the driver race scenario. I think the most recent case I'm aware of was actually a socket buffer bug, but that's fairly unusual in the history of reports of this panic.

There is a kernel debugging option, "options SOCKBUF_DEBUG", that performs run-time sanity checking of the sockbuf structure so that the corruption is found earlier. My experience is that it's good for finding deterministic socket buffer corruption bugs, but that it changes the timing significantly, so it tends to mask narrow race conditions involving "inject the packet and then change it".

Hope that helps,

Robert N M Watson
Computer Laboratory
University of Cambridge
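The accounting check described above can be illustrated with a rough userland model. This is only a sketch under stated assumptions, not the kernel's sbflush(): the field names sb_cc, sb_mb, sb_mbcnt, m_len and m_next follow the message and the kernel structures, but the model_* types and functions, MODEL_MSIZE, and the single-mbuf scenario are hypothetical and exist only to show why changing an mbuf after it has been handed up the stack leaves the cached counters inconsistent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-mbuf storage charge; the real kernel uses its own sizes. */
#define MODEL_MSIZE 256u

/* Simplified stand-in for struct mbuf: only the fields the model needs. */
struct model_mbuf {
    struct model_mbuf *m_next;  /* next mbuf in the buffer */
    unsigned int       m_len;   /* bytes of data in this mbuf */
};

/* Simplified stand-in for struct sockbuf. */
struct model_sockbuf {
    struct model_mbuf *sb_mb;    /* head of the mbuf list */
    unsigned int       sb_cc;    /* cached byte count */
    unsigned int       sb_mbcnt; /* cached storage charge */
};

/* Append an mbuf and charge the cached counters with its length *now*. */
void model_sbappend(struct model_sockbuf *sb, struct model_mbuf *m)
{
    struct model_mbuf **pp;

    assert(sb != NULL && m != NULL);
    for (pp = &sb->sb_mb; *pp != NULL; pp = &(*pp)->m_next)
        continue;
    m->m_next = NULL;
    *pp = m;
    sb->sb_cc += m->m_len;
    sb->sb_mbcnt += MODEL_MSIZE;
}

/*
 * Drain the buffer, uncharging each mbuf using its *current* m_len.
 * Returns 0 if the counters end up consistent, or -1 in the situation
 * where the kernel would panic (sb_cc, sb_mb or sb_mbcnt still non-zero).
 */
int model_sbflush(struct model_sockbuf *sb)
{
    while (sb->sb_mb != NULL) {
        struct model_mbuf *m = sb->sb_mb;

        sb->sb_mb = m->m_next;
        sb->sb_cc -= m->m_len;        /* unsigned: wraps if m_len grew */
        sb->sb_mbcnt -= MODEL_MSIZE;
        free(m);
    }
    if (sb->sb_cc != 0 || sb->sb_mb != NULL || sb->sb_mbcnt != 0)
        return -1;
    return 0;
}

/*
 * Hypothetical scenario: a "driver" injects one 100-byte mbuf and, if
 * modify_after_inject is non-zero, rewrites m_len after the stack has
 * already accounted for it. Returns the result of model_sbflush(),
 * or -2 if allocation fails.
 */
int model_demo(int modify_after_inject)
{
    struct model_sockbuf sb = { NULL, 0, 0 };
    struct model_mbuf *m = malloc(sizeof(*m));

    if (m == NULL)
        return -2;
    m->m_next = NULL;
    m->m_len = 100;
    model_sbappend(&sb, m);      /* stands in for ifp->if_input() */
    if (modify_after_inject)
        m->m_len = 60;           /* driver touches an mbuf it no longer owns */
    return model_sbflush(&sb);
}
```

In this model, model_demo(0) drains cleanly, while model_demo(1) leaves sb_cc non-zero at the end of the flush, which is the kind of mismatch the real check turns into a panic. The actual panic reported in this thread (mbcnt of 0 with a non-empty sb_mb) may well arise from a different modification, such as a changed pointer.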