From: Robert Watson
Date: Tue, 3 Jan 2012 09:16:03 +0000 (GMT)
To: Maxim Sobolev
Cc: freebsd-net@freebsd.org, "Bjoern A. Zeeb", Jack Vogel
Subject: Re: Panic in the udp_input() under heavy load

On Fri, 30 Dec 2011, Maxim Sobolev wrote:

> On 12/30/2011 4:46 PM, Maxim Sobolev wrote:
>> I see. Would you guys mind if I put that NULL pointer check into the code
>> for the time being and turn it into some kind of big nasty warning in
>> 8-stable branch only?
>
> I could also open a ticket, put all debug information collected to date in
> there. And encourage people to report to it once they see this warning on
> their system. Then it would provide more information about the exposure. It
> is definitely looks like locking issue somewhere, not just bad luck or flaky
> hardware, as we see it happening consistently on top 4 most UDP-loaded
> systems here, and it correlates well with the load. With my small NULL catch
> the machines have been running happily for a month now, so there is no
> visible side-effects.

Please do file the PR so that all the information is in one place -- this is
a network stack hacking week for me, so I should be able to take a closer
look. Could you characterise the traffic load on these boxes a bit more?
Also, is there regular monitoring using netstat/bsnmp/etc going on? I'd like
to try and identify ways in which this workload differs from other common
high-UDP workloads being used on 8.x that aren't seeing this problem...

Robert