From owner-freebsd-net@FreeBSD.ORG  Tue Jan  3 09:16:04 2012
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7F8661065672;
	Tue,  3 Jan 2012 09:16:04 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 53B408FC08;
	Tue,  3 Jan 2012 09:16:04 +0000 (UTC)
Received: from fledge.watson.org (fledge.watson.org [65.122.17.41])
	by cyrus.watson.org (Postfix) with ESMTPS id E7D6C46B06;
	Tue,  3 Jan 2012 04:16:03 -0500 (EST)
Date: Tue, 3 Jan 2012 09:16:03 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Maxim Sobolev <sobomax@FreeBSD.org>
In-Reply-To: <4EFE5E12.7080103@FreeBSD.org>
Message-ID: <alpine.BSF.2.00.1201030914130.34067@fledge.watson.org>
References: <4EB804D2.2090101@FreeBSD.org>
	<alpine.BSF.2.00.1111071818250.4603@ai.fobar.qr>
	<4EB86276.6080801@sippysoft.com> <4EB86866.9060102@sippysoft.com>
	<alpine.BSF.2.00.1111072324340.4603@ai.fobar.qr>
	<4EB86FCF.3050306@FreeBSD.org>
	<alpine.BSF.2.00.1111080239500.1358@fledge.watson.org>
	<4ECEE6F0.4010301@FreeBSD.org>
	<F63603B1-7B35-4ECE-82E6-835CD91B93F8@FreeBSD.org>
	<4EFE158C.2040705@FreeBSD.org>
	<AB3D0536-CDD7-4595-911C-7C17FE1DFB23@FreeBSD.org>
	<4EFE5B70.9050807@FreeBSD.org> <4EFE5E12.7080103@FreeBSD.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-net@freebsd.org, "Bjoern A. Zeeb" <bz@FreeBSD.ORG>,
	Jack Vogel <jfvogel@gmail.com>
Subject: Re: Panic in the udp_input() under heavy load
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
X-List-Received-Date: Tue, 03 Jan 2012 09:16:04 -0000


On Fri, 30 Dec 2011, Maxim Sobolev wrote:

> On 12/30/2011 4:46 PM, Maxim Sobolev wrote:
>> I see. Would you guys mind if I put that NULL pointer check into the code 
>> for the time being and turn it into some kind of big nasty warning in 
>> the 8-stable branch only?
>
> I could also open a ticket, put all the debug information collected to date 
> in there, and encourage people to report to it once they see this warning on 
> their system. That would provide more information about the exposure. It 
> definitely looks like a locking issue somewhere, not just bad luck or flaky 
> hardware, as we see it happening consistently on our 4 most UDP-loaded 
> systems here, and it correlates well with the load. With my small NULL catch 
> the machines have been running happily for a month now, with no visible 
> side effects.

Please do file the PR so that all the information is in one place -- this is a 
network stack hacking week for me, so I should be able to take a closer look.

Could you characterise the traffic load on these boxes a bit more?  Also, is 
there regular monitoring using netstat/bsnmp/etc going on?  I'd like to try 
and identify ways in which this workload differs from other common high-UDP 
workloads being used on 8.x that aren't seeing this problem...

Robert