Date:      Tue, 3 Jan 2012 09:16:03 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Maxim Sobolev <sobomax@FreeBSD.org>
Cc:        freebsd-net@freebsd.org, "Bjoern A. Zeeb" <bz@FreeBSD.ORG>, Jack Vogel <jfvogel@gmail.com>
Subject:   Re: Panic in the udp_input() under heavy load
Message-ID:  <alpine.BSF.2.00.1201030914130.34067@fledge.watson.org>
In-Reply-To: <4EFE5E12.7080103@FreeBSD.org>
References:  <4EB804D2.2090101@FreeBSD.org> <alpine.BSF.2.00.1111071818250.4603@ai.fobar.qr> <4EB86276.6080801@sippysoft.com> <4EB86866.9060102@sippysoft.com> <alpine.BSF.2.00.1111072324340.4603@ai.fobar.qr> <4EB86FCF.3050306@FreeBSD.org> <alpine.BSF.2.00.1111080239500.1358@fledge.watson.org> <4ECEE6F0.4010301@FreeBSD.org> <F63603B1-7B35-4ECE-82E6-835CD91B93F8@FreeBSD.org> <4EFE158C.2040705@FreeBSD.org> <AB3D0536-CDD7-4595-911C-7C17FE1DFB23@FreeBSD.org> <4EFE5B70.9050807@FreeBSD.org> <4EFE5E12.7080103@FreeBSD.org>

On Fri, 30 Dec 2011, Maxim Sobolev wrote:

> On 12/30/2011 4:46 PM, Maxim Sobolev wrote:
>> I see. Would you guys mind if I put that NULL pointer check into the code 
>> for the time being and turn it into some kind of big nasty warning in 
>> 8-stable branch only?
>
> I could also open a ticket and put all the debug information collected to
> date in there, and encourage people to report to it once they see this
> warning on their system. That would provide more information about the
> exposure. It definitely looks like a locking issue somewhere, not just bad
> luck or flaky hardware, as we see it happening consistently on the 4 most
> UDP-loaded systems here, and it correlates well with the load. With my small
> NULL catch the machines have been running happily for a month now, so there
> are no visible side effects.

Please do file the PR so that all the information is in one place -- this is a 
network stack hacking week for me, so I should be able to take a closer look.

Could you characterise the traffic load on these boxes a bit more?  Also, is 
there regular monitoring using netstat/bsnmp/etc going on?  I'd like to try 
and identify ways in which this workload differs from other common high-UDP 
workloads being used on 8.x that aren't seeing this problem...

Robert
