Message-ID: <4A785110.9060705@freebsd.org>
Date: Tue, 04 Aug 2009 16:17:36 +0100
From: Lawrence Stewart
To: Kamigishi Rei
Cc: FreeBSD Current
References: <4A6F0A35.7050809@haruhiism.net> <4A724BA1.7050303@haruhiism.net>
In-Reply-To: <4A724BA1.7050303@haruhiism.net>
Subject: Re: [follow-up] FreeBSD/amd64 r195146 to r195848, fatal trap 12 under network load
List-Id: Discussions about the use of FreeBSD-current

Kamigishi Rei wrote:
> Kamigishi Rei wrote:
>> Revisions mentioned are those which were tested by me; r195849+ has
>> the corruption padded somewhere else so it might produce a panic with
>> a different set of options. For reference, my test kernel uses a
>> GENERIC config from May 09 snapshot without WITNESS and with
>> IPFIREWALL, IPFIREWALL_DEFAULT_TO_ACCEPT and DEVICE_POLLING enabled.
> r195981 (latest checkout) traps with the *GENERIC* kernel (with WITNESS
> enabled). Same backtrace, same cause, and UP systems are not affected
> again.
> Apparently, my diagnostics patch from the previous message seems to pad
> the corruption somewhere, so I can't use it to check lo_witness or other
> fields of nws_mtx at the time when mtx_lock gets corrupted.
>
> Trap can be triggered with "ping -f -s 65507 localhost", iperf (just
> "iperf -c localhost" works for me), or by generating some high-speed
> network throughput (even a mysql query over localhost will do as we have
> a race here). Running ping will mostly trigger the trap inside
> swi_net(); iperf - inside netisr_queue_internal().
>
> I will be grateful if someone could provide me some information on how
> to further debug it. Currently, I suspect that there's something about
> handling modspace (incorrect dereference somewhere, or something like
> that).

For the benefit of the list, we've finally got this reproduced on a
netperf cluster node after much gnashing of teeth. Stay tuned for updates.

Cheers,
Lawrence