Date: Wed, 5 Jun 2013 14:15:56 +0400
From: Gleb Smirnoff
To: Konstantin Belousov
Cc: Ian FREISLICH, current@freebsd.org
Subject: Re: Recurring panic
Message-ID: <20130605101556.GD67170@glebius.int.ru>
In-Reply-To: <20130605101345.GY3047@kib.kiev.ua>

On Wed, Jun 05, 2013 at 01:13:45PM +0300, Konstantin Belousov wrote:
K> On Wed, Jun 05, 2013 at 01:50:43PM +0400, Gleb Smirnoff wrote:
K> > On Wed, Jun 05, 2013 at 10:18:21AM +0200, Ian FREISLICH wrote:
K> > I> I have the following recurring panic on all my heavily network
K> > I> loaded -CURRENT routers.  The current process is always different.
K> > I>
K> > I> Gleb, can you please chime in with what you've managed to uncover.
K> >
K> > The panics occur on a selfd mutex. The mtx_lock value is that of a
K> > free mutex, but it has 1 extra bit set:
K> >
K> > (kgdb) p/x sfp->sf_mtx->mtx_lock
K> > $3 = 0x1000004
K> >
K> > Rarely (only one panic observed) more than one bit is set:
K> >
K> > $3 = 0x21000004
K> >
K> > It is important that selfd mutexes are taken from mtxpool(9), which
K> > is allocated at an early boot stage. Thus, across reboots all possible
K> > sfp->sf_mtx mutexes usually fall into the same virtual memory region.
K> > I'm not sure, but I suppose they fall into the same physical region.
K> >
K> > This can lead one to the idea that the RAM in the box has problems. But
K> > it is running ECC memory, and it doesn't experience other random panics.
K> >
K> > The only special thing about the box is that it is running pf(4) with
K> > a huge ruleset and a lot of traffic. So pf(4) is the number one suspect,
K> > albeit it isn't closely related to selfds.
K> >
K> So is the virtual address of the corrupted word the same for each panic?
K> If yes, set up the hw watchpoint in ddb.

Nope, they are different, but close to each other, since they live in the
same mtxpool.

-- 
Totus tuus, Glebius.
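
For reference, the two mtx_lock values quoted above can be decoded to see exactly which bits differ from a free mutex. This is only a minimal sketch: it assumes that a free (unowned) mutex has the lock word 0x4, which matches the "free mutex plus extra bit" description above, and the helper name stray_bits is made up for illustration.

```python
# Assumption: lock word of a free (unowned) mutex, not taken from the thread.
MTX_UNOWNED = 0x4


def stray_bits(lock_word, expected=MTX_UNOWNED):
    """Return the bit positions where lock_word differs from expected."""
    diff = lock_word ^ expected
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


if __name__ == "__main__":
    # The two values reported from kgdb in the message above.
    for value in (0x1000004, 0x21000004):
        print(hex(value), "differs from a free mutex at bits", stray_bits(value))
```

Under that assumption, both observed values have bit 24 set, and the rarer one additionally has bit 29 set.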