From: Gleb Smirnoff
To: Eugene Grosbein
Cc: Przemyslaw Frasunek, Mike Tancsa, mav@FreeBSD.org, bz@FreeBSD.org, "net@freebsd.org", julian@FreeBSD.org
Date: Wed, 16 Feb 2011 13:23:30 +0300
Subject: Re: Netgraph/mpd5 stability issues
Message-ID: <20110216102330.GJ42041@glebius.int.ru>
In-Reply-To: <4D5B9309.30508@rdtc.ru>

On Wed, Feb 16, 2011 at 03:04:09PM +0600, Eugene Grosbein wrote:
E> On 16.02.2011 14:46, Gleb Smirnoff wrote:
E> > On Wed, Feb 16, 2011 at 10:13:59AM +0600, Eugene Grosbein wrote:
E> > E> I run AMD64 with 4GB of memory, lots of memory is free and
E> > E> I still get panics often, sometimes two in a couple of hours.
E> > E> It does not seem memory exhaustion to me. It seems as very low probable race
E> > E> that happens occasionally but may happen any time.
E> > E>
E> > E> With Gleb's patch, it is obvious that panic happens at moments of user disconnect.
E> >
E> > I missed: did my patch fix panics in the ng_address_hook(), in this block?
E> >
E> >         if ((hook == NULL) ||
E> >             NG_HOOK_NOT_VALID(hook) ||
E> >             NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
E> >             NG_NODE_NOT_VALID(peernode = NG_PEER_NODE(hook))) {
E> >                 NG_FREE_ITEM(item);
E> >                 TRAP_ERROR();
E> >                 return (ENETDOWN);
E> >         }
E>
E> It seems, yes. All my panics now are in _chkhook() being called
E> with bad hook as first argument.

That is because of NETGRAPH_DEBUG, not my patch :(. Unfortunately,
we don't have coredumps and can't tell whether locking the destroy path
helped or not.

E> Only one of my panics was unrelated to netgraph, with igmp_change_state() in trace.
E>
E> > May be there is some kind of memory corruption? May be try memguard(9)?
E>
E> I can try memguard too, please tell again what setting should I use.

You need to set vm.memguard.desc to a memory type you want to monitor. You can
try for some time (several hours) all netgraph related memory types:

vmstat -m | grep -i netgraph | awk '{print $1}'

E> One more thing: I've noticed my traced show there are plenty of recursive calls,
E> for example (from my letter of 07.02):
...
E> Is it normal, is NETGRAPH protected from such execution flow?

Yes, this is weird. For example kern_sendit() can't call kern_sendit() for sure.
Most other double calls in the trace are weird too.

-- 
Totus tuus, Glebius.