Date: Wed, 16 Feb 2011 13:23:30 +0300
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Eugene Grosbein <egrosbein@rdtc.ru>
Cc: Przemyslaw Frasunek <przemyslaw@frasunek.com>, Mike Tancsa <mike@sentex.net>, mav@FreeBSD.org, bz@FreeBSD.org, "net@freebsd.org" <net@FreeBSD.org>, julian@FreeBSD.org
Subject: Re: Netgraph/mpd5 stability issues
Message-ID: <20110216102330.GJ42041@glebius.int.ru>
In-Reply-To: <4D5B9309.30508@rdtc.ru>
References: <20110131144838.GO62007@FreeBSD.org> <4D46F655.9000701@rdtc.ru> <20110131204816.GV62007@glebius.int.ru> <4D5A989E.8020703@sentex.net> <4D5B4F07.6080801@rdtc.ru> <20110216084635.GI42041@glebius.int.ru> <4D5B9309.30508@rdtc.ru>

On Wed, Feb 16, 2011 at 03:04:09PM +0600, Eugene Grosbein wrote:
E> On 16.02.2011 14:46, Gleb Smirnoff wrote:
E> > On Wed, Feb 16, 2011 at 10:13:59AM +0600, Eugene Grosbein wrote:
E> > E> I run AMD64 with 4GB of memory, lots of memory is free, and
E> > E> I still get panics often, sometimes two in a couple of hours.
E> > E> It does not look like memory exhaustion to me. It looks like a very
E> > E> low-probability race that happens occasionally but may happen at any time.
E> > E>
E> > E> With Gleb's patch, it is obvious that the panic happens at the moment a user disconnects.
E> >
E> > I missed this: did my patch fix the panics in ng_address_hook(), in this block?
E> >
E> >     if ((hook == NULL) ||
E> >         NG_HOOK_NOT_VALID(hook) ||
E> >         NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
E> >         NG_NODE_NOT_VALID(peernode = NG_PEER_NODE(hook))) {
E> >             NG_FREE_ITEM(item);
E> >             TRAP_ERROR();
E> >             return (ENETDOWN);
E> >     }
E>
E> It seems so, yes. All my panics are now in _chkhook(), called
E> with a bad hook as its first argument.
That is because of NETGRAPH_DEBUG, not my patch :(. Unfortunately, we don't have
coredumps and can't tell whether locking the destroy path helped or not.
E> Only one of my panics was unrelated to netgraph, with igmp_change_state() in the trace.
E>
E> > Maybe there is some kind of memory corruption? Maybe try memguard(9)?
E>
E> I can try memguard too, please tell me again which settings I should use.
You need to set vm.memguard.desc to the memory type you want to monitor.
You can try each of the netgraph-related memory types for some time (several hours):
vmstat -m | grep -i netgraph | awk '{print $1}'
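
The procedure above could be scripted roughly as follows. This is only a sketch, under assumptions not stated in this message: the kernel is built with "options DEBUG_MEMGUARD", the script runs as root, and vm.memguard.desc is set at run time via sysctl(8) as described in memguard(9). The function names and the HOURS_PER_TYPE knob are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Sketch: let memguard(9) watch each netgraph malloc type in turn.
# Assumes "options DEBUG_MEMGUARD" in the kernel and root privileges.

# Hypothetical knob: how long to watch each type ("several hours").
HOURS_PER_TYPE=3

# Read "vmstat -m" output on stdin and print the names of the
# netgraph-related memory types, one per line.
netgraph_types() {
    grep -i netgraph | awk '{print $1}'
}

# Point memguard at one type at a time, since vm.memguard.desc
# holds a single memory type.
memguard_cycle() {
    vmstat -m | netgraph_types | while read -r type; do
        echo "memguard now watching: $type"
        sysctl vm.memguard.desc="$type"
        sleep $((HOURS_PER_TYPE * 3600))
    done
}
```

Calling memguard_cycle then walks through the types one by one; if a panic happens, the type that was being watched at that moment is the one to look at first.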
E> One more thing: I've noticed that my traces show plenty of recursive calls,
E> for example (from my message of 07.02):
...
E> Is this normal? Is NETGRAPH protected against such an execution flow?
Yes, this is weird. For example kern_sendit() can't call kern_sendit() for sure.
Most other double calls in the trace are weird too.
--
Totus tuus, Glebius.
