Date:      Wed, 16 Feb 2011 13:23:30 +0300
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Eugene Grosbein <egrosbein@rdtc.ru>
Cc:        Przemyslaw Frasunek <przemyslaw@frasunek.com>, Mike Tancsa <mike@sentex.net>, mav@FreeBSD.org, bz@FreeBSD.org, "net@freebsd.org" <net@FreeBSD.org>, julian@FreeBSD.org
Subject:   Re: Netgraph/mpd5 stability issues
Message-ID:  <20110216102330.GJ42041@glebius.int.ru>
In-Reply-To: <4D5B9309.30508@rdtc.ru>
References:  <20110131144838.GO62007@FreeBSD.org> <4D46F655.9000701@rdtc.ru> <20110131204816.GV62007@glebius.int.ru> <4D5A989E.8020703@sentex.net> <4D5B4F07.6080801@rdtc.ru> <20110216084635.GI42041@glebius.int.ru> <4D5B9309.30508@rdtc.ru>

On Wed, Feb 16, 2011 at 03:04:09PM +0600, Eugene Grosbein wrote:
E> On 16.02.2011 14:46, Gleb Smirnoff wrote:
E> > On Wed, Feb 16, 2011 at 10:13:59AM +0600, Eugene Grosbein wrote:
E> > E> I run AMD64 with 4GB of memory, lots of memory is free and
E> > E> I still get panics often, sometimes two in a couple of hours.
E> > E> It does not seem to be memory exhaustion to me. It seems to be a very low-probability race
E> > E> that happens occasionally but may happen at any time.
E> > E> 
E> > E> With Gleb's patch, it is obvious that panic happens at moments of user disconnect.
E> > 
E> > I missed: did my patch fix panics in the ng_address_hook(), in this block?
E> > 
E> >         if ((hook == NULL) ||   
E> >             NG_HOOK_NOT_VALID(hook) ||
E> >             NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
E> >             NG_NODE_NOT_VALID(peernode = NG_PEER_NODE(hook))) {
E> >                 NG_FREE_ITEM(item);
E> >                 TRAP_ERROR();
E> >                 return (ENETDOWN);
E> >         }
E> 
E> It seems, yes. All my panics now are in _chkhook() being called
E> with bad hook as first argument.

That is because of NETGRAPH_DEBUG, not my patch :(. Unfortunately, we don't have
coredumps and can't tell whether locking the destroy path helped or not.

E> Only one of my panics was unrelated to netgraph, with igmp_change_state() in trace.
E> 
E> > Maybe there is some kind of memory corruption? Maybe try memguard(9)?
E> 
E> I can try memguard too, please tell me again what settings I should use.

You need to set vm.memguard.desc to the memory type you want to monitor.
You can try each of the netgraph-related memory types for some time (several hours):

vmstat -m | grep -i netgraph | awk '{print $1}'
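
A rough sketch of how that list could be turned into settings, one type per
run. It assumes a kernel built with "options DEBUG_MEMGUARD" and that, as
memguard(9) describes, only a single malloc type is guarded at a time. The
function names netgraph_types and memguard_cmds are made up for illustration;
the script only prints the sysctl commands and does not apply them.

```shell
#!/bin/sh
# Sketch only: print, but do not run, one sysctl command per netgraph
# malloc type, so each type can be guarded in turn for a few hours.
# Assumption: type names contain no spaces, so awk's $1 is the full name.

# Read "vmstat -m" output on stdin and print the netgraph-related type names.
netgraph_types() {
    grep -i netgraph | awk '{print $1}'
}

# Print the command that would point memguard at each of those types.
memguard_cmds() {
    netgraph_types | while read -r t; do
        echo "sysctl vm.memguard.desc=$t"
    done
}

# Usage (on the affected machine):
#   vmstat -m | memguard_cmds
```

Pick one printed command, run it as root, wait for a panic or a few hours,
then move on to the next type.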

E> One more thing: I've noticed my traces show there are plenty of recursive calls,
E> for example (from my letter of 07.02):
...
E> Is it normal, is NETGRAPH protected from such execution flow?

Yes, this is weird. For example kern_sendit() can't call kern_sendit() for sure.
Most other double calls in the trace are weird too.

-- 
Totus tuus, Glebius.


