From: Eugene Grosbein
To: Gleb Smirnoff
Cc: Przemyslaw Frasunek, Mike Tancsa, mav@freebsd.org, bz@freebsd.org, net@freebsd.org
Date: Wed, 16 Feb 2011 15:04:09 +0600
Subject: Re: Netgraph/mpd5 stability issues

On 16.02.2011 14:46, Gleb Smirnoff wrote:
> On Wed, Feb 16, 2011 at 10:13:59AM +0600, Eugene Grosbein wrote:
> E> I run AMD64 with 4GB of memory, lots of memory is free and
> E> I still get panics often, sometimes two in a couple of hours.
> E> It does not look like memory exhaustion to me. It looks like a very
> E> low-probability race that happens only occasionally but may happen at any time.
> E>
> E> With Gleb's patch, it is obvious that the panics happen at the moment a user disconnects.
>
> I missed this: did my patch fix the panics in ng_address_hook(), in this block?
>
>         if ((hook == NULL) ||
>             NG_HOOK_NOT_VALID(hook) ||
>             NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
>             NG_NODE_NOT_VALID(peernode = NG_PEER_NODE(hook))) {
>                 NG_FREE_ITEM(item);
>                 TRAP_ERROR();
>                 return (ENETDOWN);
>         }

It seems so, yes. All my panics now occur in _chkhook(), which is being called
with a bad hook as its first argument.

> All the panics reported by you and Mike recently have traces unrelated
> to netgraph, and also traces look weird.

No, almost all of my panics are related to netgraph; the call chains look like this:

ip_fastforward() - ng_rmnode_self() - ng_address_hook() - trap
sendto() - kern_sendit() - sosend_generic() - ng_parse_get_token() - ... - trap

Only one of my panics was unrelated to netgraph, with igmp_change_state() in the trace.

> Maybe there is some kind of memory corruption? Maybe try memguard(9)?

I can try memguard too; please tell me again which settings I should use.
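For readers less familiar with netgraph internals, the validity check quoted above from ng_address_hook() can be modeled in userland roughly as follows. This is a minimal sketch under stated assumptions: toy_node, toy_hook and toy_address_hook() are hypothetical names, not netgraph's API, and the real node and hook structures carry far more state (locking, reference counts).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for netgraph's node and hook
 * structures; these are NOT the real struct ng_node / struct ng_hook. */
struct toy_node {
	bool valid;             /* cleared when the node is being destroyed */
};

struct toy_hook {
	bool valid;             /* cleared when the hook is being disconnected */
	struct toy_hook *peer;  /* hook at the other end of the edge */
	struct toy_node *node;  /* node this hook belongs to */
};

/*
 * Mirrors the shape of the quoted check: refuse delivery with ENETDOWN
 * unless the hook, its peer hook and the peer's node are all still valid.
 * On success the peer node is returned through *peernode.
 */
static int
toy_address_hook(struct toy_hook *hook, struct toy_node **peernode)
{
	struct toy_hook *peer;

	assert(peernode != NULL);
	if (hook == NULL || !hook->valid)
		return (ENETDOWN);
	peer = hook->peer;
	if (peer == NULL || !peer->valid)
		return (ENETDOWN);
	if (peer->node == NULL || !peer->node->valid)
		return (ENETDOWN);
	*peernode = peer->node;
	return (0);
}
```

A check of this kind only helps when the hook is still allocated but already marked invalid. If the pointer refers to a hook whose memory has already been freed, reading its flags is itself a use-after-free, which is the class of bug memguard(9) is meant to expose.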
One more thing: I've noticed that my traces show plenty of recursive calls,
for example (from my message of 07.02):

panic: page fault
cpuid = 1
KDB: stack backtrace:
X_db_sym_numargs() at 0xffffffff801a227a = X_db_sym_numargs+0x15a
kdb_backtrace() at 0xffffffff8033d547 = kdb_backtrace+0x37
panic() at 0xffffffff8030b567 = panic+0x187
dblfault_handler() at 0xffffffff804c0ca0 = dblfault_handler+0x330
dblfault_handler() at 0xffffffff804c107f = dblfault_handler+0x70f
trap() at 0xffffffff804c155f = trap+0x3df
calltrap() at 0xffffffff804a8de4 = calltrap+0x8
--- trap 0xc, rip = 0xffffffff803e4f36, rsp = 0xffffff80ebff7400, rbp = 0xffffff80ebff7430 ---
ng_parse_get_token() at 0xffffffff803e4f36 = ng_parse_get_token+0x6596
ng_parse_get_token() at 0xffffffff803e5ccf = ng_parse_get_token+0x732f
ng_destroy_hook() at 0xffffffff803d53b2 = ng_destroy_hook+0x222
ng_rmnode() at 0xffffffff803d6118 = ng_rmnode+0xa08
ng_snd_item() at 0xffffffff803d8520 = ng_snd_item+0x3f0
ng_destroy_hook() at 0xffffffff803d52ed = ng_destroy_hook+0x15d
ng_rmnode() at 0xffffffff803d57b9 = ng_rmnode+0xa9
ng_rmnode() at 0xffffffff803d7664 = ng_rmnode+0x1f54
ng_snd_item() at 0xffffffff803d8520 = ng_snd_item+0x3f0
ng_parse_get_token() at 0xffffffff803e97fa = ng_parse_get_token+0xae5a
sosend_generic() at 0xffffffff80373df6 = sosend_generic+0x436
kern_sendit() at 0xffffffff803776d5 = kern_sendit+0x1a5
kern_sendit() at 0xffffffff8037790c = kern_sendit+0x3dc
sendto() at 0xffffffff803779fd = sendto+0x4d
syscallenter() at 0xffffffff8034a015 = syscallenter+0x1e5
syscall() at 0xffffffff804c10fb = syscall+0x4b
Xfast_syscall() at 0xffffffff804a90c2 = Xfast_syscall+0xe2
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x8018c971c, rsp = 0x7fffffbfeab8, rbp = 0x80203dcc0 ---
Uptime: 2d17h1m42s

Is this normal? Is netgraph protected against such an execution flow?

Eugene Grosbein
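The recursion question can be illustrated with a generic technique: deliver items by direct function call while the nesting depth is small, and once a limit is reached, defer them to a queue that is drained from the top level, so stack usage stays bounded. This is a sketch of the general pattern only, not a description of how netgraph's ng_snd_item() actually decides between direct and queued delivery; TOY_MAX_DEPTH, toy_send() and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 4   /* hypothetical limit on nested direct dispatch */
#define TOY_QUEUE_LEN 64  /* hypothetical capacity of the deferred queue */

struct toy_item {
	int hops;   /* how many further nested sends handling this item causes */
};

static int toy_depth;      /* current nesting of direct deliveries */
static int toy_max_seen;   /* deepest nesting observed so far */
static int toy_handled;    /* number of items handled in total */
static struct toy_item toy_queue[TOY_QUEUE_LEN];
static size_t toy_qhead, toy_qtail;

static void toy_send(struct toy_item item);

/* Handling one item may send another one (a hypothetical stand-in for,
 * e.g., removing a node that triggers messages to its neighbours). */
static void
toy_handle(struct toy_item item)
{
	toy_handled++;
	if (item.hops > 0)
		toy_send((struct toy_item){ item.hops - 1 });
}

/* Deliver directly while the call chain is shallow; otherwise defer the
 * item so it is handled later from the top level with a fresh stack. */
static void
toy_send(struct toy_item item)
{
	if (toy_depth >= TOY_MAX_DEPTH) {
		assert(toy_qtail - toy_qhead < TOY_QUEUE_LEN);
		toy_queue[toy_qtail++ % TOY_QUEUE_LEN] = item;
		return;
	}
	toy_depth++;
	if (toy_depth > toy_max_seen)
		toy_max_seen = toy_depth;
	toy_handle(item);
	toy_depth--;
}

/* Top-level loop: send the first item, then drain deferred items at depth 0. */
static void
toy_run(struct toy_item first)
{
	toy_send(first);
	while (toy_qhead != toy_qtail)
		toy_send(toy_queue[toy_qhead++ % TOY_QUEUE_LEN]);
}
```

For example, toy_run((struct toy_item){ 10 }) handles all eleven items while the nesting never exceeds TOY_MAX_DEPTH. The cost of this design is that deferred items are processed later rather than immediately, which changes the order in which side effects such as node removal become visible.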