From: "Kristof Provost"
To: Ross
Cc: freebsd-pf@freebsd.org
Subject: Re: sonewconn: pru_attach() failed and kernel panic in PF
Date: Sun, 05 Mar 2017 21:42:59 +0900

On 27 Feb 2017, at 21:08, Ross wrote:
> Hello
>
> One of my machines panics almost every day. It is always like this: first
> there is a number of messages about "sonewconn: pcb 0xfffff80085478740:
> pru_attach() failed" at the same time and then panic. Here's an example:
>
> ... many lines of sonewconn ...
> Feb 27 13:41:43 core kernel: sonewconn: pcb 0xfffff8008575bcb0: pru_attach() failed
> Feb 27 13:41:43 core kernel:

I wonder if you’re running low on memory by any chance. I think I know
why you’re crashing, but I suspect memory exhaustion is your root
problem: it would explain the pru_attach() failures and, eventually,
the pf panic.

> Feb 27 13:41:43 core kernel: KDB: stack backtrace:
> Feb 27 13:41:43 core kernel: #0 0xffffffff80b312c7 at kdb_backtrace+0x67
> Feb 27 13:41:43 core kernel: #1 0xffffffff80ae5c92 at vpanic+0x182
> Feb 27 13:41:43 core kernel: #2 0xffffffff80ae5b03 at panic+0x43
> Feb 27 13:41:43 core kernel: #3 0xffffffff80fd6d51 at trap_fatal+0x351
> Feb 27 13:41:43 core kernel: #4 0xffffffff80fd6f43 at trap_pfault+0x1e3
> Feb 27 13:41:43 core kernel: #5 0xffffffff80fd64ec at trap+0x26c
> Feb 27 13:41:43 core kernel: #6 0xffffffff80fb9d61 at calltrap+0x8
> Feb 27 13:41:43 core kernel: #7 0xffffffff80e4185e at uma_zfree_arg+0x4fe
> Feb 27 13:41:43 core kernel: #8 0xffffffff82442165 at pf_get_translation+0x2c5

There are only a few calls to uma_zfree() in pf_get_translation().
These are:

 * uma_zfree(V_pf_state_key_z, skp);
 * uma_zfree(V_pf_state_key_z, *nkp);
 * uma_zfree(V_pf_state_key_z, *skp);

Going by the inconsistent pointer use, the first one is rather suspect.
Looking a bit deeper, pf_get_translation() is only called from one
place, and it always passes the addresses of stack variables for skp
and nkp, so the first call ends up trying to free a stack address,
which won’t work too well.

That’s a bug (and I’ll fix it), but you’re only running into it because
pf_state_key_clone() returned NULL, which will only happen under memory
pressure.

> What should I do to fix it?

You’ll need to look at your system and figure out what’s running away
with all of your memory.

Regards,
Kristof
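
For illustration, the failure path described above can be sketched in
userland C. This is not the pf source and not the actual fix: struct
state_key, key_alloc(), key_free(), key_clone(), get_translation() and
the fail_clone flag are all hypothetical stand-ins, with malloc()/free()
taking the place of the UMA zone. It only shows why freeing skp (the
address of the caller's stack variable) is wrong where *skp (the
allocated key) was meant.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the pf state key; the real struct is larger. */
struct state_key {
    int id;
};

/* Counts outstanding allocations so leaks and bad frees are visible. */
static int live_allocs = 0;

/* Stand-in for allocating from the state key zone. */
static struct state_key *key_alloc(int id)
{
    struct state_key *k = malloc(sizeof(*k));
    if (k != NULL) {
        k->id = id;
        live_allocs++;
    }
    return k;
}

/* Stand-in for uma_zfree() on the state key zone. */
static void key_free(struct state_key *k)
{
    if (k != NULL) {
        free(k);
        live_allocs--;
    }
}

/* Stand-in for a clone that fails; fail != 0 models memory pressure. */
static struct state_key *key_clone(const struct state_key *src, int fail)
{
    if (fail)
        return NULL;
    return key_alloc(src->id);
}

/*
 * skp and nkp point at the caller's stack variables, which in turn hold
 * pointers to allocated keys. Returns 0 on success, -1 on failure.
 */
static int get_translation(struct state_key **skp, struct state_key **nkp,
    int fail_clone)
{
    assert(skp != NULL && nkp != NULL);

    *skp = key_alloc(1);
    if (*skp == NULL)
        return -1;

    *nkp = key_clone(*skp, fail_clone);
    if (*nkp == NULL) {
        /*
         * The suspect pattern would be the equivalent of
         *     key_free((struct state_key *)skp);
         * which hands the allocator the address of the caller's
         * stack slot. The allocated key is *skp.
         */
        key_free(*skp);
        *skp = NULL;
        return -1;
    }
    return 0;
}
```

A caller would declare two local pointers and pass &sk and &nk, matching
the "stack variables" case above. With fail_clone set, the function
returns -1 and leaves both pointers NULL with nothing left allocated.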