From: "Kristof Provost"
To: peter.blok@bsd4all.org
Cc: "FreeBSD Stable"
Subject: Re: Commit 367705+367706 causes a pabic
Date: Fri, 20 Nov 2020 11:30:37 +0100
Message-ID: <1753B4A3-2FFC-47A5-9D0C-DC0B71BA22E8@FreeBSD.org>
List-Id: Production branch of FreeBSD source code

On 20 Nov 2020, at 11:18,
peter.blok@bsd4all.org wrote:

> I’m afraid the last Epoch fix for bridge is not solving the problem
> ( or perhaps creates a new ).
>
We’re talking about the stable/12 branch, right?

> This seems to happen when the jail epair is added to the bridge.
>
There must be something more to it than that. I’ve run the bridge tests on stable/12 without issue, and this is a problem we didn’t see when the bridge epochification initially went into stable/12.

Do you have a custom kernel config? Other patches? What exact commands do you run to trigger the panic?

> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 6; apic id = 06
> fault virtual address = 0xc10
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80695e76
> stack pointer = 0x28:0xfffffe00bf14e6e0
> frame pointer = 0x28:0xfffffe00bf14e720
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process = 1686 (jail)
> trap number = 12
> panic: page fault
> cpuid = 6
> time = 1605811310
> KDB: stack backtrace:
> #0 0xffffffff8069bb85 at kdb_backtrace+0x65
> #1 0xffffffff80650a4b at vpanic+0x17b
> #2 0xffffffff806508c3 at panic+0x43
> #3 0xffffffff809d0351 at trap_fatal+0x391
> #4 0xffffffff809d03af at trap_pfault+0x4f
> #5 0xffffffff809cf9f6 at trap+0x286
> #6 0xffffffff809a98c8 at calltrap+0x8
> #7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d
> #8 0xffffffff80695c8a at epoch_wait_preempt+0xaa
> #9 0xffffffff80757d40 at vnet_if_init+0x120
> #10 0xffffffff8078c994 at vnet_alloc+0x114
> #11 0xffffffff8061e3f7 at kern_jail_set+0x1bb7
> #12 0xffffffff80620190 at sys_jail_set+0x40
> #13 0xffffffff809d0f07 at amd64_syscall+0x387
> #14 0xffffffff809aa1ee at fast_syscall_common+0xf8

This panic is rather odd. This isn’t even the bridge code. This is during initial creation of the vnet. I don’t really see how this could even trigger panics.
That panic looks as if something corrupted the net_epoch_preempt, by overwriting the epoch->e_epoch. The bridge patches only access this variable through the well-established functions and macros. I see no obvious way that they could corrupt it.

Best regards,
Kristof