Date: Fri, 20 Nov 2020 11:30:37 +0100
From: "Kristof Provost" <kp@FreeBSD.org>
To: peter.blok@bsd4all.org
Cc: "FreeBSD Stable" <freebsd-stable@freebsd.org>
Subject: Re: Commit 367705+367706 causes a pabic
Message-ID: <1753B4A3-2FFC-47A5-9D0C-DC0B71BA22E8@FreeBSD.org>
In-Reply-To: <CD3B0F62-3790-4C63-A92C-9694256823CD@bsd4all.org>
References: <CD3B0F62-3790-4C63-A92C-9694256823CD@bsd4all.org>

On 20 Nov 2020, at 11:18, peter.blok@bsd4all.org wrote:
> I’m afraid the last Epoch fix for bridge is not solving the problem
> ( or perhaps creates a new ).
>
We’re talking about the stable/12 branch, right?

> This seems to happen when the jail epair is added to the bridge.
>
There must be something more to it than that. I’ve run the bridge tests
on stable/12 without issue, and this is a problem we didn’t see when the
bridge epochification initially went into stable/12.

Do you have a custom kernel config? Other patches? What exact commands
do you run to trigger the panic?

> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 6; apic id = 06
> fault virtual address   = 0xc10
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80695e76
> stack pointer           = 0x28:0xfffffe00bf14e6e0
> frame pointer           = 0x28:0xfffffe00bf14e720
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 1686 (jail)
> trap number             = 12
> panic: page fault
> cpuid = 6
> time = 1605811310
> KDB: stack backtrace:
> #0 0xffffffff8069bb85 at kdb_backtrace+0x65
> #1 0xffffffff80650a4b at vpanic+0x17b
> #2 0xffffffff806508c3 at panic+0x43
> #3 0xffffffff809d0351 at trap_fatal+0x391
> #4 0xffffffff809d03af at trap_pfault+0x4f
> #5 0xffffffff809cf9f6 at trap+0x286
> #6 0xffffffff809a98c8 at calltrap+0x8
> #7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d
> #8 0xffffffff80695c8a at epoch_wait_preempt+0xaa
> #9 0xffffffff80757d40 at vnet_if_init+0x120
> #10 0xffffffff8078c994 at vnet_alloc+0x114
> #11 0xffffffff8061e3f7 at kern_jail_set+0x1bb7
> #12 0xffffffff80620190 at sys_jail_set+0x40
> #13 0xffffffff809d0f07 at amd64_syscall+0x387
> #14 0xffffffff809aa1ee at fast_syscall_common+0xf8

This panic is rather odd. This isn’t even the bridge code. This is
during initial creation of the vnet.
I don’t really see how this could even trigger panics. That panic looks
as if something corrupted the net_epoch_preempt, by overwriting the
epoch->e_epoch. The bridge patches only access this variable through the
well-established functions and macros. I see no obvious way that they
could corrupt it.

Best regards,
Kristof
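
A note on the trace: the fault virtual address, 0xc10, is a small number. That is the pattern a read through a NULL (or zeroed) pointer at a fixed field offset tends to produce, which would fit the suggestion above that the epoch state was overwritten. Nothing in the thread confirms this. The sketch below only illustrates the arithmetic; `struct hypothetical_record`, `field_address` and `describe_fault` are made-up names, and the layout is not the real FreeBSD `struct epoch` or any ck_epoch structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical layout, NOT the real kernel structure: the padding is
 * chosen only so that `field` sits at offset 0xc10, the fault virtual
 * address reported in the trace.
 */
struct hypothetical_record {
	unsigned char pad[0xc10];
	uint64_t field;
};

/* Compile-time check that the illustrative layout is what we claim. */
static_assert(offsetof(struct hypothetical_record, field) == 0xc10,
    "field must sit at offset 0xc10 for this illustration");

/*
 * Address a CPU would touch when reading base->field.
 * Computed arithmetically; nothing is dereferenced.
 */
static uintptr_t
field_address(uintptr_t base)
{
	return base + offsetof(struct hypothetical_record, field);
}

/* Print the accessed address for a given base pointer value. */
static void
describe_fault(uintptr_t base)
{
	printf("base 0x%jx -> access at 0x%jx\n",
	    (uintmax_t)base, (uintmax_t)field_address(base));
}
```

Calling `describe_fault(0)` from a `main` function shows that a NULL base pointer yields an access at the field offset itself. On the real kernel, the field at the faulting instruction could be identified by disassembling around `ck_epoch_synchronize_wait+0x8d` in a debugger, which this sketch does not attempt.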