Date:      Fri, 20 Nov 2020 11:30:37 +0100
From:      "Kristof Provost" <kp@FreeBSD.org>
To:        peter.blok@bsd4all.org
Cc:        "FreeBSD Stable" <freebsd-stable@freebsd.org>
Subject:   Re: Commit 367705+367706 causes a panic
Message-ID:  <1753B4A3-2FFC-47A5-9D0C-DC0B71BA22E8@FreeBSD.org>
In-Reply-To: <CD3B0F62-3790-4C63-A92C-9694256823CD@bsd4all.org>
References:  <CD3B0F62-3790-4C63-A92C-9694256823CD@bsd4all.org>

On 20 Nov 2020, at 11:18, peter.blok@bsd4all.org wrote:
> I’m afraid the last epoch fix for bridge is not solving the problem
> (or perhaps creates a new one).
>
We’re talking about the stable/12 branch, right?

> This seems to happen when the jail epair is added to the bridge.
>
There must be something more to it than that. I’ve run the bridge 
tests on stable/12 without issue, and this is a problem we didn’t see 
when the bridge epochification initially went into stable/12.

Do you have a custom kernel config? Other patches? What exact commands 
do you run to trigger the panic?

> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 6; apic id = 06
> fault virtual address	= 0xc10
> fault code		= supervisor read data, page not present
> instruction pointer	= 0x20:0xffffffff80695e76
> stack pointer	        = 0x28:0xfffffe00bf14e6e0
> frame pointer	        = 0x28:0xfffffe00bf14e720
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags	= resume, IOPL = 0
> current process		= 1686 (jail)
> trap number		= 12
> panic: page fault
> cpuid = 6
> time = 1605811310
> KDB: stack backtrace:
> #0 0xffffffff8069bb85 at kdb_backtrace+0x65
> #1 0xffffffff80650a4b at vpanic+0x17b
> #2 0xffffffff806508c3 at panic+0x43
> #3 0xffffffff809d0351 at trap_fatal+0x391
> #4 0xffffffff809d03af at trap_pfault+0x4f
> #5 0xffffffff809cf9f6 at trap+0x286
> #6 0xffffffff809a98c8 at calltrap+0x8
> #7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d
> #8 0xffffffff80695c8a at epoch_wait_preempt+0xaa
> #9 0xffffffff80757d40 at vnet_if_init+0x120
> #10 0xffffffff8078c994 at vnet_alloc+0x114
> #11 0xffffffff8061e3f7 at kern_jail_set+0x1bb7
> #12 0xffffffff80620190 at sys_jail_set+0x40
> #13 0xffffffff809d0f07 at amd64_syscall+0x387
> #14 0xffffffff809aa1ee at fast_syscall_common+0xf8

This panic is rather odd. It isn’t even in the bridge code; it happens
during the initial creation of the vnet. I don’t see how the bridge
changes could trigger a panic there.
The panic looks as if something corrupted net_epoch_preempt by
overwriting epoch->e_epoch. The bridge patches only access this
variable through the well-established functions and macros, so I see
no obvious way they could corrupt it.

Best regards,
Kristof
