Date:      Sun, 25 Aug 2019 17:30:34 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rebecca Cran <rebecca@bsdio.com>, markj@freebsd.org
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Panic on boot with r351461 (AMD ThreadRipper 2990WX)
Message-ID:  <20190825143034.GO71821@kib.kiev.ua>
In-Reply-To: <9e94aea8-7d63-0f9e-2f1e-c1492e9dc455@bsdio.com>
References:  <6e5687b2-ab3f-a570-37ab-72c8a9776167@bsdio.com> <20190824203305.GF71821@kib.kiev.ua> <d7200dbc-62b3-fd86-ca61-32d559987338@bsdio.com> <20190824230801.GK71821@kib.kiev.ua> <f15ba651-28ef-d9db-3646-ab8cb49b3d18@bsdio.com> <20190825062407.GL71821@kib.kiev.ua> <9e94aea8-7d63-0f9e-2f1e-c1492e9dc455@bsdio.com>

On Sun, Aug 25, 2019 at 07:17:20AM -0600, Rebecca Cran wrote:
> On 2019-08-25 00:24, Konstantin Belousov wrote:
> > What are the panic messages ?
> 
> Fatal trap 18: integer divide fault while in kernel mode
> instruction pointer = 0x20:0xffffffff80f1027c
> stack pointer = 0x28:0xffffffff845809f0
> frame pointer = 0x28:0xffffffff84580a00
> code segment = base 0x0, limit 0xffffff, type 0x1b
>     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process = 0 ()
> trap number = 18
> panic: integer divide fault
> cpuid = 0
> time = 1
>
> > What is the source line ?
> 
> (gdb) info line *0xffffffff80f1027c
> Line 102 of "/usr/src/sys/vm/vm_domainset.c" starts at address
> 0xffffffff80f10267 <vm_domainset_iter_first+151>
>    and ends at 0xffffffff80f1027f <vm_domainset_iter_first+175>.

There was one more source line I asked about.

So what happens, IMO, is that for memory-less domains ds_cnt is zero
because ds_mask is zero, which causes the exception on divide.  You
can try the following combined patch, but I really dislike the fact
that I cannot safely use DOMAINSET_FIXED (if my diagnosis is correct).
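
To illustrate the diagnosis above, here is a minimal userland sketch of
how an empty domain mask leads to a divide by a zero count. It is not
the code from vm_domainset.c: the toy_* names, the 32-bit mask and the
guard in toy_pick() are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for a domain set (not the kernel structure). */
struct toy_domainset {
	uint32_t mask;	/* bit i set => domain i is usable */
	int	 cnt;	/* number of bits set in mask */
};

/* Count the set bits in v. */
static int
toy_popcount(uint32_t v)
{
	int n;

	for (n = 0; v != 0; v &= v - 1)
		n++;
	return (n);
}

/*
 * Keep only the requested domains that actually have memory.  If the
 * only requested domain is memory-less, mask and cnt both become 0.
 */
static void
toy_init(struct toy_domainset *ds, uint32_t requested, uint32_t populated)
{
	ds->mask = requested & populated;
	ds->cnt = toy_popcount(ds->mask);
}

/*
 * Round-robin pick: iter modulo cnt.  Without the cnt == 0 check the
 * modulo would be a division by zero, which on amd64 raises an integer
 * divide fault.  Returns -1 for an empty set.
 */
static int
toy_pick(const struct toy_domainset *ds, unsigned int iter)
{
	if (ds->cnt == 0)
		return (-1);
	return ((int)(iter % (unsigned int)ds->cnt));
}
```

With populated = 0x5 (domains 0 and 2) and requested = 1 << 1, the set
is empty, so an unguarded "iter % cnt" would divide by zero.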

I would prefer that kmem_malloc_domainset(DOMAINSET_FIXED(unpopulated domain))
fail with a NULL result, so that I could then fall back manually to
DOMAINSET_PREF().
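
The fallback described above could look roughly like the following
userland sketch. Everything in it is hypothetical: the toy_alloc_*
helpers and the toy_populated table only model a "fixed" policy that
returns NULL for an unpopulated domain and a "preferred" policy that
tries other domains; none of this is the kernel API.

```c
#include <stddef.h>
#include <stdlib.h>

#define	TOY_NDOMAINS	4

/* Hypothetical topology: domains 1 and 3 have no memory. */
static const int toy_populated[TOY_NDOMAINS] = { 1, 0, 1, 0 };

/* "Fixed" policy: allocate from exactly this domain or fail with NULL. */
static void *
toy_alloc_fixed(int domain, size_t size)
{
	if (domain < 0 || domain >= TOY_NDOMAINS || !toy_populated[domain])
		return (NULL);
	return (calloc(1, size));
}

/* "Preferred" policy: try the given domain first, then any other one. */
static void *
toy_alloc_pref(int domain, size_t size)
{
	void *p;
	int d;

	if ((p = toy_alloc_fixed(domain, size)) != NULL)
		return (p);
	for (d = 0; d < TOY_NDOMAINS; d++) {
		if ((p = toy_alloc_fixed(d, size)) != NULL)
			return (p);
	}
	return (NULL);
}

/* Manual fallback: fixed first, preferred only if fixed returned NULL. */
static void *
toy_alloc_fixed_then_pref(int domain, size_t size, int *fell_back)
{
	void *p;

	p = toy_alloc_fixed(domain, size);
	*fell_back = (p == NULL);
	if (p == NULL)
		p = toy_alloc_pref(domain, size);
	return (p);
}
```

The point of this shape is that the caller can tell a local allocation
from a fallback one, instead of the fixed policy faulting on an empty
domain.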

OTOH, I think the chunk for mp_realloc_pcpu() is the final fix.

diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
index b38c688f8b4..2c3dc8744f6 100644
--- a/sys/amd64/amd64/mp_machdep.c
+++ b/sys/amd64/amd64/mp_machdep.c
@@ -402,6 +402,8 @@ mp_realloc_pcpu(int cpuid, int domain)
 		return;
 	m = vm_page_alloc_domain(NULL, 0, domain,
 	    VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ);
+	if (m == NULL)
+		return;
 	na = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m));
 	pagecopy((void *)oa, (void *)na);
 	pmap_qenter((vm_offset_t)&__pcpu[cpuid], &m, 1);
@@ -481,10 +483,10 @@ native_start_all_aps(void)
 		    M_ZERO);
 		mce_stack = (char *)kmem_malloc(PAGE_SIZE, M_WAITOK | M_ZERO);
 		nmi_stack = (char *)kmem_malloc_domainset(
-		    DOMAINSET_FIXED(domain), PAGE_SIZE, M_WAITOK | M_ZERO);
+		    DOMAINSET_PREF(domain), PAGE_SIZE, M_WAITOK | M_ZERO);
 		dbg_stack = (char *)kmem_malloc_domainset(
-		    DOMAINSET_FIXED(domain), PAGE_SIZE, M_WAITOK | M_ZERO);
-		dpcpu = (void *)kmem_malloc_domainset(DOMAINSET_FIXED(domain),
+		    DOMAINSET_PREF(domain), PAGE_SIZE, M_WAITOK | M_ZERO);
+		dpcpu = (void *)kmem_malloc_domainset(DOMAINSET_PREF(domain),
 		    DPCPU_SIZE, M_WAITOK | M_ZERO);
 
 		bootSTK = (char *)bootstacks[cpu] +


