Date:      Mon, 26 Mar 2018 06:35:29 -0700
From:      Mark Millard <marklmi26-fbsd@yahoo.com>
To:        Mark Johnston <markj@FreeBSD.org>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: head -r331499 amd64/threadripper panic in vm_page_free_prep during "poudriere bulk -a", after 14h 22m or so.
Message-ID:  <08B7C130-A38D-473A-8A73-CA79ED1A0044@yahoo.com>
In-Reply-To: <45B4FCDA-C743-4F35-B819-9CB064C20038@yahoo.com>
References:  <8D9C49CB-957E-40A5-8EB0-D90D8AC02060@yahoo.com> <20180325183421.GA74365@raichu> <44821CA4-19C2-4265-8E83-568452DF6471@yahoo.com> <20180325200934.GC74365@raichu> <45B4FCDA-C743-4F35-B819-9CB064C20038@yahoo.com>

[Unfortunately, I'd not be able to get back to this
for many hours. I do not want to leave the machine
at the db> prompt that long. So this is all there
will be.]

It got a different crash last night, after a little over 12
hours of poudriere bulk -a activity, again while I was
sleeping. Hand typed:

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 0d
fault virtual address = 0x20
fault code            = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff80b70867
stack pointer         = 0x28:0xfffffe00ebab8880
frame pointer         = 0x28:0xfffffe00ebab8890
code segment          = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags      = resume, IOPL = 0
current process       = 44 (dom0)
[ thread pid 44 tid 100277 ]
Stopped at turnstile_broadcast+0x47: movq 0x20(%rbx,%rax,1),%rcx

(So an offset from a null pointer, apparently.)
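
A minimal sketch of the arithmetic behind that guess, assuming the
faulting access is the memory operand of the movq shown above
(base %rbx, index %rax, scale 1, displacement 0x20). The helper name
is hypothetical and only illustrates x86-64 effective-address
computation:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def effective_address(base, index, scale, disp):
    """x86-64 effective address: base + index*scale + disp, wrapped to 64 bits."""
    return (base + index * scale + disp) & MASK64

# Values taken from the trap report above.
FAULT_ADDR = 0x20   # "fault virtual address = 0x20"
DISP = 0x20         # displacement in "movq 0x20(%rbx,%rax,1),%rcx"

# If the fault came from this operand, base + index*1 must have been:
base_plus_index = (FAULT_ADDR - DISP) & MASK64

print(hex(base_plus_index))
print(hex(effective_address(0, 0, 1, DISP)))
```

Under that assumption %rbx + %rax was 0 at the time of the trap, which
is what "an offset from a null pointer" amounts to.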

bt shows:

Tracing pid 44 tid 100277 td 0xfffff8010f938560
turnstile_broadcast() at turnstile_broadcast+0x47/frame 0xfffffe00ebab8890
__mtx_unlock_sleep() at __mtx_unlock_sleep+0xb9/frame 0xfffffe00ebab88c0
vm_pageout_page_lock() at vm_pageout_page_lock+0x179/frame 0xfffffe00ebab8960
vm_pageout_worker() at vm_pageout_worker+0xd3a/frame 0xfffffe00ebab8a50
vm_pageout() at vm_pageout+0x133/frame 0xfffffe00ebab8a70
fork_exit() at fork_exit+0x83/frame 0xfffffe00ebab8ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ebab8ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Dump again failed, the same way but with some byte
value differences.

(da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 39 8c c7 00 00 08 00
(da1:storvsc1:0:0:0) CAM status Command timeout
(da1:storvsc1:0:0:0) Error 5, Retries exhausted
Aborting dump due to I/O error.

** DUMP FAILED (ERROR 5) **
Cannot dump: unknown error (error=5)

So this appears to be repeatable (for the Optane
swap/page partition?).

show reg:

cs 0x20
ds 0x3b ll+0x1a
es 0x3b ll+0x1a
fs 0x13
gs 0x1b
ss 0x28 ll+0x7
rax 0
rcx 0xfffff8010f938501
rdx 0xfffff8010f938501
rbx 0xfffffe00ebab8880
rsp 0xfffffe00ebab8800
rsi 0
rdi 0
r8  0
r9  0
r10 0
r11 0
r12 0
r13 0xfffff8010f938560
r14 0
r15 0xffffffff81d67998 vm_dom+0x18
rip 0xffffffff80b70867 turnstile_broadcast+0x47
rflags 0x10056
turnstile_broadcast+0x47: movq 0x20(%rbx,%rax,1),%rcx

Around where rbx points:

0xfffffe00ebab8872: ab eb 0  fe ff ff 28 0  0  0  0  0  0  0
0xfffffe00ebab8880: 0  0  0  0  0  0  0  0  80 79 d6 81 ff ff
0xfffffe00ebab888e: ff ff c0 88 ab eb 0  fe ff ff 9  20 af 80
0xfffffe00ebab889c: ff ff ff ff 0  7b 2  d8 f  f8 ff ff 98 79

And it looks like we have that null pointer above.
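
For anyone reading the bytes above, here is a small sketch that
regroups the hand-typed dump into little-endian 64-bit words starting
at the address rbx holds. It assumes the four dump lines are
contiguous (the addresses do step by 14 bytes) and that the bytes were
typed correctly; the function names are hypothetical:

```python
import struct

DUMP = """\
0xfffffe00ebab8872: ab eb 0  fe ff ff 28 0  0  0  0  0  0  0
0xfffffe00ebab8880: 0  0  0  0  0  0  0  0  80 79 d6 81 ff ff
0xfffffe00ebab888e: ff ff c0 88 ab eb 0  fe ff ff 9  20 af 80
0xfffffe00ebab889c: ff ff ff ff 0  7b 2  d8 f  f8 ff ff 98 79
"""

def parse_dump(text):
    """Return (first_address, bytes) for a contiguous 'addr: b b b ...' dump."""
    base = None
    data = bytearray()
    for line in text.splitlines():
        addr, _, rest = line.partition(":")
        if base is None:
            base = int(addr, 16)
        data.extend(int(tok, 16) for tok in rest.split())
    return base, bytes(data)

def words_from(base, data, start):
    """List (address, value) for each whole little-endian 64-bit word from start."""
    off = start - base
    usable = (len(data) - off) // 8 * 8   # drop a trailing partial word
    return [(start + i, struct.unpack_from("<Q", data, off + i)[0])
            for i in range(0, usable, 8)]

base, data = parse_dump(DUMP)
words = words_from(base, data, 0xfffffe00ebab8880)
for addr, value in words:
    print(f"{addr:#018x}: {value:#018x}")
```

Read this way, the word at 0xfffffe00ebab8880 is 0, and the word at
0xfffffe00ebab8890 is 0xfffffe00ebab88c0, which matches the
__mtx_unlock_sleep frame address in the backtrace.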

And I'm afraid that is it: I need to be off doing other things.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
