Date:      Wed, 14 Oct 2015 18:10:34 +0200
From:      Frank Razenberg <frank@zzattack.org>
To:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org
Subject:   Re: 10.2-STABLE amd64 panic: page fault while in kernel mode
Message-ID:  <561E7E7A.1080600@zzattack.org>
In-Reply-To: <20151014144217.GV2257@kib.kiev.ua>
References:  <561E5E2F.90404@zzattack.org> <20151014144217.GV2257@kib.kiev.ua>

Thanks for looking into this.

On 10/14/2015 4:42 PM, Konstantin Belousov wrote:
> On Wed, Oct 14, 2015 at 03:52:47PM +0200, Frank Razenberg wrote:
>> After upgrading from 9.2 to 10.1 I first started noticing panics. They
>> occurred roughly weekly and since this storage machine isn't frequently
>> used I didn't look into it much further. After updating to 10.2-STABLE
>> the panics have gone from weekly to daily.
>> The machine has 32GB of non-registered ECC DDR3-1066 RAM. There's also a
>> 10-disk raidz2 pool. I've run memtest86+ for 72 hours straight with no
>> errors.
>>
>> Crash dumps all feature the following:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 12
>> fault virtual address   = 0x1d1c0bec0
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff804fda65
>> stack pointer           = 0x28:0xfffffe0698f21870
>> frame pointer           = 0x28:0xfffffe0698f218d0
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                           = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 6106 (pickup)
>> trap number             = 12
>> panic: page fault
>> cpuid = 2
>>
>>
>> (kgdb) bt
>> #0  doadump (textdump=<value optimized out>) at pcpu.h:219
>> #1  0xffffffff8053ce32 in kern_reboot (howto=260) at
>> /usr/src/sys/kern/kern_shutdown.c:455
>> #2  0xffffffff8053d215 in vpanic (fmt=<value optimized out>, ap=<value
>> optimized out>) at /usr/src/sys/kern/kern_shutdown.c:762
>> #3  0xffffffff8053d0a3 in panic (fmt=0x0) at
>> /usr/src/sys/kern/kern_shutdown.c:691
>> #4  0xffffffff807755db in trap_fatal (frame=<value optimized out>,
>> eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
>> #5  0xffffffff807758dd in trap_pfault (frame=0xfffffe0698dbc7c0,
>> usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
>> #6  0xffffffff80774f7a in trap (frame=0xfffffe0698dbc7c0) at
>> /usr/src/sys/amd64/amd64/trap.c:440
>> #7  0xffffffff8075b0f2 in calltrap () at
>> /usr/src/sys/amd64/amd64/exception.S:236
>> #8  0xffffffff804fda65 in kqueue_close (fp=0xfffff803e4967190,
>> td=0xfffff80014b094a0) at /usr/src/sys/kern/kern_event.c:1750
>> #9  0xffffffff804f25f9 in _fdrop (fp=0xfffff803e4967190,
>> td=0xfffff802b5d2a000) at file.h:343
>> #10 0xffffffff804f4e9e in closef (fp=<value optimized out>, td=<value
>> optimized out>) at /usr/src/sys/kern/kern_descrip.c:2338
>> #11 0xffffffff804f4ab9 in fdescfree (td=0xfffff80014b094a0) at
>> /usr/src/sys/kern/kern_descrip.c:2106
>> #12 0xffffffff805013a9 in exit1 (td=0xfffff80014b094a0, rv=<value
>> optimized out>) at /usr/src/sys/kern/kern_exit.c:369
>> #13 0xffffffff80500e3e in sys_sys_exit (td=0xfffffe000782e060,
>> uap=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:179
>> #14 0xffffffff80775efd in amd64_syscall (td=0xfffff80014b094a0,
>> traced=0) at subr_syscall.c:134
>> #15 0xffffffff8075b3db in Xfast_syscall () at
>> /usr/src/sys/amd64/amd64/exception.S:396
>> #16 0x000000080120335a in ?? ()
>>
>> Most of the dumps list 'pickup' as current process. All of them have
>> 'kqueue_close' in the backtrace.
>> I'm not sure what the next step in diagnosing the issue is. Any pointers
>> would be greatly appreciated.
> What is the exact revision of the checkout you are running, where the
> panic above occurs?
Not entirely sure. Is there still a way to find out, given that I may
have updated my source tree since? It's not in uname -a, but judging by
the dates it should be around 289032.
Should I update to HEAD and repeat the steps below there instead?

>
> Please load the kernel.debug + vmcore into kgdb, go to frame 8, and do
> p *kq
> p *kn
> p i
> p kq->kq_knlist[i].slh_first
> p *(kq->kq_knlist[i].slh_first)
#8  0xffffffff804fda65 in kqueue_close (fp=0xfffff801dd94b1e0, td=0xfffff80015bbc000) at /usr/src/sys/kern/kern_event.c:1750
1750 kn->kn_fop->f_detach(kn);
(kgdb) p *kq
$1 = {kq_lock = {lock_object = {lo_name = 0xffffffff80829725 "kqueue",
      lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4},
  kq_refcnt = 1, kq_list = {
    tqe_next = 0xfffff8015f29fc00, tqe_prev = 0xfffff8000c749860},
  kq_head = {tqh_first = 0x0, tqh_last = 0xfffff801dd33a038}, kq_count = 0,
  kq_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0},
    si_note = {kl_list = {slh_first = 0x0},
      kl_lock = 0xffffffff804fc560 <knlist_mtx_lock>,
      kl_unlock = 0xffffffff804fc5a0 <knlist_mtx_unlock>,
      kl_assert_locked = 0xffffffff804fc5e0 <knlist_mtx_assert_locked>,
      kl_assert_unlocked = 0xffffffff804fc5f0 <knlist_mtx_assert_unlocked>,
      kl_lockarg = 0xfffff801dd33a000},
    si_mtx = 0x0}, kq_sigio = 0x0, kq_fdp = 0xfffff8000c749800,
  kq_state = 16, kq_knlistsize = 256, kq_knlist = 0xfffff8000c7a8800,
  kq_knhashmask = 0, kq_knhash = 0x0, kq_task = {
    ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0,
    ta_func = 0xffffffff804faeb0 <kqueue_task>,
    ta_context = 0xfffff801dd33a000}}
(kgdb) p *kn
No symbol "kn" in current context.
(kgdb) p i
No symbol "i" in current context.
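The commands requested above inspect the knote lists that kqueue_close walks when the file's last reference is dropped (frames #9 _fdrop and #8 kqueue_close in the backtrace). As a reading aid, here is a simplified, hypothetical C model of that walk. Only the field names kq_knlistsize, kq_knlist, slh_first, kn_fop and f_detach come from the dump; kn_next and all model_ and example_ names are illustrative assumptions, and the real kernel code additionally deals with locking, other knote states and the knote hash table (empty in this dump: kq_knhashmask = 0, kq_knhash = 0x0). In the model, `i` is the bucket index and `kn` the knote being detached, which is what `p i` and `p *kn` were meant to reveal; a corrupt knote or kn_fop pointer would fault at the indirect f_detach call, the statement shown at kern_event.c:1750.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for the structures printed by kgdb.
 * Only kq_knlistsize, kq_knlist, slh_first, kn_fop and f_detach mirror
 * names from the dump; kn_next and all model_ and example_ names are
 * illustrative assumptions, not FreeBSD definitions.
 */
struct model_knote;

struct model_filterops {
    void (*f_detach)(struct model_knote *kn);   /* filter-specific detach hook */
};

struct model_knote {
    struct model_knote *kn_next;                /* next knote in the same bucket */
    const struct model_filterops *kn_fop;       /* filter operations table */
};

struct model_klist {
    struct model_knote *slh_first;              /* head of one bucket's list */
};

struct model_kqueue {
    int kq_knlistsize;                          /* number of buckets (256 in the dump) */
    struct model_klist *kq_knlist;              /* array of bucket heads */
};

/*
 * Simplified close path: for each bucket index i, pop every knote off
 * kq_knlist[i] and call kn->kn_fop->f_detach(kn). A corrupt kn or kn_fop
 * pointer would fault at that indirect call. Returns the number of knotes
 * detached. Locking and the knote hash table are deliberately omitted.
 */
int model_kqueue_close(struct model_kqueue *kq)
{
    int detached = 0;

    assert(kq->kq_knlistsize == 0 || kq->kq_knlist != NULL);
    for (int i = 0; i < kq->kq_knlistsize; i++) {
        struct model_knote *kn;

        while ((kn = kq->kq_knlist[i].slh_first) != NULL) {
            kq->kq_knlist[i].slh_first = kn->kn_next; /* unlink (model only) */
            kn->kn_fop->f_detach(kn);                 /* cf. kern_event.c:1750 */
            detached++;
        }
    }
    return detached;
}

/* Example filter for trying the model: counts detach calls. */
static int example_detach_calls;

static void example_detach(struct model_knote *kn)
{
    (void)kn;
    example_detach_calls++;
}

const struct model_filterops example_fops = { example_detach };
```

For example, a model kqueue with four buckets holding two knotes in bucket 0 and one in bucket 3 makes model_kqueue_close() return 3 and leaves every bucket empty.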