Subject: Re: 10.2-STABLE amd64 panic: page fault while in kernel mode
To: Konstantin Belousov, freebsd-stable@freebsd.org
References: <561E5E2F.90404@zzattack.org> <20151014144217.GV2257@kib.kiev.ua>
From: Frank Razenberg
Message-ID: <561E7E7A.1080600@zzattack.org>
Date: Wed, 14 Oct 2015 18:10:34 +0200
In-Reply-To: <20151014144217.GV2257@kib.kiev.ua>

Thanks for looking into this.

On 10/14/2015 4:42 PM, Konstantin Belousov wrote:
> On Wed, Oct 14, 2015 at 03:52:47PM +0200, Frank Razenberg wrote:
>> After upgrading from 9.2 to 10.1 I first started noticing panics. They
>> occurred roughly weekly, and since this storage machine isn't frequently
>> used I didn't look into it much further. After updating to 10.2-STABLE
>> the panics have gone from weekly to daily.
>> The machine has 32GB of non-registered ECC DDR3-1066 RAM. There's also a
>> 10-disk raidz2 pool. I've run memtest86+ for 72 hours straight with no
>> errors.
>>
>> Crash dumps all feature the following:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 12
>> fault virtual address   = 0x1d1c0bec0
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff804fda65
>> stack pointer           = 0x28:0xfffffe0698f21870
>> frame pointer           = 0x28:0xfffffe0698f218d0
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 6106 (pickup)
>> trap number             = 12
>> panic: page fault
>> cpuid = 2
>>
>>
>> (kgdb) bt
>> #0  doadump (textdump=<value optimized out>) at pcpu.h:219
>> #1  0xffffffff8053ce32 in kern_reboot (howto=260) at
>>     /usr/src/sys/kern/kern_shutdown.c:455
>> #2  0xffffffff8053d215 in vpanic (fmt=<value optimized out>,
>>     ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:762
>> #3  0xffffffff8053d0a3 in panic (fmt=0x0) at
>>     /usr/src/sys/kern/kern_shutdown.c:691
>> #4  0xffffffff807755db in trap_fatal (frame=<value optimized out>,
>>     eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
>> #5  0xffffffff807758dd in trap_pfault (frame=0xfffffe0698dbc7c0,
>>     usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
>> #6  0xffffffff80774f7a in trap (frame=0xfffffe0698dbc7c0) at
>>     /usr/src/sys/amd64/amd64/trap.c:440
>> #7  0xffffffff8075b0f2 in calltrap () at
>>     /usr/src/sys/amd64/amd64/exception.S:236
>> #8  0xffffffff804fda65 in kqueue_close (fp=0xfffff803e4967190,
>>     td=0xfffff80014b094a0) at /usr/src/sys/kern/kern_event.c:1750
>> #9  0xffffffff804f25f9 in _fdrop (fp=0xfffff803e4967190,
>>     td=0xfffff802b5d2a000) at file.h:343
>> #10 0xffffffff804f4e9e in closef (fp=<value optimized out>,
>>     td=<value optimized out>) at /usr/src/sys/kern/kern_descrip.c:2338
>> #11 0xffffffff804f4ab9 in fdescfree (td=0xfffff80014b094a0) at
>>     /usr/src/sys/kern/kern_descrip.c:2106
>> #12 0xffffffff805013a9 in exit1 (td=0xfffff80014b094a0,
>>     rv=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:369
>> #13 0xffffffff80500e3e in sys_sys_exit (td=0xfffffe000782e060,
>>     uap=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:179
>> #14 0xffffffff80775efd in amd64_syscall (td=0xfffff80014b094a0,
>>     traced=0) at subr_syscall.c:134
>> #15 0xffffffff8075b3db in Xfast_syscall () at
>>     /usr/src/sys/amd64/amd64/exception.S:396
>> #16 0x000000080120335a in ?? ()
>>
>> Most of the dumps list 'pickup' as the current process. All of them have
>> 'kqueue_close' in the backtrace.
>> I'm not sure what the next step in diagnosing the issue is. Any pointers
>> would be greatly appreciated.
> What is the exact revision of the checkout you run, where the panic above
> occurs?

Not entirely sure. Can I still find that out, given that I've updated my
source tree since? It's not in uname -a, but judging by the dates it should
be around r289032. Do you want me to update to HEAD and do the steps below
on that instead?

>
> Please load the kernel.debug + vmcore into kgdb, go to frame 8, and do
> p *kq
> p *kn
> p i
> p kq->kq_knlist[i].slh_first
> p *(kq->kq_knlist[i].slh_first)

#8  0xffffffff804fda65 in kqueue_close (fp=0xfffff801dd94b1e0,
    td=0xfffff80015bbc000) at /usr/src/sys/kern/kern_event.c:1750
1750                    kn->kn_fop->f_detach(kn);
(kgdb) p *kq
$1 = {kq_lock = {lock_object = {lo_name = 0xffffffff80829725 "kqueue",
      lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4},
  kq_refcnt = 1, kq_list = {tqe_next = 0xfffff8015f29fc00,
    tqe_prev = 0xfffff8000c749860}, kq_head = {tqh_first = 0x0,
    tqh_last = 0xfffff801dd33a038}, kq_count = 0, kq_sel = {si_tdlist = {
      tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {
        slh_first = 0x0}, kl_lock = 0xffffffff804fc560 ,
      kl_unlock = 0xffffffff804fc5a0 ,
      kl_assert_locked = 0xffffffff804fc5e0 ,
      kl_assert_unlocked = 0xffffffff804fc5f0 ,
      kl_lockarg = 0xfffff801dd33a000}, si_mtx = 0x0}, kq_sigio = 0x0,
  kq_fdp = 0xfffff8000c749800, kq_state = 16, kq_knlistsize = 256,
  kq_knlist = 0xfffff8000c7a8800, kq_knhashmask = 0, kq_knhash = 0x0,
  kq_task = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0,
    ta_func = 0xffffffff804faeb0 , ta_context = 0xfffff801dd33a000}}
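Some context for reading the `p *kq` output: `kq_knlist` is an array of `kq_knlistsize` (here 256) list heads, and line 1750 of kern_event.c calls the detach hook through each knote's `kn_fop` pointer while `kqueue_close` empties those lists. Because frame 8's address equals the reported instruction pointer and the fault code is a data read, the bad read appears to have happened in `kqueue_close` itself, most likely while dereferencing `kn` or `kn->kn_fop` on that line, rather than inside a detach routine. The following is a simplified user-space model in C, not the FreeBSD source: `kq_knlist`, `kq_knlistsize`, `slh_first`, `kn_fop` and `f_detach` are taken from the dump, while the `kn_link` field, the `model_*` functions and the ascending-order drain loop are hypothetical simplifications with all locking omitted.

```c
#include <stdlib.h>

struct knote;

/* Filter operations table; only the detach hook is modelled. */
struct filterops {
	void (*f_detach)(struct knote *kn);
};

/* One knote; kn_link is a hypothetical stand-in for the real list linkage. */
struct knote {
	struct knote *kn_link;          /* next knote in the same bucket */
	const struct filterops *kn_fop; /* per-filter operations */
	int kn_id;                      /* identifier, e.g. a descriptor number */
};

/* Same shape as the SLIST_HEAD that kgdb prints as slh_first. */
struct klist {
	struct knote *slh_first;
};

/* Only the two kqueue fields the close loop needs. */
struct kqueue {
	int kq_knlistsize;        /* number of buckets (256 in the dump) */
	struct klist *kq_knlist;  /* array of kq_knlistsize list heads */
};

int model_detach_calls;

/* Hypothetical detach hook: just counts how often it is called. */
static void model_detach(struct knote *kn)
{
	(void)kn;
	model_detach_calls++;
}

static const struct filterops model_fops = { model_detach };

/* Allocate `size` empty buckets; returns 0 on success, -1 on failure. */
int model_kqueue_init(struct kqueue *kq, int size)
{
	kq->kq_knlistsize = size;
	kq->kq_knlist = calloc((size_t)size, sizeof(*kq->kq_knlist));
	return kq->kq_knlist == NULL ? -1 : 0;
}

/* Push a new knote onto bucket `id`; returns 0 on success, -1 on failure. */
int model_attach(struct kqueue *kq, int id)
{
	struct knote *kn;

	if (id < 0 || id >= kq->kq_knlistsize)
		return -1;
	kn = calloc(1, sizeof(*kn));
	if (kn == NULL)
		return -1;
	kn->kn_id = id;
	kn->kn_fop = &model_fops;
	kn->kn_link = kq->kq_knlist[id].slh_first;
	kq->kq_knlist[id].slh_first = kn;
	return 0;
}

/*
 * Simplified close loop: drain the buckets in ascending order, detaching,
 * unlinking and freeing each knote. Locking and other kernel bookkeeping
 * are omitted. Returns the number of knotes detached.
 */
int model_kqueue_close(struct kqueue *kq)
{
	struct knote *kn;
	int i, n = 0;

	for (i = 0; i < kq->kq_knlistsize; i++) {
		while ((kn = kq->kq_knlist[i].slh_first) != NULL) {
			kn->kn_fop->f_detach(kn);  /* same shape as line 1750 */
			kq->kq_knlist[i].slh_first = kn->kn_link;
			free(kn);
			n++;
		}
	}
	return n;
}
```

For example, after attaching two knotes to bucket 3 and one to bucket 17, `model_kqueue_close` returns 3 and leaves every bucket empty. In this model, a corrupted `slh_first` or `kn_fop` value would fault on the line mirroring 1750 inside the loop, which is what the requested `p kq->kq_knlist[i].slh_first` and `p *(kq->kq_knlist[i].slh_first)` commands would let one inspect.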
(kgdb) p *kn
No symbol "kn" in current context.
(kgdb) p i
No symbol "i" in current context.
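kgdb has no `kn` or `i` symbol in frame 8, presumably because the optimizer did not keep them, so those locals cannot be printed directly. One way to recover equivalent values, assuming `kqueue_close` at this revision drains `kq_knlist` in ascending index order and removes each knote from its bucket once it has been dropped (this should be checked against kern_event.c in the tree that built the kernel), is to find the first bucket whose `slh_first` is non-NULL: every earlier bucket would already be empty, so that index stands in for `i` and its head for `kn`. In kgdb this means printing `kq->kq_knlist[0].slh_first`, `kq->kq_knlist[1].slh_first`, and so on up to index 255 (`kq_knlistsize` is 256 here). The C sketch below performs the same scan over an array of list heads; `first_nonempty_bucket` is a hypothetical helper, not a kernel or kgdb function.

```c
#include <stddef.h>

struct knote; /* opaque here: only the pointers are compared */

/* Same layout as the SLIST_HEAD that kgdb prints as slh_first. */
struct klist {
	struct knote *slh_first;
};

/*
 * Return the index of the first bucket whose list head is non-NULL,
 * or -1 if all `nlists` buckets are empty (or `lists` is NULL).
 * Under the ascending-drain assumption, this index plays the role of
 * the missing `i`, and lists[index].slh_first the role of `kn`.
 */
int first_nonempty_bucket(const struct klist *lists, int nlists)
{
	int i;

	if (lists == NULL)
		return -1;
	for (i = 0; i < nlists; i++) {
		if (lists[i].slh_first != NULL)
			return i;
	}
	return -1;
}
```

If the assumption holds, `p *(kq->kq_knlist[N].slh_first)` for the index N found this way would show the knote whose `kn_fop` was being followed when the fault occurred.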