From owner-freebsd-current@FreeBSD.ORG Mon Mar 10 19:01:57 2014 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CD92FE3D; Mon, 10 Mar 2014 19:01:57 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A1EA89D; Mon, 10 Mar 2014 19:01:57 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 95DDCB980; Mon, 10 Mar 2014 15:01:56 -0400 (EDT) From: John Baldwin To: freebsd-current@freebsd.org Subject: Re: panic after resume, triggered by vt_switch_timer() ? Date: Mon, 10 Mar 2014 14:35:50 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20130906; KDE/4.5.5; amd64; ; ) References: <20140310013936.GC1746@glenbarber.us> In-Reply-To: <20140310013936.GC1746@glenbarber.us> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201403101435.50550.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 10 Mar 2014 15:01:56 -0400 (EDT) Cc: Glen Barber X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Mar 2014 19:01:58 -0000 On Sunday, March 09, 2014 9:39:36 pm Glen Barber wrote: > I think I have hit a panic in the code path for vt_switch_timer() when > resuming from suspend. > > In both cases it happened, the laptop was suspended for >2 hours. > > In both cases, c_func is vt_switch_timer(), and c_arg is a negative > value (both times it is -2133611368). I am not sure if these are valid > values for vt_switch_timer(), but it caught my eye. > > The amount of time the laptop has been suspended may be unrelated, but > so far has been the only constant in numerous attempts to reproduce the > crash. > > The machine is running 11.0-CURRENT #202 r262562, and the only recent > change to the kernel configuration is switching from sc(4) to vt(4) > a few weeks ago. Prior to this, I could leave the machine suspended for > several hours (sometimes up to 4 when traveling), without issue. > > > Script started on Sun Mar 9 20:22:44 2014 > command: /bin/sh > # kgdb ./kernel.debug /var/crash/vmcore.last > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > > > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0x30 > fault code = supervisor read data, page not present > instruction pointer = 0x20:0xffffffff8064378f > stack pointer = 0x28:0xfffffe01e47b98f0 > frame pointer = 0x28:0xfffffe01e47b99b0 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 12 (swi4: clock (0)) > trap number = 12 > panic: page fault > cpuid = 0 > KDB: stack backtrace: > #0 0xffffffff8066c280 at kdb_backtrace+0x60 > #1 0xffffffff8062bc65 at panic+0x155 > #2 0xffffffff8095a78f at trap_fatal+0x38f > #3 0xffffffff8095aac1 at trap_pfault+0x321 > #4 0xffffffff8095a160 at trap+0x4a0 > #5 0xffffffff8093f682 at calltrap+0x8 > #6 0xffffffff80643b94 at softclock+0x94 > #7 0xffffffff805f6aa8 at intr_event_execute_handlers+0x1b8 > #8 0xffffffff805f6ea6 at ithread_loop+0x96 > #9 0xffffffff805f3c9a at fork_exit+0x9a > #10 0xffffffff8093fbbe at fork_trampoline+0xe > Uptime: 6h12m9s > Dumping 3121 out of 7951 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > > Reading symbols from /boot/kernel/zfs.ko.symbols...done. > Loaded symbols for /boot/kernel/zfs.ko.symbols > Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. > Loaded symbols for /boot/kernel/opensolaris.ko.symbols > Reading symbols from /boot/kernel/i915kms.ko.symbols...done. > Loaded symbols for /boot/kernel/i915kms.ko.symbols > Reading symbols from /boot/kernel/drm2.ko.symbols...done. > Loaded symbols for /boot/kernel/drm2.ko.symbols > #0 doadump (textdump=) at pcpu.h:219 > 219 __asm("movq %%gs:%1,%0" : "=r" (td) > (kgdb) bt > #0 doadump (textdump=) at pcpu.h:219 > #1 0xffffffff8062b808 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452 > #2 0xffffffff8062bca4 in panic (fmt=) > at /usr/src/sys/kern/kern_shutdown.c:759 > #3 0xffffffff8095a78f in trap_fatal (frame=, eva=) > at /usr/src/sys/amd64/amd64/trap.c:874 > #4 0xffffffff8095aac1 in trap_pfault (frame=0xfffffe01e47b9840, > usermode=) at /usr/src/sys/amd64/amd64/trap.c:691 > #5 0xffffffff8095a160 in trap (frame=0xfffffe01e47b9840) > at /usr/src/sys/amd64/amd64/trap.c:455 > #6 0xffffffff8093f682 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231 > #7 0xffffffff8064378f in softclock_call_cc (c=0xffffffff80d3ae78, cc=0xffffffff80eace80, > direct=0) at /usr/src/sys/kern/kern_timeout.c:703 > #8 0xffffffff80643b94 in softclock (arg=0xffffffff80eace80) > at /usr/src/sys/kern/kern_timeout.c:812 > #9 0xffffffff805f6aa8 in intr_event_execute_handlers (p=, > ie=0xfffff800031f3a00) at /usr/src/sys/kern/kern_intr.c:1263 > #10 0xffffffff805f6ea6 in ithread_loop (arg=0xfffff8000323cf80) > at /usr/src/sys/kern/kern_intr.c:1276 > #11 0xffffffff805f3c9a in fork_exit (callout=0xffffffff805f6e10 , > arg=0xfffff8000323cf80, frame=0xfffffe01e47b9ac0) at /usr/src/sys/kern/kern_fork.c:977 > #12 0xffffffff8093fbbe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:605 > #13 0x0000000000000000 in ?? () > Current language: auto; currently minimal > (kgdb) frame 7 > #7 0xffffffff8064378f in softclock_call_cc (c=0xffffffff80d3ae78, cc=0xffffffff80eace80, > direct=0) at /usr/src/sys/kern/kern_timeout.c:703 > 703 class->lc_unlock(c_lock); > (kgdb) l > 698 lastfunc = c_func; > 699 } > 700 #endif > 701 CTR1(KTR_CALLOUT, "callout %p finished", c); > 702 if ((c_flags & CALLOUT_RETURNUNLOCKED) == 0) > 703 class->lc_unlock(c_lock); > 704 skip: > 705 CC_LOCK(cc); > 706 KASSERT(cc->cc_exec_entity[direct].cc_curr == c, ("mishandled cc_curr")); > 707 cc->cc_exec_entity[direct].cc_curr = NULL; > (kgdb) p *c > $1 = {c_links = {le = {le_next = 0x0, le_prev = 0xffffffff80eacf90}, sle = {sle_next = 0x0}, > tqe = {tqe_next = 0x0, tqe_prev = 0xffffffff80eacf90}}, c_time = 95902818571375, > c_precision = 4026531562, c_arg = 0xffffffff80d3adc0, > c_func = 0xffffffff80510850 , c_lock = 0x0, c_flags = 128, c_cpu = 0} > (kgdb) l *0xffffffff80510850 > 0xffffffff80510850 is in vt_switch_timer (/usr/src/sys/dev/vt/vt_core.c:150). > 145 static int vt_proc_alive(struct vt_window *); > 146 static void vt_resize(struct vt_device *); > 147 > 148 static void > 149 vt_switch_timer(void *arg) > 150 { > 151 > 152 vt_late_window_switch((struct vt_window *)arg); > 153 } > 154 > (kgdb) p *0xffffffff80d3adc0 > $2 = -2133611368 > (kgdb) quit > # ^D > Script done on Sun Mar 9 20:27:43 2014 > > Thoughts? This has been reported multiple times by both adrian@ and myself. I would try disabling vt switching during suspend as a workaround for now. I found that only ttyv0's timer object is zero'd when this happens. The timers for the other VTY's are all fine. -- John Baldwin