Date:      Tue, 15 Oct 2013 13:18:58 +0100 (BST)
From:      Anton Shterenlikht <mexas@bris.ac.uk>
To:        davide@freebsd.org, mexas@bris.ac.uk
Cc:        freebsd-current@freebsd.org, freebsd-ia64@freebsd.org
Subject:   Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock
Message-ID:  <201310151218.r9FCIwBx043808@mech-cluster241.men.bris.ac.uk>
In-Reply-To: <CACYV=-GE%2BSUR_RrXfhaH9FekQ3QC6DuYuSpcdhAok0kH0uBShQ@mail.gmail.com>

>From davide.italiano@gmail.com Tue Oct 15 11:30:07 2013
>
>On Tue, Oct 15, 2013 at 10:43 AM, Anton Shterenlikht <mexas@bris.ac.uk> wrote:
>
>> Anyway, savecore eventually deadlocks:
>>
>> panic: deadlkres: possible deadlock detected for 0xe0000000127b7b00, blocked for 901401 ticks
>>
>
>[trim]
>
>>
>> Tracing command savecore pid 805 tid 100079 td 0xe0000000127b7b00
>> cpu_switch(0xe0000000127b7b00, 0xe000000011178900, 0xe000000012402fc0, 0x9ffc0000005e7e80) at cpu_switch+0xd0
>> sched_switch(0xe0000000127b7b00, 0xe000000011178900, 0x9ffc000000f15698, 0x9ffc000000f15680) at sched_switch+0x890
>> mi_switch(0x103, 0x0, 0xe0000000127b7b00, 0x9ffc00000062d1f0) at mi_switch+0x3f0
>> turnstile_wait(0xe000000012402fc0, 0xe000000012400480, 0x0, 0x9ffc000000dcb698) at turnstile_wait+0x960
>> __mtx_lock_sleep(0x9ffc0000010f9998, 0xe0000000127b7b00, 0xe000000012402fc0, 0x9ffc000000dc0558, 0x742) at __mtx_lock_sleep+0x2f0
>> __mtx_lock_flags(0x9ffc0000010f9980, 0x0, 0x9ffc000000dd4a90, 0x742) at __mtx_lock_flags+0x1e0
>> vfs_vmio_release(0xa00000009ebe72f0, 0xe00000027ed2ab70, 0x3, 0xa00000009ebe736c, 0xa00000009ebe7498, 0xa00000009ebe72f8, 0x9ffc000000dd4a90, 0x9ffc0000010f9680) at vfs_vmio_release+0x290
>> getnewbuf(0xe0000000127f4ec0, 0x0, 0x0, 0x8000, 0xa00000009ebe99a8, 0x0, 0x9ffc0000010f0798, 0xa00000009ebe72f0) at getnewbuf+0x7e0
>> getblk(0xe0000000127f4ec0, 0x4cbaa, 0x8000, 0x0, 0x0, 0x0, 0x0, 0x0) at getblk+0xee0
>> ffs_balloc_ufs2(0xe0000000127f4ec0, 0x4cbaa, 0xa0000000c60ba000, 0xe000000011165a00, 0x7f050000, 0xa00000009dd79160) at ffs_balloc_ufs2+0x2950
>> ffs_write(0xa00000009dd79248, 0x3000, 0x265d50000) at ffs_write+0x5c0
>> VOP_WRITE_APV(0x9ffc000000e94ac0, 0xa00000009dd79248, 0x0, 0x0) at VOP_WRITE_APV+0x330
>> vn_write(0xe0000000129ae820, 0xa00000009dd79360, 0xe000000011165a00, 0x0, 0xe0000000129ae830, 0xe0000000127f4ec0) at vn_write+0x450
>> vn_io_fault(0xe0000000129ae820, 0xa00000009dd79360, 0xe000000011165a00, 0x0, 0xe0000000127b7b00) at vn_io_fault+0x330
>> dofilewrite(0xe0000000127b7b00, 0x7, 0xe0000000129ae820, 0xa00000009dd79360, 0xffffffffffffffff, 0x0) at dofilewrite+0x180
>> kern_writev(0xe0000000127b7b00, 0x7, 0xa00000009dd79360) at kern_writev+0xa0
>> sys_write(0xe0000000127b7b00, 0xa00000009dd794e8, 0x9ffc000000abac80, 0x48d) at sys_write+0x100
>> syscall(0xe0000000129d04a0, 0x140857000, 0x8000, 0xe0000000127b7b00, 0x0, 0x0, 0x9ffc000000ab7280, 0x8) at syscall+0x5e0
>> --More--
>
>I'm not commenting on the first panic you got -- but on the deadlock
>reported by DEADLKRES. I think that's the vm_page lock.
>You can run kgdb /boot/${KERNEL}/kernel, where ${KERNEL} is the incriminated one,
>then l *vfs_vmio_release+0x290
>to get the exact point where it fails.

Like this?

# kgdb /boot/kernel/kernel
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "ia64-marcel-freebsd"...
(kgdb) l *vfs_vmio_release+0x290
0x9ffc0000006b8830 is in vfs_vmio_release (/usr/src/sys/kern/vfs_bio.c:1859).
1854                    /*
1855                     * In order to keep page LRU ordering consistent, put
1856                     * everything on the inactive queue.
1857                     */
1858                    vm_page_lock(m);
1859                    vm_page_unwire(m, 0);
1860
1861                    /*
1862                     * Might as well free the page if we can and it has
1863                     * no valid data.  We also free the page if the
(kgdb) 
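
The same lookup can be repeated for the other frames of the trace. A minimal sketch, assuming each DDB frame ends in "at symbol+0xOFFSET" as in the trace quoted above; the helper name kgdb_list_commands is hypothetical, and the script only prints kgdb commands to paste at the (kgdb) prompt, it does not run kgdb itself:

```python
import re

# Matches the "at symbol+0xoffset" tail of a DDB backtrace frame.
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_][A-Za-z0-9_]*)\+(0x[0-9a-fA-F]+)\s*$")


def kgdb_list_commands(trace_text):
    """Return one kgdb 'l *symbol+0xoffset' command per trace frame."""
    commands = []
    for line in trace_text.splitlines():
        # Strip mail quoting ("> ", ">> ") before matching.
        line = line.lstrip("> ").rstrip()
        match = FRAME_RE.search(line)
        if match:
            symbol, offset = match.groups()
            commands.append(f"l *{symbol}+{offset}")
    return commands


if __name__ == "__main__":
    # Frames copied from the trace above; the pager line is ignored.
    sample = """\
>> ffs_write(0xa00000009dd79248, 0x3000, 0x265d50000) at ffs_write+0x5c0
>> kern_writev(0xe0000000127b7b00, 0x7, 0xa00000009dd79360) at kern_writev+0xa0
>> --More--
"""
    for command in kgdb_list_commands(sample):
        print(command)
```

Lines that do not look like a frame (the "Tracing command" header, "--More--") are skipped rather than reported, so the output should be checked against the trace by eye.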


>I'm unsure here because 'show alllocks' and 'show locks' outputs are
>empty -- are you building your kernel with WITNESS etc..?

I think so:

# Debugging support.  Always need this:
options         KDB             # Enable kernel debugger support.
options         KDB_TRACE       # Print a stack trace for a panic.
# For full debugger support use (turn off in stable branch):
options         DDB             # Support DDB
options         GDB             # Support remote GDB
options         DEADLKRES       # Enable the deadlock resolver
options         INVARIANTS      # Enable calls of extra sanity checking
options         INVARIANT_SUPPORT # required by INVARIANTS
options         WITNESS         # Enable checks to detect deadlocks and cycles
options         WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
options         MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
# textdump(4)
options TEXTDUMP_PREFERRED
options TEXTDUMP_VERBOSE
# http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-deadlocks.html
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC
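
A config like the one above can be scanned for the debugging options in question. A minimal sketch, assuming the usual "options NAME[=value] # comment" line format; the helper name enabled_options is hypothetical, and a match only shows that the option is in the file, not that the running kernel was built from it:

```python
def enabled_options(config_text):
    """Map each 'options NAME[=value]' line to its value (None if no value)."""
    found = {}
    for line in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options":
            name, _, value = fields[1].partition("=")
            found[name] = value or None
    return found


if __name__ == "__main__":
    # A few lines copied from the config above.
    sample = """\
options         DEADLKRES       # Enable the deadlock resolver
options         WITNESS         # Enable checks to detect deadlocks and cycles
options         MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
# textdump(4)
options DEBUG_LOCKS
"""
    opts = enabled_options(sample)
    for wanted in ("WITNESS", "DEADLKRES", "DEBUG_LOCKS", "INVARIANTS"):
        print(wanted, "present" if wanted in opts else "missing")
```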

Also, does this look right?

$ sysctl -a | grep kdb
debug.ddb.scripting.scripts: kdb.enter.panic=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
kdb.enter.witness=run lockinfo
debug.kassert.do_kdb: 0
debug.kdb.alt_break_to_debugger: 0
debug.kdb.break_to_debugger: 0
debug.kdb.trap_code: 0
debug.kdb.trap: 0
debug.kdb.panic: 0
debug.kdb.enter: 0
debug.kdb.current: ddb
debug.kdb.available: ddb gdb 
debug.witness.kdb: 0
$ 

Thank you

Anton



