Date:      Sat, 29 Jun 2013 12:54:47 +0200
From:      Andreas Longwitz <longwitz@incore.de>
To:        freebsd-fs@freebsd.org
Subject:   Reproducable ZFS crash when starting a jail in 8-stable
Message-ID:  <51CEBCF7.5040505@incore.de>

The problem occurs after an update of 8-stable from r248120 to r252111.

My server has system disks da0, da1 with gmirror/gjournal for rootfs,
usr, var and home partitions and glabeled data disks da2, da3 with zfs
for prod and backup. Applications run in two jails on the zfs disks with
nullfs mounts (the fstab is shown further down).

At boot the server crashes when /etc/rc.d/jail tries to start the jails.
On the console I see (I use ddb.conf to have ddb handle the crash):

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xffffff8245853930
frame pointer           = 0x28:0xffffff82458539e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4411 (initial thread)
[thread pid 4411 tid 100460 ]
Stopped at      0:      *** error reading from address 0 ***
db:0:kdb.enter.default> watchdog
No argument provided, disabling watchdog
db:0:kdb.enter.default>  call doadump
Dumping 472 out of 8179 MB:..4%..11%..21%..31%..41%..51%..61%..72%..82%..92%
Dump complete
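
The two ddb commands visible at the prompts above (watchdog, call doadump)
are consistent with an /etc/ddb.conf entry along the following lines. This is
only a sketch: the actual file was not posted, any commands the script runs
after the dump are unknown, and it assumes ddb_enable="YES" in /etc/rc.conf
so that the file is loaded at boot.

```
# /etc/ddb.conf (sketch reconstructed from the console output, not the
# poster's real file)
# kdb.enter.default is run when the debugger is entered and no more
# specific kdb.enter.* script is defined.
script kdb.enter.default=watchdog; call doadump
```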

From the kgdb output I see that pid 4411 is the zfs/initial thread:

(kgdb) where
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:266
#1  0xffffffff801f877c in db_fncall (dummy1=<value optimized out>, dummy2=<value optimized out>,
    dummy3=<value optimized out>, dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801f8a2d in db_command (last_cmdp=0xffffffff8086b5c0, cmd_table=<value optimized out>, dopager=0)
    at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801fd0e3 in db_script_exec (scriptname=0xffffffff80657b9e "kdb.enter.default", warnifnotfound=0)
    at /usr/src/sys/ddb/db_script.c:302
#4  0xffffffff801fd1de in db_script_kdbenter (eventname=<value optimized out>) at /usr/src/sys/ddb/db_script.c:325
#5  0xffffffff801fadc4 in db_trap (type=<value optimized out>, code=<value optimized out>)
    at /usr/src/sys/ddb/db_main.c:230
#6  0xffffffff80432981 in kdb_trap (type=12, code=0, tf=0xffffff8245853880) at /usr/src/sys/kern/subr_kdb.c:654
#7  0xffffffff805dbbed in trap_fatal (frame=0xffffff8245853880, eva=<value optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:844
#8  0xffffffff805dbf6e in trap_pfault (frame=0xffffff8245853880, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765
#9  0xffffffff805dc32b in trap (frame=0xffffff8245853880) at /usr/src/sys/amd64/amd64/trap.c:457
#10 0xffffffff805c2534 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#11 0x0000000000000000 in ?? ()

(kgdb) info thread
* 412 Thread 100460 (PID=4411: zfs/initial thread)  doadump () at /usr/src/sys/kern/kern_shutdown.c:266
  411 Thread 100461 (PID=4404: sh)  sched_switch (td=0xffffff009e26d000, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
....
  221 Thread 100272 (PID=7: zfskern/txg_thread_enter)  sched_switch (td=0xffffff0002c95000, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
  220 Thread 100271 (PID=7: zfskern/txg_thread_enter)  sched_switch (td=0xffffff0002c95470, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
  219 Thread 100069 (PID=7: zfskern/l2arc_feed_thread)  sched_switch (td=0xffffff0002957000, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
  218 Thread 100068 (PID=7: zfskern/arc_reclaim_thread)  sched_switch (td=0xffffff0002957470,
    newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
...
  156 Thread 100270 (PID=0: kernel/zfs_vn_rele_taskq)  sched_switch (td=0xffffff0002c958e0, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
  155 Thread 100269 (PID=0: kernel/zio_ioctl_intr)  sched_switch (td=0xffffff0002d92470, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
  154 Thread 100268 (PID=0: kernel/zio_ioctl_issue)  sched_switch (td=0xffffff0002d9e000, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
  153 Thread 100267 (PID=0: kernel/zio_claim_intr)  sched_switch (td=0xffffff0002d9e8e0, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
  152 Thread 100266 (PID=0: kernel/zio_claim_issue)  sched_switch (td=0xffffff0002d9a8e0, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
....

From the kernel dump I can provide backtraces of all threads or any other
information that is needed.
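
For reference, a kgdb session that would produce those backtraces might look
like the sketch below. The kernel and vmcore paths are assumptions (the usual
default locations; the post does not give them). The last command is a guess
at a useful next step: frame #11 is at address 0, and if the fault came from
a call through a NULL function pointer, the word at the faulting stack
pointer (0xffffff8245853930 in the console output) would be the return
address into the caller. The dump output so far does not confirm that cause.

```
# kgdb /boot/kernel/kernel /var/crash/vmcore.0    (paths assumed)
(kgdb) thread apply all bt
(kgdb) x/a 0xffffff8245853930
```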

Some more information:

=== root@serv07 (pts/1) -> gmirror status
         Name    Status  Components
mirror/gmsv07  COMPLETE  da0 (ACTIVE)
                         da1 (ACTIVE)

=== root@serv07 (pts/1) -> glabel status
          Name  Status  Components
label/9241A7D4     N/A  da2
label/C2477N17     N/A  da3

=== root@serv07 (pts/2) -> zpool status
  pool: mpool
 state: ONLINE
  scan: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        mpool               ONLINE       0     0     0
          mirror-0          ONLINE       0     0     0
            label/9241A7D4  ONLINE       0     0     0
            label/C2477N17  ONLINE       0     0     0

errors: No known data errors

=== root@serv07 (pts/2) -> zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
mpool                   108G   806G    31K  /mpool
mpool/backup            545M   806G   485M  /backup
mpool/jail_deb_backup    33K   806G    33K  /backup/jail/deb
mpool/jail_deb_prod    32,6G   806G  32,6G  /prod/jail/deb
mpool/jail_pvz_backup    32K   806G    32K  /backup/jail/pvz
mpool/jail_pvz_prod    54,7G   806G  54,7G  /prod/jail/pvz
mpool/prod             20,0G   806G  20,0G  /prod

cat /etc/fstab.deb (fstab.pvz is analogous):
# Device           Mountpoint        FStype  Options   Dump    Pass#
/usr/jail/deb      /jail/deb/usr     nullfs  rw        0       0
/var/jail/deb      /jail/deb/var     nullfs  rw        0       0
/home/jail/deb     /jail/deb/home    nullfs  rw        0       0
/tmp/jail/deb      /jail/deb/tmp     nullfs  rw        0       0
/prod/jail/deb     /jail/deb/prod    nullfs  rw        0       0
/backup/jail/deb   /jail/deb/backup  nullfs  rw        0       0
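
To check whether one particular nullfs mount triggers the fault, the entries
of such an fstab could be turned into individual mount commands and run one
at a time instead of letting /etc/rc.d/jail mount them all. The following is
a minimal sketch; the function name nullfs_mount_cmds is made up for
illustration, and it only prints the commands rather than running them.

```shell
# Print one "mount -t nullfs" command per nullfs entry of a jail fstab.
# nullfs_mount_cmds is a hypothetical helper, not part of FreeBSD.
# Usage (as root, on the affected server):
#   nullfs_mount_cmds /etc/fstab.deb
nullfs_mount_cmds() {
    # Skip comment lines; fields are: device mountpoint fstype options ...
    awk '$1 !~ /^#/ && $3 == "nullfs" {
        printf "mount -t nullfs -o %s %s %s\n", $4, $1, $2
    }' "$1"
}
```

Running the printed commands by hand, in order, would show whether the
crash needs a mount on top of a zfs dataset (/prod/jail/deb,
/backup/jail/deb) or also occurs with the mounts backed by the gmirror disks.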

On servers without zfs the problem does not occur; jails on r252111
run fine there.

Andreas Longwitz