From: Andreas Longwitz
Date: Sat, 29 Jun 2013 12:54:47 +0200
To: freebsd-fs@freebsd.org
Subject: Reproducable ZFS crash when starting a jail in 8-stable
Message-ID: <51CEBCF7.5040505@incore.de>

The problem occurs after an update of 8-stable from r248120 to r252111.
My server has system disks da0, da1 with gmirror/gjournal for the rootfs,
usr, var and home partitions, and glabeled data disks da2, da3 with zfs
for prod and backup.
Applications run in two jails on the zfs disks with nullfs mounts.

At boot the server crashes when /etc/rc.d/jail tries to start the jails.
On the console I see (I use ddb.conf to have ddb handle the crash):

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xffffff8245853930
frame pointer           = 0x28:0xffffff82458539e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4411 (initial thread)
[thread pid 4411 tid 100460 ]
Stopped at      0:      *** error reading from address 0 ***
db:0:kdb.enter.default>  watchdog
No argument provided, disabling watchdog
db:0:kdb.enter.default>  call doadump
Dumping 472 out of 8179 MB:..4%..11%..21%..31%..41%..51%..61%..72%..82%..92%
Dump complete

From the kgdb output I see that pid 4411 is the zfs/initial thread:

(kgdb) where
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:266
#1  0xffffffff801f877c in db_fncall (dummy1=, dummy2=, dummy3=, dummy4=)
    at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801f8a2d in db_command (last_cmdp=0xffffffff8086b5c0,
    cmd_table=, dopager=0) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801fd0e3 in db_script_exec (
    scriptname=0xffffffff80657b9e "kdb.enter.default", warnifnotfound=0)
    at /usr/src/sys/ddb/db_script.c:302
#4  0xffffffff801fd1de in db_script_kdbenter (eventname=)
    at /usr/src/sys/ddb/db_script.c:325
#5  0xffffffff801fadc4 in db_trap (type=, code=)
    at /usr/src/sys/ddb/db_main.c:230
#6  0xffffffff80432981 in kdb_trap (type=12, code=0, tf=0xffffff8245853880)
    at /usr/src/sys/kern/subr_kdb.c:654
#7  0xffffffff805dbbed in trap_fatal (frame=0xffffff8245853880, eva=)
    at /usr/src/sys/amd64/amd64/trap.c:844
#8  0xffffffff805dbf6e in trap_pfault (frame=0xffffff8245853880, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:765
#9  0xffffffff805dc32b in trap
    (frame=0xffffff8245853880) at /usr/src/sys/amd64/amd64/trap.c:457
#10 0xffffffff805c2534 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#11 0x0000000000000000 in ?? ()

(kgdb) info thread
* 412 Thread 100460 (PID=4411: zfs/initial thread)  doadump ()
    at /usr/src/sys/kern/kern_shutdown.c:266
  411 Thread 100461 (PID=4404: sh)  sched_switch (td=0xffffff009e26d000,
    newtd=, flags=) at /usr/src/sys/kern/sched_ule.c:1932
....
  221 Thread 100272 (PID=7: zfskern/txg_thread_enter)  sched_switch (
    td=0xffffff0002c95000, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
  220 Thread 100271 (PID=7: zfskern/txg_thread_enter)  sched_switch (
    td=0xffffff0002c95470, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
  219 Thread 100069 (PID=7: zfskern/l2arc_feed_thread)  sched_switch (
    td=0xffffff0002957000, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
  218 Thread 100068 (PID=7: zfskern/arc_reclaim_thread)  sched_switch (
    td=0xffffff0002957470, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
...
  156 Thread 100270 (PID=0: kernel/zfs_vn_rele_taskq)  sched_switch (
    td=0xffffff0002c958e0, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
  155 Thread 100269 (PID=0: kernel/zio_ioctl_intr)  sched_switch (
    td=0xffffff0002d92470, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
  154 Thread 100268 (PID=0: kernel/zio_ioctl_issue)  sched_switch (
    td=0xffffff0002d9e000, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
  153 Thread 100267 (PID=0: kernel/zio_claim_intr)  sched_switch (
    td=0xffffff0002d9e8e0, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
  152 Thread 100266 (PID=0: kernel/zio_claim_issue)  sched_switch (
    td=0xffffff0002d9a8e0, newtd=, flags=)
    at /usr/src/sys/kern/sched_ule.c:1932
....

From the kernel dump I can provide backtraces of all threads or any other
information.
Some more information:

=== root@serv07 (pts/1) -> gmirror status
         Name    Status  Components
mirror/gmsv07  COMPLETE  da0 (ACTIVE)
                         da1 (ACTIVE)

=== root@serv07 (pts/1) -> glabel status
          Name  Status  Components
label/9241A7D4     N/A  da2
label/C2477N17     N/A  da3

=== root@serv07 (pts/2) -> zpool status
  pool: mpool
 state: ONLINE
  scan: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        mpool               ONLINE       0     0     0
          mirror-0          ONLINE       0     0     0
            label/9241A7D4  ONLINE       0     0     0
            label/C2477N17  ONLINE       0     0     0

errors: No known data errors

=== root@serv07 (pts/2) -> zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mpool                    108G   806G    31K  /mpool
mpool/backup             545M   806G   485M  /backup
mpool/jail_deb_backup     33K   806G    33K  /backup/jail/deb
mpool/jail_deb_prod     32,6G   806G  32,6G  /prod/jail/deb
mpool/jail_pvz_backup     32K   806G    32K  /backup/jail/pvz
mpool/jail_pvz_prod     54,7G   806G  54,7G  /prod/jail/pvz
mpool/prod              20,0G   806G  20,0G  /prod

cat /etc/fstab.deb (fstab.pvz is analogous):
# Device          Mountpoint        FStype  Options  Dump  Pass#
/usr/jail/deb     /jail/deb/usr     nullfs  rw       0     0
/var/jail/deb     /jail/deb/var     nullfs  rw       0     0
/home/jail/deb    /jail/deb/home    nullfs  rw       0     0
/tmp/jail/deb     /jail/deb/tmp     nullfs  rw       0     0
/prod/jail/deb    /jail/deb/prod    nullfs  rw       0     0
/backup/jail/deb  /jail/deb/backup  nullfs  rw       0     0

On servers without zfs the problem does not occur; jails on r252111 run
fine there.

Andreas Longwitz