From: Martin Matuska <mm@FreeBSD.org>
To: Andreas Longwitz
Cc: freebsd-fs@freebsd.org
Date: Sun, 30 Jun 2013 00:07:44 +0200
Subject: Re: Reproducible ZFS crash when starting a jail in 8-stable
Message-ID: <51CF5AB0.8040406@FreeBSD.org>
In-Reply-To: <51CEBCF7.5040505@incore.de>
List-Id: Filesystems <freebsd-fs@freebsd.org>

Fixed in r252380 (head), MFC scheduled for July 2, 2013.

On 2013-06-29 12:54, Andreas Longwitz wrote:
> The problem occurs after an update of 8-stable from r248120 to r252111.
>
> My server has system disks da0 and da1 with gmirror/gjournal for the
> rootfs, usr, var and home partitions, and glabeled data disks da2 and
> da3 with zfs for prod and backup.
> Applications run in two jails on the zfs disks with nullfs mounts.
>
> At boot the server crashes when /etc/rc.d/jail tries to start the
> jails; on the console I see (I use ddb.conf to handle the crash in
> ddb):
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address  = 0x0
> fault code             = supervisor read instruction, page not present
> instruction pointer    = 0x20:0x0
> stack pointer          = 0x28:0xffffff8245853930
> frame pointer          = 0x28:0xffffff82458539e0
> code segment           = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags       = interrupt enabled, resume, IOPL = 0
> current process        = 4411 (initial thread)
> [thread pid 4411 tid 100460 ]
> Stopped at      0:      *** error reading from address 0 ***
> db:0:kdb.enter.default> watchdog
> No argument provided, disabling watchdog
> db:0:kdb.enter.default> call doadump
> Dumping 472 out of 8179 MB:..4%..11%..21%..31%..41%..51%..61%..72%..82%..92%
> Dump complete
>
> From the kgdb output I see pid 4411 is the zfs/initial thread:
>
> (kgdb) where
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:266
> #1  0xffffffff801f877c in db_fncall (dummy1=<value optimized out>,
>     dummy2=<value optimized out>, dummy3=<value optimized out>,
>     dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:548
> #2  0xffffffff801f8a2d in db_command (last_cmdp=0xffffffff8086b5c0,
>     cmd_table=<value optimized out>, dopager=0)
>     at /usr/src/sys/ddb/db_command.c:445
> #3  0xffffffff801fd0e3 in db_script_exec (scriptname=0xffffffff80657b9e
>     "kdb.enter.default", warnifnotfound=0)
>     at /usr/src/sys/ddb/db_script.c:302
> #4  0xffffffff801fd1de in db_script_kdbenter (eventname=<value
>     optimized out>) at /usr/src/sys/ddb/db_script.c:325
> #5  0xffffffff801fadc4 in db_trap (type=<value optimized out>,
>     code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:230
> #6  0xffffffff80432981 in kdb_trap (type=12, code=0,
>     tf=0xffffff8245853880) at /usr/src/sys/kern/subr_kdb.c:654
> #7  0xffffffff805dbbed in trap_fatal (frame=0xffffff8245853880,
>     eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:844
> #8  0xffffffff805dbf6e in trap_pfault
>     (frame=0xffffff8245853880, usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:765
> #9  0xffffffff805dc32b in trap (frame=0xffffff8245853880)
>     at /usr/src/sys/amd64/amd64/trap.c:457
> #10 0xffffffff805c2534 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:228
> #11 0x0000000000000000 in ?? ()
>
> (kgdb) info thread
> * 412 Thread 100460 (PID=4411: zfs/initial thread) doadump ()
>       at /usr/src/sys/kern/kern_shutdown.c:266
>   411 Thread 100461 (PID=4404: sh) sched_switch (td=0xffffff009e26d000,
>       newtd=<value optimized out>, flags=<value optimized out>)
>       at /usr/src/sys/kern/sched_ule.c:1932
>   ....
>   221 Thread 100272 (PID=7: zfskern/txg_thread_enter) sched_switch
>       (td=0xffffff0002c95000, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   220 Thread 100271 (PID=7: zfskern/txg_thread_enter) sched_switch
>       (td=0xffffff0002c95470, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   219 Thread 100069 (PID=7: zfskern/l2arc_feed_thread) sched_switch
>       (td=0xffffff0002957000, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   218 Thread 100068 (PID=7: zfskern/arc_reclaim_thread) sched_switch
>       (td=0xffffff0002957470, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   ...
>   156 Thread 100270 (PID=0: kernel/zfs_vn_rele_taskq) sched_switch
>       (td=0xffffff0002c958e0, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   155 Thread 100269 (PID=0: kernel/zio_ioctl_intr) sched_switch
>       (td=0xffffff0002d92470, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   154 Thread 100268 (PID=0: kernel/zio_ioctl_issue) sched_switch
>       (td=0xffffff0002d9e000, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   153 Thread 100267 (PID=0: kernel/zio_claim_intr) sched_switch
>       (td=0xffffff0002d9e8e0, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   152 Thread 100266 (PID=0: kernel/zio_claim_issue) sched_switch
>       (td=0xffffff0002d9a8e0, newtd=<value optimized out>,
>       flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   ....
>
> From the kernel dump I can provide backtraces of all threads or any
> other information.
>
> Some more information:
>
> === root@serv07 (pts/1) -> gmirror status
>          Name    Status  Components
> mirror/gmsv07  COMPLETE  da0 (ACTIVE)
>                          da1 (ACTIVE)
>
> === root@serv07 (pts/1) -> glabel status
>           Name  Status  Components
> label/9241A7D4     N/A  da2
> label/C2477N17     N/A  da3
>
> === root@serv07 (pts/2) -> zpool status
>   pool: mpool
>  state: ONLINE
>   scan: none requested
> config:
>
>         NAME                STATE     READ WRITE CKSUM
>         mpool               ONLINE       0     0     0
>           mirror-0          ONLINE       0     0     0
>             label/9241A7D4  ONLINE       0     0     0
>             label/C2477N17  ONLINE       0     0     0
>
> errors: No known data errors
>
> === root@serv07 (pts/2) -> zfs list
> NAME                    USED  AVAIL  REFER  MOUNTPOINT
> mpool                   108G   806G    31K  /mpool
> mpool/backup            545M   806G   485M  /backup
> mpool/jail_deb_backup    33K   806G    33K  /backup/jail/deb
> mpool/jail_deb_prod    32,6G   806G  32,6G  /prod/jail/deb
> mpool/jail_pvz_backup    32K   806G    32K  /backup/jail/pvz
> mpool/jail_pvz_prod    54,7G   806G  54,7G  /prod/jail/pvz
> mpool/prod             20,0G   806G  20,0G  /prod
>
> cat /etc/fstab.deb (fstab.pvz is analogous):
> # Device           Mountpoint        FStype  Options  Dump  Pass#
> /usr/jail/deb      /jail/deb/usr     nullfs  rw       0     0
> /var/jail/deb      /jail/deb/var     nullfs  rw       0     0
> /home/jail/deb     /jail/deb/home    nullfs  rw       0     0
> /tmp/jail/deb      /jail/deb/tmp     nullfs  rw       0     0
> /prod/jail/deb     /jail/deb/prod    nullfs  rw       0     0
> /backup/jail/deb   /jail/deb/backup  nullfs  rw       0     0
>
> On a server without zfs the problem does not exist; jails on r252111
> run fine.
>
> Andreas Longwitz
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
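[Editor's note on the crash-handling setup: the console log above shows the
ddb script named kdb.enter.default running "watchdog" and "call doadump"
automatically when the debugger is entered. The poster's actual /etc/ddb.conf
is not included in the report; a minimal sketch that would reproduce the
behavior seen in the log, reconstructed purely from the script name and the
two commands visible there, might look like:]

```
# /etc/ddb.conf -- hypothetical reconstruction from the console log above,
# not the poster's actual file. Loaded at boot by /etc/rc.d/ddb when
# ddb_enable="YES" is set in /etc/rc.conf. On panic, ddb runs the
# kdb.enter.default script: disarm the watchdog, then write a crash dump.
script kdb.enter.default=watchdog; call doadump
```

[After reboot, savecore(8) saves the dump to /var/crash, where it can be
examined with kgdb as the poster did.]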