Date: Tue, 11 Aug 2009 08:20:05 GMT
From: Peter Much <pmc@citylink.dinoex.sub.org>
To: freebsd-fs@FreeBSD.org
Subject: Re: kern/137037: [zfs] [hang] zfs rollback on root causes FreeBSD to freeze in few seconds
Message-ID: <200908110820.n7B8K53Y051011@freefall.freebsd.org>
The following reply was made to PR kern/137037; it has been noted by GNATS.

From: Peter Much <pmc@citylink.dinoex.sub.org>
To: bug-followup@FreeBSD.org, killasmurf86@gmail.com
Cc:
Subject: Re: kern/137037: [zfs] [hang] zfs rollback on root causes FreeBSD to freeze in few seconds
Date: Tue, 11 Aug 2009 09:34:10 +0200

I had intended to investigate further before reporting my issue, but after seeing this bug report I think an interim report from my side can do no harm.

I also experience system failures after rollback. The significant similarity is that in my case, too, the rollbacks succeed and the system continues to work for some seconds (or sometimes even longer) before it fails.

The failure is either (seldom) a system freeze or (much more often) an instantaneous reboot without dumping. I am currently looking into ways to capture some useful data. Maybe, if it freezes, running "watchdog" can trick it into doing a dump...

I am running 7.2-STABLE as of mid-July (that is ZFS V13). I admit I am somewhat low on memory for running ZFS (memory is ordered ;) ), but I use it only for a very limited number of filesystems and specific tasks, and I am watching my memory usage carefully. Nevertheless, if the system ran out of memory, I would expect an orderly panic and not a hard reset or freeze.

I am not using geli or anything like it, and I am not working with the root filesystem. What I am doing is mainly making extensive use of the rollback feature, from a script, along these lines:

    while <some stuff>
    do
        zfs mount jb/x
        mount -t zfs jb/p /jb/x/p
        ... do some work ...
        umount /jb/x/p
        umount /jb/x
        zfs rollback jb/x@base
        zfs rollback jb/p@base
    done

At first I tried this without the unmounting, but the crashes were so reproducible that I considered that approach unusable. With the unmounting it looked functional at first, but now I also experience crashes about every 12 hours.

Beware: this is an interim report. I have not yet extensively checked for possible mistakes of my own.
Take it with the appropriate grain of salt. ;)

------------------------------

Update:

I was able to obtain a dump. After running the above loop aggressively while staying on the console, the system suddenly started to wreak havoc: it reported that it was not able to unmount the filesystems or could not find them (something I had also seen occasionally before), and then dropped me into the debugger at

    _sx_xlock+0x16    lock cmpxchgl %edx,0x10(%ecx)

The backtrace is attached below. But beware: since the havoc had already started before that point, it will very likely NOT point to the root cause of the problem. Maybe it gives a first impression, though.

I suppose this should be reproducible, but in any case I would be glad to provide further data (or do further tests) if requested. And as said before: if this is a result of low memory, then I am just sorry. ;)

Ah, btw, it's a dual Pentium3 SMP machine.

(gdb) add-symbol-file /usr/src/sys/i386/compile/D1R72V1/modules/usr/src/sys/modules/zfs/zfs.ko 0xc0a59860
add symbol table from file "/usr/src/sys/i386/compile/D1R72V1/modules/usr/src/sys/modules/zfs/zfs.ko" at
	.text_addr = 0xc0a59860
(gdb) bt
#0  doadump () at pcpu.h:196
#1  0xc05e8be6 in boot (howto=260) at ../../../kern/kern_shutdown.c:418
#2  0xc05e8f07 in panic (fmt=Variable "fmt" is not available.
) at ../../../kern/kern_shutdown.c:574
#3  0xc046ed77 in db_panic (addr=Could not find the frame base for "db_panic".
) at ../../../ddb/db_command.c:446
#4  0xc046f52a in db_command (last_cmdp=0xc0932a54, cmd_table=0x0, dopager=1) at ../../../ddb/db_command.c:413
#5  0xc046f645 in db_command_loop () at ../../../ddb/db_command.c:466
#6  0xc047117c in db_trap (type=12, code=0) at ../../../ddb/db_main.c:228
#7  0xc0617581 in kdb_trap (type=12, code=0, tf=0xdb76b9fc) at ../../../kern/subr_kdb.c:524
#8  0xc0855adf in trap_fatal (frame=0xdb76b9fc, eva=76) at ../../../i386/i386/trap.c:929
#9  0xc0855d8b in trap_pfault (frame=0xdb76b9fc, usermode=0, eva=76) at ../../../i386/i386/trap.c:851
#10 0xc0856786 in trap (frame=0xdb76b9fc) at ../../../i386/i386/trap.c:529
#11 0xc083b70b in calltrap () at ../../../i386/i386/exception.s:166
#12 0xc05f0a56 in _sx_xlock (sx=0x3c, opts=0, file=0xc0b4953d "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c", line=1807) at atomic.h:149
#13 0xc0a79185 in dmu_buf_update_user (db_fake=0x0, old_user_ptr=0xc2de3000, user_ptr=0x0, user_data_ptr_ptr=0x0, evict_func=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1807
#14 0xc0ad0cab in zfs_znode_dmu_fini (zp=0xc2de3000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:557
#15 0xc0aef214 in zfs_freebsd_reclaim (ap=0xdb76baf0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4385
#16 0xc0871602 in VOP_RECLAIM_APV (vop=0xc0b55560, a=0xdb76baf0) at vnode_if.c:1566
#17 0xc066d28f in vgonel (vp=0xc355ce04) at vnode_if.h:819
#18 0xc0670f26 in vflush (mp=0xc3db25a0, rootrefs=0, flags=Variable "flags" is not available.
) at ../../../kern/vfs_subr.c:2408
#19 0xc0aee0c8 in zfs_umount (vfsp=0xc3db25a0, fflag=134217728, td=0xc312bd80) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1005
#20 0xc066a201 in dounmount (mp=0xc3db25a0, flags=134217728, td=0xc312bd80) at ../../../kern/vfs_mount.c:1290
#21 0xc066a957 in unmount (td=0xc312bd80, uap=0xdb76bcfc) at ../../../kern/vfs_mount.c:1186
#22 0xc08560f5 in syscall (frame=0xdb76bd38) at ../../../i386/i386/trap.c:1089
#23 0xc083b770 in Xint0x80_syscall () at ../../../i386/i386/exception.s:262
#24 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(gdb)
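
For anyone trying to reproduce this, the mount / unmount / rollback loop from the report could be wrapped in a small script along the following lines. This is only a sketch, not the reporter's actual script: the dataset names jb/x and jb/p, the mountpoint /jb/x/p and the @base snapshots are taken from the report, while ITERATIONS, DRY_RUN and the run() and one_cycle() helpers are hypothetical names introduced here. The loop condition and the "do some work" step are not specified in the report and are left as placeholders. By default the script only prints the commands; it touches the pool only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the rollback loop from the report (assumptions labeled below).
# From the report: jb/x, jb/p, /jb/x/p, @base and the order of commands.
# Hypothetical:    ITERATIONS, DRY_RUN, run(), one_cycle().

ITERATIONS=${ITERATIONS:-3}   # the report's loop condition is unspecified
DRY_RUN=${DRY_RUN:-1}         # 1 = print commands only, 0 = execute them

run() {
    # Log every command so the last one before a crash is visible
    # on the console; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" -eq 0 ]; then
        "$@" || { echo "FAILED ($?): $*" >&2; return 1; }
    fi
}

one_cycle() {
    # The "do some work" step from the report would go between the
    # second mount and the first umount; it is omitted here.
    run zfs mount jb/x &&
    run mount -t zfs jb/p /jb/x/p &&
    run umount /jb/x/p &&
    run umount /jb/x &&
    run zfs rollback jb/x@base &&
    run zfs rollback jb/p@base
}

i=1
while [ "$i" -le "$ITERATIONS" ]; do
    echo "cycle $i"
    one_cycle || break        # stop at the first failing command
    i=$((i + 1))
done
```

Stopping at the first failed command is a deliberate choice in this sketch: the report notes that unmount errors appeared shortly before the crash, so the last logged line may help narrow down where things start to go wrong.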