Date:      Tue, 11 Aug 2009 08:20:05 GMT
From:      Peter Much <pmc@citylink.dinoex.sub.org>
To:        freebsd-fs@FreeBSD.org
Subject:   Re: kern/137037: [zfs] [hang] zfs rollback on root causes FreeBSD to freeze in few seconds
Message-ID:  <200908110820.n7B8K53Y051011@freefall.freebsd.org>

The following reply was made to PR kern/137037; it has been noted by GNATS.

From: Peter Much <pmc@citylink.dinoex.sub.org>
To: bug-followup@FreeBSD.org, killasmurf86@gmail.com
Cc:  
Subject: Re: kern/137037: [zfs] [hang] zfs rollback on root causes FreeBSD to freeze in few seconds
Date: Tue, 11 Aug 2009 09:34:10 +0200

 I had intended to investigate further before reporting my issue,
 but after seeing this bug report I think an interim report from
 my side can do no harm.
 
 I also experience system failures after rollback, and the key
 similarity is that in my case, too, the rollbacks succeed and the
 system continues to work for a few seconds (sometimes even longer)
 before it fails.
 
 The failure is either (seldom) a system freeze or (much more often)
 an instantaneous reboot without a dump. I am currently investigating
 methods to capture some useful data. Maybe, if it freezes,
 running "watchdog" can trick it into producing a dump...
 
 I am running 7.2-STABLE as of mid-July (that is, ZFS v13).
 
 I admit I am somewhat low on memory for running ZFS (more memory is
 on order ;) ), but I use it only for a very limited number of
 filesystems and specific tasks, and I watch my memory usage
 carefully. Nevertheless, if the system ran out of memory,
 I would expect an orderly panic, not a hard reset or freeze.
 
 I am not using geli or anything similar, and I am not working with
 the root filesystem; what I am doing is mainly making extensive use
 of the rollback feature from a script, like this:
 
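     while <some stuff>
     do
         zfs mount jb/x                  # mount jb/x at its mountpoint /jb/x
         mount -t zfs jb/p /jb/x/p       # mount jb/p on top of it
         ... do some work ...
         umount /jb/x/p                  # unmount in reverse order
         umount /jb/x
         zfs rollback jb/x@base          # discard all changes since @base
         zfs rollback jb/p@base
     done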
 
 At first I tried this without unmounting, but the crashes were
 so reproducible that I considered that approach unusable. With
 the unmounting it seemed to work at first, but now I also experience
 crashes about every 12 hours.
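 For unattended reproduction, the loop above could be hardened so that
 it stops at the first command that fails and timestamps every
 iteration, which makes it easier to see which step (mount, unmount or
 rollback) misbehaves first, and when. This is a minimal sketch, not the
 script actually used: a fixed iteration count replaces <some stuff>,
 and repro_loop and do_work are hypothetical names for the loop and for
 "... do some work ...".

```sh
#!/bin/sh
# Sketch only: a checked version of the rollback loop from this report.
# Assumptions: a fixed iteration count replaces "<some stuff>", and
# do_work is a hypothetical placeholder for "... do some work ...".
# Dataset, mountpoint and snapshot names are the ones from the report.

do_work() {
    # Hypothetical placeholder: put the real workload here.
    :
}

repro_loop() {
    # POSIX sh has no "local", so iterations and i are global variables.
    iterations=$1
    i=0
    while [ "$i" -lt "$iterations" ]
    do
        # On any failing command, return 1 so the loop stops right there
        # instead of carrying on with half-mounted or un-rolled-back datasets.
        zfs mount jb/x               || return 1
        mount -t zfs jb/p /jb/x/p    || return 1
        do_work                      || return 1
        umount /jb/x/p               || return 1
        umount /jb/x                 || return 1
        zfs rollback jb/x@base       || return 1
        zfs rollback jb/p@base       || return 1
        i=$((i + 1))
        # Timestamp each iteration so it can be matched to the time of a crash.
        echo "$(date '+%Y-%m-%d %H:%M:%S') iteration $i done"
    done
}
```

 Usage: repro_loop 1000. Stopping at the first failure is deliberate,
 since failed or undetected unmounts were also observed before the
 system dropped into the debugger.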
 
 Beware: this is an interim report; I have not yet thoroughly ruled
 out mistakes of my own. Take it with the appropriate grain of
 salt. ;)
 
 ------------------------------
 
 Update: I was able to obtain a dump. After running the above loop
 aggressively while staying on the console, the system suddenly
 started to wreak havoc: it reported that it was unable to unmount the
 filesystems or could not detect them (something I had also seen
 occasionally before), and then dropped me into the debugger at
   _sx_xlock+0x16 lock cmpxchgl %edx,0x10(%ecx)
 
 The backtrace is attached below. But beware: since the havoc had
 already started before, it very likely does NOT point to the
 root cause of the problem. Still, it may give a first impression.
 
 I expect this to be reproducible, but in any case I would be glad
 to provide further data or run further tests if requested.
 
 And as said before: if this is a result of low memory, then I am
 just sorry. ;)
 
 Ah, btw, it's a dual Pentium III SMP machine.
 
 (gdb) add-symbol-file /usr/src/sys/i386/compile/D1R72V1/modules/usr/src/sys/modules/zfs/zfs.ko 0xc0a59860
 add symbol table from file "/usr/src/sys/i386/compile/D1R72V1/modules/usr/src/sys/modules/zfs/zfs.ko" at
 	.text_addr = 0xc0a59860
 (gdb) bt
 #0  doadump () at pcpu.h:196
 #1  0xc05e8be6 in boot (howto=260) at ../../../kern/kern_shutdown.c:418
 #2  0xc05e8f07 in panic (fmt=Variable "fmt" is not available.
 ) at ../../../kern/kern_shutdown.c:574
 #3  0xc046ed77 in db_panic (addr=Could not find the frame base for "db_panic".
 ) at ../../../ddb/db_command.c:446
 #4  0xc046f52a in db_command (last_cmdp=0xc0932a54, cmd_table=0x0, dopager=1) at ../../../ddb/db_command.c:413
 #5  0xc046f645 in db_command_loop () at ../../../ddb/db_command.c:466
 #6  0xc047117c in db_trap (type=12, code=0) at ../../../ddb/db_main.c:228
 #7  0xc0617581 in kdb_trap (type=12, code=0, tf=0xdb76b9fc) at ../../../kern/subr_kdb.c:524
 #8  0xc0855adf in trap_fatal (frame=0xdb76b9fc, eva=76) at ../../../i386/i386/trap.c:929
 #9  0xc0855d8b in trap_pfault (frame=0xdb76b9fc, usermode=0, eva=76) at ../../../i386/i386/trap.c:851
 #10 0xc0856786 in trap (frame=0xdb76b9fc) at ../../../i386/i386/trap.c:529
 #11 0xc083b70b in calltrap () at ../../../i386/i386/exception.s:166
 #12 0xc05f0a56 in _sx_xlock (sx=0x3c, opts=0, file=0xc0b4953d "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c", line=1807) at atomic.h:149
 #13 0xc0a79185 in dmu_buf_update_user (db_fake=0x0, old_user_ptr=0xc2de3000, user_ptr=0x0, user_data_ptr_ptr=0x0, evict_func=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1807
 #14 0xc0ad0cab in zfs_znode_dmu_fini (zp=0xc2de3000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:557
 #15 0xc0aef214 in zfs_freebsd_reclaim (ap=0xdb76baf0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4385
 #16 0xc0871602 in VOP_RECLAIM_APV (vop=0xc0b55560, a=0xdb76baf0) at vnode_if.c:1566
 #17 0xc066d28f in vgonel (vp=0xc355ce04) at vnode_if.h:819
 #18 0xc0670f26 in vflush (mp=0xc3db25a0, rootrefs=0, flags=Variable "flags" is not available.
 ) at ../../../kern/vfs_subr.c:2408
 #19 0xc0aee0c8 in zfs_umount (vfsp=0xc3db25a0, fflag=134217728, td=0xc312bd80) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1005
 #20 0xc066a201 in dounmount (mp=0xc3db25a0, flags=134217728, td=0xc312bd80) at ../../../kern/vfs_mount.c:1290
 #21 0xc066a957 in unmount (td=0xc312bd80, uap=0xdb76bcfc) at ../../../kern/vfs_mount.c:1186
 #22 0xc08560f5 in syscall (frame=0xdb76bd38) at ../../../i386/i386/trap.c:1089
 #23 0xc083b770 in Xint0x80_syscall () at ../../../i386/i386/exception.s:262
 #24 0x00000033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (gdb) 
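 One detail of the trace can be checked by hand: frame #13 shows
 dmu_buf_update_user() called with db_fake=0x0, frame #12 shows
 _sx_xlock() with sx=0x3c, and frames #8/#9 report a page fault at
 eva=76. Assuming %ecx held that sx pointer (the register contents are
 not shown in the trace), the faulting instruction
 cmpxchgl %edx,0x10(%ecx) accesses 0x3c + 0x10 = 0x4c = 76, which is
 consistent with a lock at offset 0x3c inside a buffer whose pointer
 was NULL. A small sketch of that arithmetic:

```sh
#!/bin/sh
# Values copied from the backtrace and the debugger line above.
db_fake=0x0     # frame #13: dmu_buf_update_user(db_fake=0x0, ...)
sx=0x3c         # frame #12: _sx_xlock(sx=0x3c, ...)
insn_off=0x10   # faulting insn: lock cmpxchgl %edx,0x10(%ecx)

# Assumption: %ecx held the sx pointer when the fault happened.
lock_off=$(( $sx - $db_fake ))   # offset of the lock inside the buffer
eva=$(( $sx + $insn_off ))       # address the cmpxchgl tried to access
echo "lock offset $lock_off, faulting address $eva"
```

 This only suggests that the immediate fault is a NULL-pointer
 dereference during unmount; it says nothing about why the buffer
 pointer was NULL in the first place.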
 
 


