Date:      Tue, 15 Dec 2015 16:01:08 +0100
From:      Bengt Ahlgren <bengta@sics.se>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS hang in zfs_freebsd_rename
Message-ID:  <uh7io3zhi2z.fsf@P142s.sics.se>
In-Reply-To: <56702A9F.90702@multiplay.co.uk> (Steven Hartland's message of "Tue, 15 Dec 2015 14:58:39 +0000")
References:  <uh7a8pbj2mo.fsf@P142s.sics.se> <567022FB.1010508@multiplay.co.uk> <uh7vb7zhihv.fsf@P142s.sics.se> <56702A9F.90702@multiplay.co.uk>

OK, thanks for the advice!

Bengt

Steven Hartland <killing@multiplay.co.uk> writes:

> There have been quite a few reported issues with this; some at least
> have been fixed, but as with anything, the only way to be sure is to
> test it.
>
> On 15/12/2015 14:52, Bengt Ahlgren wrote:
>> Yes, that is on the todo list...
>>
>> So this is likely fixed then in 10.x?
>>
>> Bengt
>>
>> Steven Hartland <killing@multiplay.co.uk> writes:
>>
>>> Not a surprise in 9.x unfortunately, try upgrading to 10.x
>>>
>>> On 15/12/2015 12:51, Bengt Ahlgren wrote:
>>>> We have a server running 9.3-REL which currently has two quite large
>>>> zfs pools:
>>>>
>>>> NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>>>> p1    18.1T  10.7T  7.38T    59%  1.00x  ONLINE  -
>>>> p2    43.5T  29.1T  14.4T    66%  1.00x  ONLINE  -
>>>>
>>>> It has been running without any issues for some time now.  Once, just
>>>> now, processes are getting stuck and impossible to kill on accessing a
>>>> particular directory in the p2 pool.  That pool is a 2x6 disk raidz2.
>>>>
>>>> One process is stuck in zfs_freebsd_rename, and other processes
>>>> accessing that particular directory also get stuck.  The system is now
>>>> almost completely idle.
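
[A hedged sketch of how such stuck threads show up: the sample line below is
fabricated from the backtrace in this mail, not real procstat output, and the
pid/command are assumptions. On the live system the equivalent check is
`procstat -kk -a | egrep 'zfs_freebsd_rename|zfs_rename'`.]

```shell
# Fabricated sample in the shape of 'procstat -kk -a' output
# (pid, tid, comm, tdname, kernel stack), based on the trace in this mail:
sample='  807 102157 mv  -  mi_switch sleepq_wait __lockmgr_args vop_stdlock _vn_lock vputx zfs_rename_unlock zfs_freebsd_rename
  812 102201 ls  -  mi_switch sleepq_wait __lockmgr_args vop_stdlock _vn_lock lookup namei'

# Count threads whose kernel stack is inside the ZFS rename path
count=$(printf '%s\n' "$sample" | grep -c 'zfs_freebsd_rename')
echo "threads stuck in zfs rename: $count"
```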
>>>>
>>>> Output from kgdb on the running system for that first process:
>>>>
>>>> Thread 651 (Thread 102157):
>>>> #0  sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920, flags=<value optimized out>)
>>>>       at /usr/src/sys/kern/sched_ule.c:1904
>>>> #1  0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
>>>> #2  0xffffffff809308e2 in sleepq_wait (wchan=0xfffffe0135b60488, pri=96)
>>>>       at /usr/src/sys/kern/subr_sleepqueue.c:618
>>>> #3  0xffffffff808cf922 in __lockmgr_args (lk=0xfffffe0135b60488, flags=524544, ilk=0xfffffe0135b604b8,
>>>>       wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>,
>>>>       file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337) at /usr/src/sys/kern/kern_lock.c:221
>>>> #4  0xffffffff80977369 in vop_stdlock (ap=<value optimized out>) at lockmgr.h:97
>>>> #5  0xffffffff80dd4a04 in VOP_LOCK1_APV (vop=0xffffffff813e8160, a=0xffffffa07f935520) at vnode_if.c:2052
>>>> #6  0xffffffff80998c17 in _vn_lock (vp=0xfffffe0135b603f0, flags=524288,
>>>>       file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337) at vnode_if.h:859
>>>> #7  0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1) at /usr/src/sys/kern/vfs_subr.c:2337
>>>> #8  0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
>>>>       at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3609
>>>> #9  0xffffffff81ac8c72 in zfs_freebsd_rename (ap=<value optimized out>)
>>>>       at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4039
>>>> #10 0xffffffff80dd4f04 in VOP_RENAME_APV (vop=0xffffffff81b47d40, a=0xffffffa07f9358e0) at vnode_if.c:1522
>>>> #11 0xffffffff80996bbd in kern_renameat (td=<value optimized out>, oldfd=<value optimized out>,
>>>>       old=<value optimized out>, newfd=-100, new=0x1826a9af00 <Error reading address 0x1826a9af00: Bad address>,
>>>>       pathseg=<value optimized out>) at vnode_if.h:636
>>>> #12 0xffffffff80cd228a in amd64_syscall (td=0xfffffe0b14059920, traced=0) at subr_syscall.c:135
>>>> #13 0xffffffff80cbc907 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
>>>> #14 0x0000000800cc1acc in ?? ()
>>>> Previous frame inner to this frame (corrupt stack?)
>>>>
>>>> Full procstat -kk -a and kgdb "thread apply all bt" can be found here:
>>>>
>>>> https://www.sics.se/~bengta/ZFS-hang/
>>>>
>>>> I don't know how to produce the "alltrace in ddb" output that the
>>>> instructions in the wiki describe.  It runs the GENERIC kernel, so
>>>> perhaps it isn't possible?
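
[For reference, a hedged sketch of the ddb session in question. This only
works if the running kernel was built with DDB; whether the 9.3 GENERIC
kernel includes it is not certain here, in which case a rebuild with
`options KDB` and `options DDB` would be needed. Not a runnable script,
just a console transcript.]

```
# From a root shell on the console, drop into the kernel debugger:
sysctl debug.kdb.enter=1

# At the db> prompt:
db> alltrace        # print stack traces for all threads
db> continue        # resume the system afterwards
```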
>>>>
>>>> I checked "camcontrol tags" for all the disks in the pool - all have
>>>> zeroes for dev_active, devq_queued and held.
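
[A hedged sketch of that per-disk check; the device names da0..da2 are
placeholders, the real members come from `zpool status p2`. The guard lets
the snippet run harmlessly on non-FreeBSD systems.]

```shell
# Dump outstanding-command counters for each pool disk via camcontrol(8).
if command -v camcontrol >/dev/null 2>&1; then
  for d in da0 da1 da2; do        # placeholder device names
    printf '== %s ==\n' "$d"
    camcontrol tags "$d" -v | egrep 'dev_active|devq_queued|held'
  done
  result=checked
else
  # camcontrol is FreeBSD-only; nothing to inspect elsewhere.
  result=skipped
fi
echo "$result"
```

A hang with all three counters at zero suggests the stall is in the VFS/ZFS
locking layer rather than in outstanding disk I/O, which matches the
backtrace above.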
>>>>
>>>> Is there anything else I can check while the machine is up?  I however
>>>> need to restart it pretty soon.
>>>>
>>>> Bengt
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


