From: Fabian Keil
To: freebsd-fs@freebsd.org
Date: Thu, 12 Jul 2012 15:15:41 +0200
Subject: Re: ZFS hanging
Message-ID: <20120712151541.7f3a6886@fabiankeil.de>
In-Reply-To: <1341864787.32803.43.camel@btw.pki2.com>

Dennis Glatting wrote:

> I have a ZFS array of disks where the system simply stops as if forever
> blocked by some IO mutex. This happens often and the following is the
> output of top:
>
> last pid:  6075;  load averages:  0.00,  0.00,  0.00   up 0+16:54:41  13:04:10
> 135 processes: 1 running, 134 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 47M Active, 24M Inact, 18G Wired, 120M Buf, 44G Free
> Swap: 32G Total, 32G Free
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>  2410 root        1  33    0 11992K  2820K zio->i  7 331:25  0.00% bzip2
>  2621 root        1  52    4 28640K  5544K tx->tx 24 245:33  0.00% john
>  2624 root        1  48    4 28640K  5544K tx->tx  4 239:08  0.00% john
>  2623 root        1  49    4 28640K  5544K tx->tx  7 238:44  0.00% john

Does top continue to run or does it hang as well?

I believe the locks shown above shouldn't affect already running
applications that don't cause disk traffic.

> My question is what should I be looking at and how should I look at it?
> There is nothing in the logs or the console, rather the system is
> forever paused and entering commands results in no response (it's as if
> everything is deadlocked).
If the entered commands actually start, you can try sending SIGINFO
with CTRL+T to request a status:

fk@r500 ~ $zpool status
load: 0.15  cmd: zpool 2698 [spa_namespace_lock] 543.23r 0.00u 0.12s 0% 2908k

If you can run procstat, you can try getting kernel stack traces for
some (or all) processes; that should give you a rough idea of how the
lock is reached:

fk@r500 ~ $procstat -kk $(pgrep zpool)
  PID    TID COMM       TDNAME     KSTACK
 2698 100922 zpool      -          mi_switch+0x196 sleepq_wait+0x42 _sx_xlock_hard+0x525 _sx_xlock+0x75 spa_all_configs+0x5e zfs_ioc_pool_configs+0x29 zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x5f9 Xfast_syscall+0xf7
 2388 100431 zpool      -          mi_switch+0x196 sleepq_wait+0x42 _sx_xlock_hard+0x525 _sx_xlock+0x75 spa_open_common+0x7a spa_get_stats+0x5b zfs_ioc_pool_stats+0x2c zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x5f9 Xfast_syscall+0xf7

If procstat hangs as well, you could try executing it in a loop before
the problem occurs. If it stays in the cache it may keep running after
most of the other processes stop responding.
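A minimal sketch of such a loop (untested; the interval, the log file
and the use of -a to cover all processes are arbitrary choices on my
part):

#!/bin/sh
# Periodically snapshot the kernel stacks of all threads so there is
# something to look at after the box stops responding.
LOG=/var/tmp/procstat-kk.log
while :; do
    date >> "$LOG"
    procstat -kk -a >> "$LOG" 2>&1
    sleep 10
done

Writing the log to a file system that isn't backed by the hanging pool
is obviously preferable, otherwise the loop will stop as well.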
DTrace can be useful for analyzing locking issues as well:

fk@r500 ~ $sudo ~/scripts/flowtrace.d zfsdev_ioctl
0 101057 2012 Jul 12 14:59:42.155 00: Trace in progress. Waiting to enter zfsdev_ioctl. Hit CTRL-C to exit.
0 100979 2012 Jul 12 14:59:43.933 04: --> zfsdev_ioctl:entry
0 100979 2012 Jul 12 14:59:43.933 05: --> zfs_secpolicy_none:entry
0 100979 2012 Jul 12 14:59:43.933 05: <-- zfs_secpolicy_none:return
0 100979 2012 Jul 12 14:59:43.933 05: --> zfs_ioc_pool_configs:entry
0 100979 2012 Jul 12 14:59:43.933 06: --> spa_all_configs:entry
0 100979 2012 Jul 12 14:59:43.933 07: --> nvlist_alloc:entry
0 100979 2012 Jul 12 14:59:43.933 07: [...]
0 100979 2012 Jul 12 14:59:43.933 07: <-- nvlist_alloc:return
0 100979 2012 Jul 12 14:59:43.933 07: --> _sx_xlock:entry
0 100979 2012 Jul 12 14:59:43.933 08: --> _sx_xlock_hard:entry
0 100979 2012 Jul 12 14:59:43.933 09: --> sleepq_lock:entry
0 100979 2012 Jul 12 14:59:43.933 10: --> spinlock_enter:entry
0 100979 2012 Jul 12 14:59:43.933 10: --> critical_enter:entry
0 100979 2012 Jul 12 14:59:43.933 10: <-- critical_enter:return
0 100979 2012 Jul 12 14:59:43.933 09: <-- sleepq_lock:return
0 100979 2012 Jul 12 14:59:43.933 09: --> lockstat_nsecs:entry
0 100979 2012 Jul 12 14:59:43.933 10: --> binuptime:entry
0 100979 2012 Jul 12 14:59:43.933 11: --> hpet_get_timecount:entry
0 100979 2012 Jul 12 14:59:43.933 11: <-- hpet_get_timecount:return
0 100979 2012 Jul 12 14:59:43.933 10: <-- binuptime:return
0 100979 2012 Jul 12 14:59:43.933 09: <-- lockstat_nsecs:return
0 100979 2012 Jul 12 14:59:43.933 09: --> sleepq_add:entry
0 100979 2012 Jul 12 14:59:43.933 10: --> sleepq_lookup:entry
0 100979 2012 Jul 12 14:59:43.933 10: <-- sleepq_lookup:return
0 100979 2012 Jul 12 14:59:43.933 10: --> thread_lock_flags_:entry
0 100979 2012 Jul 12 14:59:43.933 11: --> spinlock_enter:entry
0 100979 2012 Jul 12 14:59:43.933 11: --> critical_enter:entry
0 100979 2012 Jul 12 14:59:43.933 11: <-- critical_enter:return
0 100979 2012 Jul 12 14:59:43.933 10: <-- thread_lock_flags_:return
0 100979 2012 Jul 12 14:59:43.933 09: --> spinlock_exit:entry
0 100979 2012 Jul 12 14:59:43.933 10: --> critical_exit:entry
0 100979 2012 Jul 12 14:59:43.933 10: <-- critical_exit:return
0 100979 2012 Jul 12 14:59:43.933 09: <-- spinlock_exit:return
0 100979 2012 Jul 12 14:59:43.933 09: --> sleepq_wait:entry
0 100979 2012 Jul 12 14:59:43.933 10: --> thread_lock_flags_:entry
0 100979 2012 Jul 12 14:59:43.933 11: --> spinlock_enter:entry
0 100979 2012 Jul 12 14:59:43.933 11: --> critical_enter:entry
0 100979 2012 Jul 12 14:59:43.933 11: <-- critical_enter:return
0 100979 2012 Jul 12 14:59:43.933 10: <-- thread_lock_flags_:return
0 100979 2012 Jul 12 14:59:43.933 10: --> sleepq_switch:entry
0 100979 2012 Jul 12 14:59:43.933 11: --> sched_sleep:entry
0 100979 2012 Jul 12 14:59:43.933 11: <-- sched_sleep:return
0 100979 2012 Jul 12 14:59:43.933 11: --> thread_lock_set:entry
0 100979 2012 Jul 12 14:59:43.933 11: --> spinlock_exit:entry
0 100979 2012 Jul 12 14:59:43.933 12: --> critical_exit:entry
0 100979 2012 Jul 12 14:59:43.933 12: <-- critical_exit:return
0 100979 2012 Jul 12 14:59:43.933 11: <-- spinlock_exit:return
0 100979 2012 Jul 12 14:59:43.933 10: --> mi_switch:entry
0 100979 2012 Jul 12 14:59:43.933 11: --> rdtsc:entry
0 100979 2012 Jul 12 14:59:43.933 11: <-- rdtsc:return
0 100979 2012 Jul 12 14:59:43.933 11: --> sched_switch:entry
0 100979 2012 Jul 12 14:59:43.933 12: --> sched_pctcpu_update:entry
0 100979 2012 Jul 12 14:59:43.933 12: <-- sched_pctcpu_update:return
0 100979 2012 Jul 12 14:59:43.933 12: --> spinlock_enter:entry
0 100979 2012 Jul 12 14:59:43.933 12: --> critical_enter:entry
0 100979 2012 Jul 12 14:59:43.933 12: <-- critical_enter:return
0 100979 2012 Jul 12 14:59:43.933 12: --> thread_lock_block:entry
0 100979 2012 Jul 12 14:59:43.933 13: --> spinlock_exit:entry
0 100979 2012 Jul 12 14:59:43.933 14: --> critical_exit:entry
0 100979 2012 Jul 12 14:59:43.933 14: <-- critical_exit:return
0 100979 2012 Jul 12 14:59:43.933 13: <-- spinlock_exit:return
0 100979 2012 Jul 12 14:59:43.933 12: <-- thread_lock_block:return
0 100979 2012 Jul 12 14:59:43.933 12: --> tdq_load_rem:entry
0 100979 2012 Jul 12 14:59:43.933 12: <-- tdq_load_rem:return
0 100979 2012 Jul 12 14:59:43.933 12: --> choosethread:entry
0 100979 2012 Jul 12 14:59:43.933 13: --> sched_choose:entry
0 100979 2012 Jul 12 14:59:43.933 14: --> tdq_choose:entry
0 100979 2012 Jul 12 14:59:43.933 15: --> runq_choose:entry
0 100979 2012 Jul 12 14:59:43.933 15: <-- runq_choose:return
0 100979 2012 Jul 12 14:59:43.933 15: --> runq_choose_from:entry
0 100979 2012 Jul 12 14:59:43.933 15: <-- runq_choose_from:return
0 100979 2012 Jul 12 14:59:43.933 14: --> runq_choose:entry
0 100979 2012 Jul 12 14:59:43.933 14: <-- runq_choose:return
0 100979 2012 Jul 12 14:59:43.933 13: <-- sched_choose:return
0 100979 2012 Jul 12 14:59:43.933 12: <-- choosethread:return
0 100979 2012 Jul 12 14:59:43.933 12: --> sched_pctcpu_update:entry
0 100979 2012 Jul 12 14:59:43.933 12: <-- sched_pctcpu_update:return
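The flowtrace.d script itself isn't included here; for a rough
equivalent you could start from a plain dtrace(1) invocation along
these lines (just a sketch, assuming the fbt provider is available and
that the overhead of enabling every fbt probe is acceptable while
debugging):

# Follow the thread that enters zfsdev_ioctl through the kernel,
# printing indented entry/return lines for every traced function.
sudo dtrace -x flowindent -n '
    fbt::zfsdev_ioctl:entry  { self->trace = 1; }
    fbt:::entry, fbt:::return /self->trace/ { }
    fbt::zfsdev_ioctl:return /self->trace/ { self->trace = 0; }
'

The thread-local self->trace flag limits the output to the thread that
actually entered zfsdev_ioctl.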
Of course, once you know where ZFS is hanging, you still have to
figure out the why ...

Fabian