From: Johannes Totz
To: freebsd-fs@freebsd.org
Date: Mon, 01 Aug 2011 17:48:51 +0100
Subject: zfs hang on zfs access

Hi!

Lots of processes hang when accessing one specific zfs data set. All other sets are fine.
This looks similar to the pool-hang report by Steven Hartland last week, but the machine in question does not have a high uptime:

(Pasted as quote to avoid line-break)

> last pid: 274; load averages: 0.00, 0.48, 1.21 up 6+13:06:28 17:35:29
> 71 processes: 1 running, 70 sleeping
> CPU: 0.2% user, 0.0% nice, 0.2% system, 0.0% interrupt, 99.6% idle
> Mem: 26M Active, 18M Inact, 5018M Wired, 1512K Cache, 597M Buf, 617M Free
> Swap: 4096M Total, 27M Used, 4069M Free
>
>   PID USERNAME THR PRI NICE   SIZE   RES STATE  C TIME  WCPU COMMAND
>   274 root       1  44    0  9376K 2864K CPU0   1 0:00 0.29% top
> 99902 root       1  44    0 29152K 5440K h->has 1 0:00 0.00% smbd
> 99922 root       1  44    0 29152K 5056K h->has 1 0:00 0.00% smbd
> 99995 root       1  44    0 29152K 5072K zfs    1 0:00 0.00% smbd
> 99958 root       1  44    0 29152K 5056K zfsvfs 1 0:00 0.00% smbd
> 99976 root       1  45    0 29152K 4824K zfs    1 0:00 0.00% smbd
> 99946 root       1  44    0 29152K 4692K zfsvfs 1 0:00 0.00% smbd
> 99949 root       1  44    0 29152K 4700K zfsvfs 1 0:00 0.00% smbd
>   205 root       1  44    0 29152K 4800K lockf  1 0:00 0.00% smbd
>   219 root       1  44    0 29152K 4748K lockf  1 0:00 0.00% smbd
> 99932 root       1  44    0 29152K 4704K zfsvfs 1 0:00 0.00% smbd
> 99921 root       1  44    0 29152K 4584K zfs    1 0:00 0.00% smbd
> 99951 root       1  44    0 29152K 4688K h->has 1 0:00 0.00% smbd
>   229 root       1  44    0 29152K 4760K lockf  1 0:00 0.00% smbd
>   218 root       1  44    0 29152K 4720K lockf  1 0:00 0.00% smbd
> 99952 root       1  44    0 29152K 4688K h->has 1 0:00 0.00% smbd
> 99956 root       1  44    0 28816K 4212K zfs    1 0:00 0.00% smbd
> 99978 root       1  44    0 29152K 4740K zfs    1 0:00 0.00% smbd
>   199 backuppc   1  44    0 12608K 2820K h->has 1 0:00 0.00% perl5.8.9
> 99945 root       1  44    0 28816K 4212K zfs    1 0:00 0.00% smbd
>   105 root       1  44    0 29152K 4652K zfs    1 0:00 0.00% smbd
>   181 root       1  44    0 28816K 4160K zfs    1 0:00 0.00% smbd
>   178 root       1  44    0 28816K 4156K zfs    1 0:00 0.00% smbd
> 99950 root       1  44    0 28816K 4208K zfs    1 0:00 0.00% smbd
>   257 jo         1  52    0  8252K 1564K zfs    1 0:00 0.00% ls
> 99929 root       1  44    0 28816K 4204K zfs    1 0:00 0.00% smbd
>   108 root       1  45    0 29152K 4620K zfs    1 0:00 0.00% smbd
>   177 root       1  76    0 17976K 2308K tx->tx 1 0:00 0.00% zfs
>   136 root       1  76    0  2764K 1048K wait   0 0:00 0.00% lockf

I snipped the ones that look harmless; the processes above are the ones that look strange to me. All the smbd processes are the result of Windows laptops trying to access the share that lives on the hang-causing zfs dataset.

> #procstat -kk 257
>   PID    TID COMM             TDNAME           KSTACK
>   257 100335 ls               -                mi_switch+0x1c2 sleepq_switch+0xdc sleepq_wait+0x45 __lockmgr_args+0x8e2 vop_stdlock+0x51 VOP_LOCK1_APV+0x55 _vn_lock+0x48 cache_lookup+0x63f vfs_cache_lookup+0xad VOP_LOOKUP_APV+0x53 lookup+0x624 namei+0x597 vn_open_cred+0x340 vn_open+0x1c kern_openat+0x163 kern_open+0x19 open+0x18 syscallenter+0x2fe

> #procstat -kk 136
>   PID    TID COMM             TDNAME           KSTACK
>   136 100264 lockf            -                mi_switch+0x1c2 sleepq_switch+0xdc sleepq_catch_signals+0x57 sleepq_wait_sig+0xc _sleep+0x26e kern_wait+0xeda wait4+0x37 syscallenter+0x2fe syscall+0x41 Xfast_syscall+0xe2

This is on:

> FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #0 r224227: Wed Jul 20 16:55:23 BST 2011 root@XXX:/usr/obj/usr/src/sys/GENERIC amd64

Any "zfs list" or similar command hangs as well.
Last night's scrub finished without any errors.
It is likely that the hang occurred during an hourly snapshot (the log has no entries for recent snapshots).

Ideas?

Johannes