From: Tim Bishop
To: freebsd-fs@freebsd.org
Date: Tue, 27 Mar 2012 19:14:57 +0100
Subject: ZFS: processes hanging when trying to access filesystems
Message-ID: <20120327181457.GC24787@carrick-users.bishnet.net>

I have a machine running 8-STABLE amd64 from the end of last week. I have a problem where the machine starts to freeze up. Any process accessing the ZFS filesystems hangs, which eventually causes more and more processes to be spawned (cronjobs, etc., never complete).
Although the root filesystem is on UFS (the machine hosts jails on ZFS), eventually I can't log in anymore.

The problem occurs when the frequently used part of the ARC gets too large. See this graph:

http://dl.dropbox.com/u/318044/zfs_arc_utilization-day.png

At the right of the graph things started to hang. At the same time I see a high rate of context switching.

I picked a hanging process and procstat showed the following:

  PID    TID COMM             TDNAME           KSTACK
24787 100303 mutt             -                mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x71 vputx+0x2d8 null_reclaim+0xb3 vgonel+0x119 vrecycle+0x7b null_inactive+0x1f vinactive+0x71 vputx+0x2d8 vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23

I'm running a reduced number of jails on the machine at the moment, which is limiting the speed at which the machine freezes up completely. I'd like to debug this problem further, so any advice on useful information to collect would be appreciated.

I've had this problem on the machine before[1], but adding more RAM alleviated the issue.

Tim.

[1] http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058541.html

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984