Date: Sun, 28 Oct 2007 22:43:56 -0400
From: Adam McDougall
To: freebsd-current@freebsd.org
Subject: zfs stuck, cannot do any I/O, processes in Disk Wait
Message-ID: <20071029024356.GR3612@egr.msu.edu>

I think I have had this happen at least once before, but someone else rebooted the system before I could see it. I have a server with a number of zfs filesystems mounted from a raidz, but it won't transfer any data.
I'm not sure why it's stuck. It is running 7.0-PRERELEASE Wed Oct 17 and I'm pretty sure it is WITHOUT vm_kern.c.2.patch. The system is amd64 and I have not seen a kmem panic since I raised kmem to 1.5G.

I logged in to scp a file off of zfs and was able to ls -l to see the file, but the scp hung before transferring any bytes. Now I cannot do a ls -l in that directory, /z. I noticed several days' worth of rsync processes stuck in disk wait, so it must have been in this state for several days. I have no urgent need to reboot this system; it's more important to try to get a permanent fix. Please let me know what other information I can provide.

10:34PM  up 10 days, 12:37, 3 users, load averages: 0.00, 0.00, 0.00

Stuck rsync processes (started from cron), by start time:

1:01AM   4:00AM   5:00AM
Fri01AM  Fri04AM  Fri05AM
Mon05AM
Sat01AM  Sat04AM  Sat05AM
Thu01AM  Thu04AM  Thu05AM
Tue01AM  Tue04AM  Tue05AM
Wed01AM  Wed04AM  Wed05AM

# more /boot/loader.conf
vm.kmem_size=1610612736
vm.kmem_size_max=1610612736

# sysctl -a | grep vnodes
kern.maxvnodes: 100000
kern.minvnodes: 25000
vfs.freevnodes: 25000
vfs.wantfreevnodes: 25000
vfs.numvnodes: 49996

No errors in dmesg, and I can dd from the drives in the raidz1 fine.

z/backups     101508480         0 101508480     0%    /backups
z/backups/a   101508480         0 101508480     0%    /backups/a
z/backups/b   149992448  48483968 101508480    32%    /backups/b
z/backups/c   219571968 118063488 101508480    54%    /backups/c
z/backups/d   105923968   4415488 101508480     4%    /backups/d
z/data        199868032  98359552 101508480    49%    /data
z             103982976   2474496 101508480     2%    /z
z/data4       206146688 104638208 101508480    51%    /z/data4
z/mysqldb     102015488    507008 101508480     0%    /z/mysqldb

# zpool list
NAME    SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
z       696G    540G    156G    77%  ONLINE  -

# zpool status
  pool: z
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        z           ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad8     ONLINE       0     0     0

errors: No known data errors
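
For anyone checking the numbers above, here is a small sketch (Python, not part of the system's configuration) that converts the vm.kmem_size value from loader.conf into GiB and expresses vfs.numvnodes as a fraction of kern.maxvnodes. The constants are copied from the output in this message; the helper names bytes_to_gib and percent_used are made up for illustration.

```python
# Values copied from the loader.conf and sysctl output quoted in this message.
KMEM_SIZE_BYTES = 1610612736   # vm.kmem_size
MAX_VNODES = 100000            # kern.maxvnodes
NUM_VNODES = 49996             # vfs.numvnodes


def bytes_to_gib(n: int) -> float:
    """Convert a byte count to GiB (1 GiB = 2**30 bytes). Hypothetical helper."""
    return n / 2**30


def percent_used(used: int, limit: int) -> float:
    """Return used as a percentage of limit. Hypothetical helper."""
    return 100.0 * used / limit


kmem_gib = bytes_to_gib(KMEM_SIZE_BYTES)
vnode_pct = percent_used(NUM_VNODES, MAX_VNODES)

print(f"vm.kmem_size = {kmem_gib:.2f} GiB")
print(f"vnodes in use: {vnode_pct:.1f}% of kern.maxvnodes")
```

The kmem value works out to exactly 1.5 GiB, matching the "raised kmem to 1.5G" statement, and the vnode count is roughly half of the configured maximum. This only restates the figures already posted; it does not by itself say anything about the cause of the hang.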