From owner-freebsd-current@FreeBSD.ORG  Mon Oct 29 18:48:59 2007
Date: Sun, 28 Oct 2007 22:43:56 -0400
From: Adam McDougall <mcdouga9@egr.msu.edu>
To: freebsd-current@freebsd.org
Message-ID: <20071029024356.GR3612@egr.msu.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.16 (2007-06-09)
Subject: zfs stuck, cannot do any I/O, processes in Disk Wait

I think this has happened at least once before, but someone else
rebooted the system before I could look at it.  I have a server with a
number of zfs filesystems mounted from a raidz, but it won't transfer
any data.  I'm not sure why it's stuck.  It is running 7.0-PRERELEASE
from Wed Oct 17, and I'm pretty sure it is WITHOUT vm_kern.c.2.patch.
The system is amd64, and I have not seen a kmem panic since I raised
kmem to 1.5G.

I logged in to scp a file off of zfs and was able to ls -l to see the
file, but the scp hung before transferring any bytes.  Now I cannot do
an ls -l in that directory, /z.  I noticed several days' worth of rsync
processes stuck in disk wait, so it must have been in this state for
several days.  I have no urgent need to reboot this system; it's more
important to try to get a permanent fix.  Please let me know what other
information I can provide.

10:34PM  up 10 days, 12:37, 3 users, load averages: 0.00, 0.00, 0.00

Stuck rsync processes (started from cron):
1:01AM
4:00AM
5:00AM
Fri01AM
Fri04AM
Fri05AM
Mon05AM
Sat01AM
Sat04AM
Sat05AM
Thu01AM
Thu04AM
Thu05AM
Tue01AM
Tue04AM
Tue05AM
Wed01AM
Wed04AM
Wed05AM
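
For anyone wanting to produce a similar list, here is a rough sketch of
one way to pick out processes in disk wait.  The helper name
filter_disk_wait is made up for illustration, and it assumes the process
state is the second column of the ps output, as in the example
invocation shown in the comment.

```shell
# Hypothetical helper (not from the original report): keep the header
# line plus every process whose state column (2nd field) starts with
# "D", which ps uses for uninterruptible disk wait.
filter_disk_wait() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example invocation (assumes ps accepts these output keywords):
#   ps -axo pid,state,wchan,lstart,command | filter_disk_wait
```

The wchan column is included in the example because the wait channel
may hint at what the stuck processes are sleeping on.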

# more /boot/loader.conf 
vm.kmem_size=1610612736
vm.kmem_size_max=1610612736
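
As a quick sanity check that the byte value above matches the 1.5G
mentioned earlier, the conversion can be sketched in plain shell
arithmetic.  The function name bytes_to_mib is only illustrative.

```shell
# Hypothetical helper: convert a byte count to MiB using integer
# shell arithmetic (assumes the shell has 64-bit arithmetic).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

bytes_to_mib 1610612736   # prints 1536, i.e. 1.5 * 1024 MiB
```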

# sysctl -a | grep vnodes
kern.maxvnodes: 100000
kern.minvnodes: 25000
vfs.freevnodes: 25000
vfs.wantfreevnodes: 25000
vfs.numvnodes: 49996
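
To put the numbers above in proportion, a small sketch that computes
vnode usage as a percentage of kern.maxvnodes.  The function name
vnode_usage_pct is an assumption for illustration; it expects
"name: value" lines on stdin, in the format sysctl prints above.

```shell
# Hypothetical helper: read "name: value" lines and print
# vfs.numvnodes as an integer percentage of kern.maxvnodes.
vnode_usage_pct() {
    awk -F': ' '
        $1 == "vfs.numvnodes"  { num = $2 }
        $1 == "kern.maxvnodes" { max = $2 }
        END { if (max > 0) printf "%d\n", int(num * 100 / max) }
    '
}

# Example invocation:
#   sysctl kern.maxvnodes vfs.numvnodes | vnode_usage_pct
```

With the values shown above (49996 of 100000) this gives 49, so the
vnode count is about half of the configured maximum.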

No errors in dmesg, and I can dd from the drives in the raidz1 fine.

z/backups             101508480         0 101508480     0%    /backups
z/backups/a           101508480         0 101508480     0%    /backups/a
z/backups/b           149992448  48483968 101508480    32%    /backups/b
z/backups/c           219571968 118063488 101508480    54%    /backups/c
z/backups/d           105923968   4415488 101508480     4%    /backups/d
z/data                199868032  98359552 101508480    49%    /data
z                     103982976   2474496 101508480     2%    /z
z/data4               206146688 104638208 101508480    51%    /z/data4
z/mysqldb             102015488    507008 101508480     0%    /z/mysqldb

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
z                       696G    540G    156G    77%  ONLINE     -

# zpool status
  pool: z
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        z           ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad8     ONLINE       0     0     0

errors: No known data errors