From: Charles Sprickman
Date: Tue, 17 May 2011 02:43:44 -0400 (EDT)
To: stable@freebsd.org
Subject: 8.1R possible zfs snapshot livelock?

Hello,

Not sure if it's worth troubleshooting this too much before upgrading, but we recently had an 8.1R/amd64 box hang in a way that suggested everything was waiting on disk access. It's remote and we had to resort to a power-cycle to bring it back (we have serial console, but it hung after accepting the root password).

We run hourly/daily/weekly/monthly snapshots on about a half dozen filesystems using RSE's snapshot script (see http://people.freebsd.org/~rse/snapshot/ - we only use the zfs snapshotting and do not use the amd portion). We have some basic stats logged on all our boxes every 5 minutes and I saw a pile of cron jobs stuck in disk I/O wait. I suspect these were the snapshots. Shortly after that it seems as if all disk I/O got hung.

Some additional info about what the main tasks are on this box:

-qmail deliveries (lots)
-postgres (light use)
-nfs export of qmail log dirs to another box that does log analysis

All services are spread amongst a handful of jails. Each jail has its own zfs filesystem.

Does this sound familiar to anyone running ZFS with snapshots? Anything I should log to get more data if this happens again? I have output from arc_summary.pl running every 5 minutes as part of our general status logging.

Any pointers to known issues in ZFS (both 8.1 and 8.2) would be helpful. Also, anywhere to look for the general state of ZFS besides this page?

http://wiki.freebsd.org/ZFS

Thanks,

Charles