Date:      Tue, 17 May 2011 02:43:44 -0400 (EDT)
From:      Charles Sprickman <spork@bway.net>
To:        stable@freebsd.org
Subject:   8.1R possible zfs snapshot livelock?
Message-ID:  <alpine.OSX.2.00.1105170120510.1983@hotlap.nat.fasttrackmonkey.com>

Hello,

Not sure if it's worth troubleshooting this too much before upgrading, but 
we recently had an 8.1R/amd64 box hang in a way that suggested everything 
was waiting on disk access.  It's remote and we had to resort to a 
power-cycle to bring it back (we have serial console, but it hung after 
accepting the root password).

We run hourly/daily/weekly/monthly snapshots on about a half dozen 
filesystems using RSE's snapshot script 
(see http://people.freebsd.org/~rse/snapshot/ - we only use the zfs 
snapshotting and do not use the amd portion).  We have some basic stats 
logged on all our boxes every 5 minutes and I saw a pile of cron jobs 
stuck in disk I/O wait.  I suspect these were the snapshots.  Shortly 
after that it seems as if all disk I/O got hung.

Some additional info about what the main tasks are on this box:

-qmail deliveries (lots)
-postgres (light use)
-nfs export of qmail log dirs to another box that does log analysis

All services are spread amongst a handful of jails.  Each jail has its 
own ZFS filesystem.

Does this sound familiar to anyone running ZFS with snapshots?  Anything I 
should log to get more data if this happens again?  I have output from 
arc_summary.pl running every 5 minutes as part of our general status 
logging.

Any pointers to known issues in ZFS (both 8.1 and 8.2) would be helpful.

Also, is there anywhere to look for the general state of ZFS besides this page?

http://wiki.freebsd.org/ZFS

Thanks,

Charles


