Date: Tue, 3 Jun 2008 14:45:26 +0200
From: Lorenzo Perone <lopez.on.the.lists@yellowspace.net>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS lockup in "zfs" state
Message-ID: <38DAE942-319A-4A44-A8F6-491D4269A8E7@yellowspace.net>
In-Reply-To: <48446C42.4070208@mawer.org>
References: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>
 <20080518124217.GA16222@eos.sc1.parodius.com>
 <93F07874-8D5F-44AE-945F-803FFC3B9279@thefrog.net>
 <16a6ef710806012304m48b63161oee1bc6d11e54436a@mail.gmail.com>
 <20080602064023.GA95247@eos.sc1.parodius.com>
 <48446C42.4070208@mawer.org>
Hello,

just to add one more voice to the issue: I'm experiencing the lockups with ZFS too.

Environment: development test machine, amd64, 3GHz AMD, 2GB RAM, running FreeBSD/amd64 7.0-STABLE #8, Sat Apr 26 10:10:53 CEST 2008, with one 400GB SATA disk devoted completely to a zpool (no RAID of any kind). This disk has 5 filesystems which get rsynced on a daily basis from various other development hosts. Some of the filesystems are NFS-exported.

/boot/loader.conf contains:

  vm.kmem_size=900M
  vm.kmem_size_max=900M
  vfs.zfs.arc_max=300M
  vfs.zfs.prefetch_disable=1

The disk itself has no known hardware problems.

A script controlled by cron makes a daily or weekly snapshot of the filesystems (at 2:30 am). Before that, a "housekeeping" script checks the available space, and if it is getting below a certain threshold, it destroys older snapshots (at 1:30 am). The rsyncs to the pool all happen a few hours later (4:30 am).

I've seen lockups periodically, where I could not do anything but hard-reboot the machine to unstick it. It was possible to use other filesystems, but any process trying to access the zpool would hang.

The very first hang came about 3 months after 7.0-BETA4, which was when I first set up the pools. I then csupped and rebuilt world and kernel periodically, the last time being the end of April. After that the lockups came more often, within 2 weeks at most.

Since I lowered the threshold of the "housekeeping" script, the machine hasn't locked up for about 3 weeks. That seems to point at a problem with zfs destroy fs@snapshot - or at something else my script does, so here's a link to it:

http://lorenzo.yellowspace.net/zfs_housekeeping.sh.txt

I haven't seen any adX timeouts or any other suspicious console messages so far.

If there is anything I can provide to help nail down ZFS problems, please let me know and I'll do my best...
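For readers who don't want to follow the link: the housekeeping step described above (check the free space, then destroy the oldest snapshots until a threshold is met) can be sketched roughly as follows. This is not the linked zfs_housekeeping.sh, only an illustration of the idea. The function name, the snapshot tuple layout, the pool name "tank" and all numbers are assumptions, and the per-snapshot "reclaim" figure is an estimate at best, since ZFS snapshots share blocks with each other.

```python
from typing import List, Tuple

# Hypothetical record: (snapshot name, creation time, estimated bytes reclaimed)
Snapshot = Tuple[str, int, int]


def snapshots_to_destroy(
    snapshots: List[Snapshot], free_bytes: int, min_free_bytes: int
) -> List[str]:
    """Pick the oldest snapshots until the free-space threshold is met.

    Nothing is destroyed here; the function only returns the names, so
    the selection can be inspected (dry run) before anything is removed.
    """
    doomed: List[str] = []
    # Walk the snapshots oldest first.
    for name, _created, reclaim in sorted(snapshots, key=lambda s: s[1]):
        if free_bytes >= min_free_bytes:
            break  # enough space available, keep the remaining snapshots
        doomed.append(name)
        free_bytes += reclaim  # estimate only, see the note above
    return doomed


GIB = 1024 ** 3

# Made-up example data: 10 GiB free, 15 GiB wanted.
example = [
    ("tank/dev@2008-05-03", 3, 2 * GIB),
    ("tank/dev@2008-05-01", 1, 4 * GIB),
    ("tank/dev@2008-05-02", 2, 3 * GIB),
]

for snap in snapshots_to_destroy(example, 10 * GIB, 15 * GIB):
    # Dry run: print the command instead of executing it.
    print("zfs destroy " + snap)
```

Keeping the selection separate from the actual destroy makes it possible to log which snapshots would go before a run, which may help when correlating a hang with a particular destroy.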
Thanx to everyone working on this great OS and on this cute file/volsystem :)

Regards,

Lorenzo

On 02.06.2008, at 23:55, Antony Mawer wrote:

> Jeremy Chadwick wrote:
>> On Mon, Jun 02, 2008 at 04:04:12PM +1000, Andrew Hill wrote:
> ...
>>> unfortunately i couldn't get a backtrace or core dump for 'political'
>>> reasons (the system was required for use by others) but i'll see if i can
>>> get a panic happening after-hours to get some more info...
>> I can't tell you what to do or how to do your job, but honestly you
>> should be pulling this system out of production and replacing it with a
>> different one, or a different implementation, or a different OS. Your
>> users/employees are probably getting ticked off at the crashes, and it
>> probably irritates you too. The added benefit is that you could get
>> Scott access to the box.
>
> It's a home fileserver rather than a production "work" system, so
> the challenge is finding another system with an equivalent amount of
> storage.. :-) As one knows these things are often hard enough to
> procure out of a company budget, let alone out of ones own pocket!
>
> --Antony
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"