Date: Tue, 1 Apr 2008 13:51:27 -0600
From: "Cyrus Rahman" <crahman@gmail.com>
To: freebsd-fs@freebsd.org
Subject: Trouble with snapshots
Message-ID: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com>
I'm seeing serious problems with snapshot deadlocks on 7.0-RELEASE right now. I haven't been able to set up a test environment to determine the precise details, but this much I know:

Filesystem I/O will eventually lock up, requiring a hard reset, after the snapshot mount sleeps permanently on suspfs. Eventually there's a cascade and everything ends up waiting on suspfs. Running 'sync' after the mount hangs is a sure way to propagate the problem.

This happens very often: probably about a 15% chance per snapshot on the server running 7.0. It's bad enough that it's not realistic to use snapshots there.

Other strange things have been observed as well. In one case an entire day's worth of work vanished: after the reset and reboot the filesystems were consistent, but in the state they had been in many hours earlier, at the time the snapshot hung. The snapshot had been seen hanging, but everything else seemed to work, so a decision was made to reboot at the end of the day, with disastrous effect! During the day nothing unusual was noticed apart from the hung snapshot. I'm guessing everything just got cached (for hours!) and the cache never got flushed.

This is happening on a system set up with journaled UFS filesystems, so that may be part of the problem. The system is running amd64 on an Intel Q6600.

The filesystem that has trouble has a number of large files, about 500-700 MB, on it. Filesystems with only small files do not seem to have trouble, even though they are bigger filesystems with more files. I can't think of anything else unique about it.