Date: Tue, 21 Mar 2006 01:36:44 -0800 (PST)
From: John Kozubik <john@kozubik.com>
To: freebsd-current@freebsd.org
Cc: freebsd-fs@freebsd.org, jroberson@chesapeake.net, tegge@freebsd.org, kris@obsecurity.org
Subject: UFS2 Snapshots in 6.1-Beta4 - Confirmed Problems
Message-ID: <20060320224313.O55763@kozubik.com>

Hello,

By request of the participants of freebsd-fs, I have been testing the behavior and stability of UFS2 snapshots on FreeBSD 6.1-BETA4 for several days now.

Unfortunately, I have confirmed that behavior from existing PRs (circa 6.0) still manifests itself, as well as some additional bad behavior that has not yet been documented in PRs.

I hope that I am making this information available soon enough that it can be acted upon prior to the release of 6.1. It would make me very happy to confidently use snapshots on FreeBSD (something that has never been possible in the past).

-----

Here is the behavior I have witnessed:

First, I have confirmed that a filesystem with multiple snapshots that undergoes multiple, rapid deletions of files will cause the system to hang. I have witnessed this before, but had not confirmed it or documented it in a PR. Now that I have confirmed this behavior, I have documented it in:

kern/94769

This is a serious problem because, in addition to making it nearly impossible to run a system with multiple snapshots, it is conceivable that enough rapid file deletions could occur on an otherwise non-snapshotted system, one that has a single snapshot on it due to a background fsck, to cause the system to hang.

Second, kern/92292 is still a problem. I have reproduced this error in 6.1-BETA4 (and have seen it happening since 5.1). The (small) difference is that the cp process seems to stick in the flswai state instead of biowr.

This next one is complicated, and I haven't submitted a PR for it yet, but I believe it is quite serious for reasons I will expand on below. The problem is:

If you completely fill a filesystem (109% usage in `df` on most systems) that has a snapshot on it, the system becomes very unresponsive - all interactive and disk response lags terribly and, although the system is not hung, it is in many cases unusable.

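As an aside on the 109% figure: it falls out of how `df` reports capacity once the reserved space has also been consumed. A rough sketch in Python, assuming the default 8% minfree reserve (the function name `df_capacity_percent` and the block counts are made up for illustration, not taken from any FreeBSD source):

```python
def df_capacity_percent(total_blocks, free_blocks, reserved_fraction=0.08):
    """Approximate the Capacity column of df for a UFS filesystem.

    Assumption: reserved_fraction is the minfree reserve (8% by default).
    """
    used = total_blocks - free_blocks
    reserved = total_blocks * reserved_fraction
    # Space available to non-root users; this goes negative once
    # writes by root have eaten into the reserve.
    avail = free_blocks - reserved
    # Capacity is used space relative to the non-reserved space.
    return round(100.0 * used / (used + avail))


# Hypothetical 1,000,000-block filesystem.
print(df_capacity_percent(1_000_000, 0))       # every block used: 109
print(df_capacity_percent(1_000_000, 80_000))  # only the reserve left: 100
```

In other words, 100% means the space available to ordinary users is gone, and 109% means the reserve is gone as well.
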
I believe this is serious because it is conceivable that the snapshots on a filesystem contain critical data, while at the same time the system they are on is a critical system. If one allows a snapshotted filesystem to fill, one is faced with the difficult choice of deleting a snapshot with potentially critical data on it, or sacrificing the use of that _entire computer system_. Data cannot be deleted from that filesystem to free up space because that data continues to reside in the snapshots.

This behavior makes it imperative that an administrator never allow a snapshotted filesystem to become full or close to full, which is perhaps unreasonable.

Related to the last problem is the fact that a filesystem (without snapshots) that is completely full will still sync properly with no errors. However, when one fills up a filesystem that already has snapshots living on it, sync fails with the message:

/mnt/data1: write failed, filesystem is full

I do not know if the sync is in fact unsuccessful or not.

In essence, the current behavior means that a filesystem that has snapshots on it experiences a point of no return if that filesystem ever fills up completely. A snapshot _must be deleted_ to allow the system to return to reasonable performance.

I have not been able to determine if kern/92272 still exists on FreeBSD 6.1-BETA4. It looks like it does, but I haven't had time to test conclusively.

Finally, and most trivially, an attempted snapshot that fails due to insufficient space on the target filesystem fails with the error:

mksnap_ffs: Cannot create /mnt/data1/.snap/almost_full: No space left on device

However, it still creates a zero-byte file. I think it should create no file at all.

Thank you for reviewing this - please contact me if there are further tests that I can run, or additional details I can provide.

P.S. The items above related to bad behavior when disks fill are items I am having a hard time recreating perfectly.
That is to say, every time I combine snapshots with full disks on 6.1-BETA4, bad things happen (terrible performance or hangs, or both), but I am having trouble recreating exact scenarios. YMMV.

-----
John Kozubik - john@kozubik.com - http://www.kozubik.com