Date: Thu, 13 Nov 2008 01:34:07 -0600
From: Kevin Day <toasty@dragondata.com>
To: freebsd-stable@freebsd.org
Subject: Re: System deadlock when using mksnap_ffs
Message-ID: <DA52E1DB-FE0C-496D-86E7-55D79D4C1D0E@dragondata.com>

(moving my thread from -fs to -stable)

Before touching anything, here's a description of the symptoms I see...

Rather busy system, with quite a bit of filesystem activity occurring while the snapshot is being made. Quad CPU amd64 box with 16GB of ram, 6x10Krpm RAID array. Should be reasonably fast.

Filesystem   1K-blocks     Used     Avail Capacity   iused    ifree %iused  Mounted on
/dev/da0s1a  739339824 74357926 605834714    11%   1718540 93855474    2%   /

1.7 million inodes, 71G used of a 705G volume.

Here's a timeline of what I see when starting to make a new snapshot. I've got a few windows running, showing "top", "iostat", etc.

Baseline disk activity before starting anything:

device     r/s   w/s    kr/s    kw/s wait svc_t   b
da0       24.0   2.0   355.6    32.0    1  10.7  28

0m0s: Snapshot begins, using "mount -u -o snapshot //.snap/weekly.0 /". Drives immediately jump to 100% busy as expected.

device     r/s   w/s    kr/s    kw/s wait svc_t   b
da0      153.8   6.0  3378.6    95.9    2  16.9 100

The mount process is spending 100% of its time in "biord".

2m10s: The mount process starts spending more and more time in "snaplk", alternating with "biord".

device     r/s   w/s    kr/s    kw/s wait svc_t   b
da0       77.9  67.9  1270.7  3754.2    1  10.7 100

12m15s: The first intermittent slowdowns start affecting other processes on the system. Occasionally all active processes will get stuck in "snaplk" or "ufs" for 5-10 seconds before resuming.

device     r/s   w/s    kr/s    kw/s wait svc_t   b
da0       77.9  31.0  1150.8  1054.9    1  10.4 100

114m47s: Active processes are briefly stuck in "suspfs".

115m22s: Mount is now in "snaprdb". Active processes are now completely stuck in "snaplk". Still responsive to SIGINFO, top is still running, etc. Just hangs any time anything needs the filesystem.

device     r/s   w/s    kr/s    kw/s wait svc_t   b
da0      238.8   0.0  3820.1     0.0    1   4.1  99

143m19s: Mount now in "wdrain".

143m34s: Finished. Snapshot logging shows "/: suspended 13.308 sec, redo 153 of 4058".

Most processes were hung for 28 minutes.

Is this what others are seeing?
It sounds like some of the complaints are about it getting stuck in the "wdrain" state, which is not what I'm showing here.

Another mildly annoying note: any process that touches ".snap" while a snapshot is being generated gets stuck in "ufs" until it finishes. I can understand wanting to keep operations in there in sync, but it would be really nice if "find /" didn't hang when it tries to descend into .snap, for example.

ts5# cd /.snap
ts5# ls -l
^T
load: 0.17  cmd: ls 3696 [ufs] 0.00u 0.00s 0% 1496k
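
As a quick check of the 28-minute figure from the timeline, here is a minimal sketch that subtracts two timeline stamps. The helper names to_seconds and elapsed are made up for illustration and are not part of any FreeBSD tool. It assumes stamps of the form "115m22s" with no leading zeros.

```shell
#!/bin/sh
# Hypothetical helpers for checking gaps between timeline stamps.

# Convert a stamp such as "115m22s" into a number of seconds.
to_seconds() {
    stamp=$1
    min=${stamp%%m*}   # text before the first "m"
    sec=${stamp#*m}    # text after the first "m"
    sec=${sec%s}       # drop the trailing "s"
    echo $((min * 60 + sec))
}

# Print the gap between two stamps in the same "XmYs" form.
elapsed() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    diff=$((end - start))
    echo "$((diff / 60))m$((diff % 60))s"
}

# From "completely stuck in snaplk" to "Finished".
elapsed 115m22s 143m34s   # prints 28m12s
```

The same helper gives the time from the first intermittent slowdowns to completion with "elapsed 12m15s 143m34s".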