Date: Tue, 28 Jun 2005 12:13:40 -0500
From: Skylar Thompson <skylar@cs.earlham.edu>
To: Xin LI <delphij@frontfree.net>
Cc: fs@freebsd.org
Subject: Re: Snapshot problems
Message-ID: <42C18544.4000909@cs.earlham.edu>
In-Reply-To: <20050627134008.GA5764@frontfree.net>
References: <20050626182031.GA5268@quark.cs.earlham.edu> <20050627134008.GA5764@frontfree.net>

Xin LI wrote:

>On Sun, Jun 26, 2005 at 01:20:31PM -0500, Skylar Thompson wrote:
>
>>I've discovered a repeatable problem with FreeBSD's UFS2 snapshots. If I
>>create several snapshots, and then do heavy disk I/O on the original
>>filesystem (deletions, creations, simple touches, etc.) I can cause the I/O
>>system to crash. There is no kernel panic, and the machine still answers
>>pings, but no disk I/O occurs. I can replicate this on a dual-processor
>>beige-box system with a Mylex RAID controller and a RAID-5 set, and also on
>>a dual-processor Dell Poweredge 2650 with a PERC 3/i RAID controller and a
>>RAID-5 set and RAID-1 set. FreeBSD 5.4-RELEASE is installed on both
>>systems, and SMP is enabled as well, with HTT disabled on the Poweredge. I
>>have DDB compiled in, so I can get debug information but I don't know what
>>to look for.
>
>I think a script that can reliably trigger the "crash" would be helpful.

I was using this script to take the snapshots:

    #!/bin/sh
    if [ -f /var/run/hourly_snap ]; then
        echo "Lock file exists. Exiting...."
        exit 1
    else
        HOUR=`date "+%H"`
        touch /var/run/hourly_snap
        for f in / /usr /var /clients; do
            if [ -f $f/snapshots/hourly_snap.$HOUR ]; then
                rm -f $f/snapshots/hourly_snap.$HOUR
            fi
            mksnap_ffs $f $f/snapshots/hourly_snap.$HOUR;
        done
        rm /var/run/hourly_snap
    fi

I ran this once every other hour, so I had 12 snapshots in circulation at
any given time. The number of snapshots seemed to exacerbate the problem;
just having one or two around rarely (although sometimes) caused a crash.

>What do you mean by "IO system crash", BTW? I got confused since it does
>not cause kernel panic and stop ping responses. Do you mean that the
>I/O system was stalled/suspended when there is heavy disk operations?

Yes.
The kernel still responds and I can get into DDB just fine, but there's no
disk activity, at least on the affected filesystem. Usually it's /usr,
which has many used inodes on account of ports and src.

>My guess is that there is some underlying deadlock(s) present. Would you
>mind compiling WITNESS/WITNESS_SUPPORT into your kernel and give it a try?
>This will reduce performance, but would also be helpful for picking locking
>bugs.

Sure. I've got the 2650 booted up with WITNESS support in addition to DDB.
Where should I go from here?

--
-- Skylar Thompson (skylar@cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/
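
The message describes the triggering load only in words (deletions, creations, simple touches) and no load script appears in it. The following is a minimal sketch of such a load, not the script actually used in the report. The function name churn, its two arguments, and the file.N naming are hypothetical; it assumes a scratch directory on the filesystem that holds the snapshots.

```shell
#!/bin/sh
# Hypothetical load generator; NOT taken from the original report.
# churn DIR COUNT: create COUNT files under DIR, touch each one,
# then delete them all. Prints the number of files that existed
# before the deletion pass.
churn() {
    dir=$1
    count=$2
    mkdir -p "$dir" || return 1

    i=0
    while [ "$i" -lt "$count" ]; do
        echo "data $i" > "$dir/file.$i"   # creation
        touch "$dir/file.$i"              # simple touch
        i=$((i + 1))
    done

    # Count what was created before removing it.
    created=$(ls "$dir" | wc -l | tr -d ' ')

    i=0
    while [ "$i" -lt "$count" ]; do
        rm -f "$dir/file.$i"              # deletion
        i=$((i + 1))
    done

    echo "$created"
}
```

It could be run in a loop against a scratch directory, for example churn /usr/snaptest 10000 (an assumed path and count), while several snapshots of that filesystem exist. Whether this particular pattern reproduces the stall is untested.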