Date:      Tue, 28 Jun 2005 12:13:40 -0500
From:      Skylar Thompson <skylar@cs.earlham.edu>
To:        Xin LI <delphij@frontfree.net>
Cc:        fs@freebsd.org
Subject:   Re: Snapshot problems
Message-ID:  <42C18544.4000909@cs.earlham.edu>
In-Reply-To: <20050627134008.GA5764@frontfree.net>
References:  <20050626182031.GA5268@quark.cs.earlham.edu> <20050627134008.GA5764@frontfree.net>

Xin LI wrote:

>On Sun, Jun 26, 2005 at 01:20:31PM -0500, Skylar Thompson wrote:
>
>>I've discovered a repeatable problem with FreeBSD's UFS2 snapshots. If I
>>create several snapshots, and then do heavy disk I/O on the original
>>filesystem (deletions, creations, simple touches, etc.) I can cause the I/O
>>system to crash. There is no kernel panic, and the machine still answers
>>pings, but no disk I/O occurs. I can replicate this on a dual-processor
>>beige-box system with a Mylex RAID controller and a RAID-5 set, and also on
>>a dual-processor Dell PowerEdge 2650 with a PERC 3/i RAID controller and a
>>RAID-5 set and RAID-1 set.  FreeBSD 5.4-RELEASE is installed on both
>>systems, and SMP is enabled as well, with HTT disabled on the PowerEdge. I
>>have DDB compiled in, so I can get debug information but I don't know what
>>to look for.
>
>I think a script that can reliably trigger the "crash" would be helpful.
>

I was using this script to take the snapshots:

#!/bin/sh
# Take one snapshot per filesystem, named after the current hour.

LOCK=/var/run/hourly_snap

if [ -f "$LOCK" ]; then
        echo "Lock file exists. Exiting...."
        exit 1
else
        HOUR=$(date "+%H")

        touch "$LOCK"
        for f in / /usr /var /clients; do
                # Remove the snapshot left over from the same hour yesterday.
                if [ -f "$f/snapshots/hourly_snap.$HOUR" ]; then
                        rm -f "$f/snapshots/hourly_snap.$HOUR"
                fi
                mksnap_ffs "$f" "$f/snapshots/hourly_snap.$HOUR"
        done
        rm "$LOCK"
fi

I ran this once every other hour, so I had 12 snapshots per filesystem 
in circulation at any given time. The number of snapshots seemed to 
exacerbate the problem; with only one or two around, a crash was rare, 
though it did sometimes happen.
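
For the load side, something along these lines could be run against a 
scratch directory on the affected filesystem while the snapshots exist. 
This is only a sketch: the churn function name, the directory, and the 
file count are made up for illustration and are not the exact workload 
from the original report.

```shell
#!/bin/sh
# Hypothetical load generator (not the original workload): create,
# touch, and then delete COUNT small files under DIR.
# Usage: churn DIR COUNT
churn() {
        dir=$1
        count=$2
        mkdir -p "$dir" || return 1

        i=0
        while [ "$i" -lt "$count" ]; do
                : > "$dir/f.$i"         # creation
                i=$((i + 1))
        done

        i=0
        while [ "$i" -lt "$count" ]; do
                touch "$dir/f.$i"       # simple touch
                i=$((i + 1))
        done

        i=0
        while [ "$i" -lt "$count" ]; do
                rm -f "$dir/f.$i"       # deletion
                i=$((i + 1))
        done
}

# Example (directory and count are assumptions):
#   churn /usr/tmp/churn 10000
```

Running several of these in parallel, or in a loop, would come closer to 
the "heavy disk I/O" in the report, but the right count is a guess.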

>What do you mean by "IO system crash", BTW?  I got confused since it does
>not cause a kernel panic or stop ping responses.  Do you mean that the
>I/O system was stalled/suspended when there are heavy disk operations?
>
Yes. The kernel still responds and I can get into DDB just fine, but 
there's no disk activity, at least on the affected filesystem. Usually 
it's /usr, which has many used inodes on account of ports and src.

>My guess is that there is some underlying deadlock(s) present.  Would you
>mind compiling WITNESS/WITNESS_SUPPORT into your kernel and giving it a try?
>This will reduce performance, but would also be helpful for picking out
>locking bugs.
>

Sure. I've got the 2650 booted up with WITNESS support in addition to 
DDB. Where should I go from here?


-- 
-- Skylar Thompson (skylar@cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/

