Date:      Fri, 12 Jan 2007 10:49:23 -0800 (PST)
From:      "R. B. Riddick" <arne_woerner@yahoo.com>
To:        CyberLeo Kitsana <cyberleo@cyberleo.net>
Cc:        FreeBSD Geom <freebsd-geom@freebsd.org>
Subject:   Re: geom_raid5 livelock?
Message-ID:  <666664.62691.qm@web30309.mail.mud.yahoo.com>
In-Reply-To: <45A7D338.1090402@cyberleo.net>

--- CyberLeo Kitsana <cyberleo@cyberleo.net> wrote:
> I've been making use of the geom_raid5 class for FreeBSD 6.2-PRERELEASE. 
> I've noticed an odd behavior lately, with the latest sources provided 
> (as of 2006-01-10) in which, with 'safeop' enabled, the raid5 will stop 
> responding.
>
Ohoh...

> All works beautifully, until several dozen gigabytes are transferred to 
> or from the filesystem with safeop enabled, at which point the 
> filesystem will grow quickly less responsive, and eventually cease 
> responding entirely (processes stuck in diskwait) CPU usage is at 0%, 
> and all four members of the raid5 are being read at around 160kB/sec 
> (16kB/t, 10tps) constantly. It does not naturally recover within 72 
> hours. The mirror is unaffected by this behavior.
>
Strange... :-)
SAFEOP means that a failed disk leads to an I/O error for every request, and
that every read request reads all corresponding disk areas (if possible) and
checks parity. SAFEOP mode is useful if you want to be sure that neither
your disks nor your operating system provide bogus data... SAFEOP mode causes
a lot of disk activity: e.g. for a sequential read, it reads the whole stripe
(n blocks) n-1 times, where n is the disk count of the RAID5.
This special form of read request is also used by the rebuild procedure, so
it should work fine...
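
To illustrate the parity check and the extra reads described above, here is a
minimal sketch in Python. It is not the graid5 code: all function names are
made up for illustration, a stripe is modelled as a plain list of byte blocks,
and the parity block is assumed to always be the last one (a real RAID5
rotates it across the disks).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Build a stripe: the n-1 data blocks plus one parity block (kept last here)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def safeop_read(stripe, index):
    """Return one block, but only after fetching all n blocks and checking parity.

    A missing member (modelled as None) or a parity mismatch is reported
    as an I/O error instead of returning possibly bogus data.
    """
    if any(block is None for block in stripe):
        raise IOError("member missing: request fails in this sketch")
    if any(xor_blocks(stripe)):  # XOR over data + parity must be all zeros
        raise IOError("parity mismatch")
    return stripe[index]


def blocks_read_per_stripe(n, safeop):
    """Physical block reads needed to read the n-1 data blocks of one stripe.

    With the checked read, each of the n-1 requests fetches all n blocks.
    """
    return (n - 1) * n if safeop else (n - 1)
```

For a 4-disk RAID5 this model gives 12 physical block reads per stripe with
the checked read, against 3 without it, so a sequential read costs roughly n
times the usual disk traffic.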

What does "graid5 list" say when that happens?
Are there any special messages logged via syslog at those times?

> When this occurs, the moment safeop is disabled on the raid5, all the 
> problems cease, the filesystem begins responding and the programs resume.
>
So the kernel does not panic or anything like that...? :-)

> Is this intentional, an artifact of the hardware or layout I'm using, or 
> could this be indicative of an obscure bug somewhere? Can I provide any 
> additional information which would assist in tracking this down?
>
I would guess: An obscure bug in graid5...

You could try putting gcache between the disks and graid5...
And the syslog messages (if there are any) would be very interesting (e.g.
messages about read errors or disk failures)...

-Arne



 


