Date:      Mon, 21 Apr 2008 10:36:24 -0600
From:      Scott Long <scottl@samsco.org>
To:        Dmitry Morozovsky <marck@rinet.ru>
Cc:        current@freebsd.org
Subject:   Re: Adaptec 1420SA support?
Message-ID:  <480CC288.4090002@samsco.org>
In-Reply-To: <20080421182852.N42264@woozle.rinet.ru>
References:  9060000000212025383 <1415691208445504@webmail12.yandex.ru> <48077B1C.5070608@samsco.org> <20080420185448.X56317@woozle.rinet.ru> <480B5DA0.6030404@samsco.org> <20080420192735.G56317@woozle.rinet.ru> <20080420193212.Y56317@woozle.rinet.ru> <20080420194331.Y56317@woozle.rinet.ru> <480B6664.9040602@samsco.org> <20080420201013.M56317@woozle.rinet.ru> <20080421140600.E72747@woozle.rinet.ru> <480C9F28.6050400@samsco.org> <20080421182852.N42264@woozle.rinet.ru>

Dmitry Morozovsky wrote:
> On Mon, 21 Apr 2008, Scott Long wrote:
> 
> [snip]
> 
> SL> > At least after simulating drive loss (atacontrol detach, atacontrol attach)
> SL> > I can't rebuild ar0:
> SL> > marck@moleskin:~# atacontrol status ar0
> SL> > ar0: ATA RAID1 status: DEGRADED
> SL> >  subdisks:
> SL> >    0 ad16 ONLINE
> SL> >    1 ad18 ONLINE
> SL> > marck@moleskin:~# atacontrol rebuild ar0
> SL> > atacontrol: ioctl(IOCATARAIDREBUILD): Input/output error
> SL> > marck@moleskin:~#
> SL> > 
> SL> > Or, should I wipe out ar label from the second disk to emulate disk
> SL> > replacement?
> SL> > 
> SL> 
> SL> Generating metadata is not supported, nor is automatic failover to a
> SL> spare.  If you're interested in working on the code, let me know and
> SL> I'll help you get started.
> 
> Yes I'm interested; however, my kernel hacking skills are rather limited and, 
> as I think, rudimentary ;-)
> 
> But I at least would try.

DDF requires knowledge of the topology of the entire system, something
that neither ata-raid nor g_raid can provide or even has a concept of.  So
generating metadata from scratch is massively error-prone in all but the
most simple case of a single array and a single set of disks on a single
controller.  Also, the spare handling currently in ata-raid is limited
to array-dedicated spares, and it's limited to assuming that a spare
will always be at a fixed position that can be directly mapped into the
array via an array in C (note the overloaded use of the term 'array' in
this discussion, I'll try to keep it clear when I'm talking about the C
programming construct vs the collection of disks construct).  These are
two separate problems, so I'll describe them separately.

For creating/writing metadata, the first thing that I'd do is to save
off the existing metadata from any good disks into a buffer pointed to
by one of the magic fields in the ar_softc.  These fields are actually
uint64_t types, but they can be overloaded to hold a pointer.  Then when
it's time to write out metadata, the saved buffer can be updated and
written out, preserving the previously-recorded system-wide information
that ata-raid can't provide.  Creating metadata from scratch is
significantly more cumbersome but certainly not hard, aside from the fact
that the VDR and PDR records are going to be incomplete.
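
The save-and-update idea above can be sketched in userspace C roughly as
follows.  This is only an illustration under stated assumptions: the
struct is a stand-in for the real ar_softc, and the names magic_0,
meta_len, and the ddf_*_meta helpers are hypothetical, not taken from
the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ar_softc; only the overloaded field is modelled. */
struct ar_softc_sketch {
	uint64_t magic_0;	/* overloaded to hold a pointer to saved metadata */
	size_t	 meta_len;	/* assumed bookkeeping: size of the saved buffer */
};

/* Return the saved metadata buffer, or NULL if nothing has been saved. */
static void *
ddf_saved_meta(const struct ar_softc_sketch *sc)
{
	return (sc->magic_0 != 0 ? (void *)(uintptr_t)sc->magic_0 : NULL);
}

/* Release the saved copy, if any. */
static void
ddf_free_meta(struct ar_softc_sketch *sc)
{
	free(ddf_saved_meta(sc));
	sc->magic_0 = 0;
	sc->meta_len = 0;
}

/* Keep a private copy of the metadata read from a good disk. */
static int
ddf_save_meta(struct ar_softc_sketch *sc, const void *ondisk, size_t len)
{
	void *copy;

	assert(ondisk != NULL && len > 0);
	copy = malloc(len);
	if (copy == NULL)
		return (-1);
	memcpy(copy, ondisk, len);
	ddf_free_meta(sc);		/* drop any older copy */
	sc->magic_0 = (uint64_t)(uintptr_t)copy;
	sc->meta_len = len;
	return (0);
}

/*
 * Update part of the saved copy before it is written back out.  All
 * bytes outside [off, off + len) keep the system-wide information that
 * was originally read from disk.
 */
static int
ddf_patch_meta(struct ar_softc_sketch *sc, size_t off, const void *src,
    size_t len)
{
	unsigned char *buf = ddf_saved_meta(sc);

	if (buf == NULL || off > sc->meta_len || len > sc->meta_len - off)
		return (-1);
	memcpy(buf + off, src, len);
	return (0);
}
```

The point of going through uintptr_t is that the pointer survives the
round trip through the uint64_t field on both 32-bit and 64-bit
platforms.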

For spares, the existing ata-raid code treats them like unactivated
members of the array.  They are assigned a static slot in the C array
that is used to order the I/O, and are simply skipped over if they
aren't activated.  This is completely unworkable in anything but the
RAID-1 case, but luckily that's the only kind of redundancy that
ata-raid supports.  However, it also doesn't lend itself well to
supporting global spares, something that DDF does support.  What I'd do
is to add a linked list to the ar_softc that spare disks can be placed
on while they are inactive.  When a spare is activated, I'd take it off
the list and put it into the appropriate slot in the C array that
replaces the failed/missing member.  I'd also look into adding a global
spare linked list as well.  These changes are pretty simple; the rest of
the work involves just completing the unfinished spare code in the patch
that I posted (there's a comment in there that points to where the
missing code is).  The spare activation code would probably need some
work as well to inform the DDF module of the changing role of the spare
and its new position within the array.
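
A rough userspace sketch of the spare lists described above, assuming a
hand-rolled singly linked list (in the kernel the queue(3) macros would
be the more likely choice).  Every name here (disk_sketch, raid_sketch,
spare_add, spare_take, spare_activate) is hypothetical and does not
come from ata-raid or the patch; informing the DDF module of the role
change is left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DISKS 4

/* Hypothetical disk record; 'next' is only used while on a spare list. */
struct disk_sketch {
	int id;
	struct disk_sketch *next;
};

/* Hypothetical stand-in for the per-array softc. */
struct raid_sketch {
	struct disk_sketch *disks[SKETCH_MAX_DISKS]; /* C array ordering the I/O;
							 NULL = failed/missing */
	int ndisks;
	struct disk_sketch *spares;	/* inactive array-dedicated spares */
};

/* Park an inactive spare on a list (dedicated or global). */
static void
spare_add(struct disk_sketch **list, struct disk_sketch *d)
{
	d->next = *list;
	*list = d;
}

/* Take one spare off a list, or return NULL if the list is empty. */
static struct disk_sketch *
spare_take(struct disk_sketch **list)
{
	struct disk_sketch *d = *list;

	if (d != NULL) {
		*list = d->next;
		d->next = NULL;
	}
	return (d);
}

/*
 * Fill an empty slot in the C array with a spare, preferring the
 * array's dedicated list and falling back to the global list.
 * Returns 0 on success, -1 if the slot is occupied or no spare exists.
 */
static int
spare_activate(struct raid_sketch *rdp, struct disk_sketch **global, int slot)
{
	struct disk_sketch *d;

	assert(slot >= 0 && slot < rdp->ndisks);
	if (rdp->disks[slot] != NULL)
		return (-1);
	d = spare_take(&rdp->spares);
	if (d == NULL)
		d = spare_take(global);
	if (d == NULL)
		return (-1);
	rdp->disks[slot] = d;
	return (0);
}
```

Because the spare only enters the C array at activation time, it no
longer needs a fixed slot reserved for it while inactive.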

The DDF spec can be found at www.snia.org; it's an open spec.  Adaptec
uses an older unpublished revision of the spec that has some unfortunate
differences.  I've compensated for some of those differences in the
patch, but there might be others that I haven't encountered yet.  Let me
know if you have any questions.

Scott


