In-Reply-To: <4CB8BED6.8040204@greatbaysoftware.com>
References: <4CB8A614.6000707@greatbaysoftware.com>
	<4CB8BED6.8040204@greatbaysoftware.com>
Date: Sat, 16 Oct 2010 02:18:57 +0400
Message-ID: <AANLkTimYU_XmZ_DRjA_zJ7dcmgaj47UM6Tf3ea50cZLK@mail.gmail.com>
From: Sergey Kandaurov <pluknet@gmail.com>
To: Charles Owens <cowens@greatbaysoftware.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Scott Long <scottl@freebsd.org>, freebsd-hardware@freebsd.org
Subject: Re: mfiutil reports "PSTATE 0x0020" new drive state

On 16 October 2010 00:51, Charles Owens <cowens@greatbaysoftware.com> wrote:
> Hmm... the problem appears to have resolved itself.  After a few hours the
> new drive seems to have gone back into the array, and the original hot spare
> drive put back into hot-spare state.
>
> So I'm interpreting state 0x0020 to therefore mean something like "hang on
> while I use this new drive to automatically put everything back as it was
> before the failure".  Is this correct?
>
> Thanks,
> Charles
>
> [root@Bsvr ~]# mfiutil show drives
> mfi0 Physical Drives:
> (  149G) ONLINE<ST9160511NS SN04 serial=9SM236JR>  SATA enclosure 1, slot 0
> (  149G) ONLINE<ST9160511NS SN04 serial=9SM237KF>  SATA enclosure 1, slot 1
> (  149G) ONLINE<ST9160511NS SN04 serial=9SM236N8>  SATA enclosure 1, slot 2
> (  149G) HOT SPARE<ST9160511NS SN04 serial=9SM237EK>  SATA enclosure 1, slot 3
> (  149G) ONLINE<ST9160511NS SN04 serial=9SM238AG>  SATA enclosure 1, slot 4
>
>
>
> On 10/15/10 3:05 PM, Charles Owens wrote:
>>
>> Hello,
>>
>> We have a mfi-based RAID array with a failed drive.  When replacing the
>> failed drive with a brand new one 'mfiutil' reports it having status of
>> "PSTATE 0x0020".  Attempts to work with the drive to make it a hot spare are
>> unsuccessful (eg. using "good" and/or "add" subcommands of mfiutil).  We've
>> tested procedures for replacing failed drives in the past and haven't run
>> into this.
>>
>> Looking at the code for mfiutil it appears that this is happening because
>> the mfi controller is reporting a drive status code that mfiutil doesn't
>> know about.  The system is remote and in production, so booting into the LSI
>> in-BIOS RAID-management-tool is not an attractive option.
>>
>> Any help with understanding the situation and potential next steps would
>> be greatly appreciated.  More background information follows below.
>>
>> Thanks,
>>
>> Charles
>>
>>
>> Storage configuration:  4-drive RAID 10 array plus one hot spare
>>
>> [root@svr ~]# mfiutil show config
>> mfi0 Configuration: 2 arrays, 1 volumes, 0 spares
>>     array 0 of 2 drives:
>>         drive 0 (  149G) ONLINE<ST9160511NS SN04 serial=9SM236JR>  SATA enclosure 1, slot 0
>>         drive 1 (  149G) ONLINE<ST9160511NS SN04 serial=9SM237KF>  SATA enclosure 1, slot 1
>>     array 1 of 2 drives:
>>         drive 4 (  149G) ONLINE<ST9160511NS SN04 serial=9SM237EK>  SATA enclosure 1, slot 3
>>         drive 3 (  149G) ONLINE<ST9160511NS SN04 serial=9SM236N8>  SATA enclosure 1, slot 2
>>     volume mfid0 (296G) RAID-1 256K OPTIMAL spans:
>>         array 0
>>         array 1
>>
>> [root@svr ~]# mfiutil show drives
>> mfi0 Physical Drives:
>> (  149G) ONLINE<ST9160511NS SN04 serial=9SM236JR>  SATA enclosure 1, slot 0
>> (  149G) ONLINE<ST9160511NS SN04 serial=9SM237KF>  SATA enclosure 1, slot 1
>> (  149G) ONLINE<ST9160511NS SN04 serial=9SM236N8>  SATA enclosure 1, slot 2
>> (  149G) ONLINE<ST9160511NS SN04 serial=9SM237EK>  SATA enclosure 1, slot 3
>> (  149G) PSTATE 0x0020<ST9160511NS SN04 serial=9SM238AG>  SATA enclosure 1, slot 4
>>
>> mfi0:<LSI MegaSAS 1078>  port 0x1000-0x10ff mem
>> ...
>>

Hi Charles,

0x20 is most likely the copyback physical drive state, which is
missing from enum mfi_pd_state. That is why mfiutil prints it as a raw
"PSTATE 0x0020". What you've experienced is the copyback feature in
action :)
Your array was rebuilt with the hot spare (HSP) standing in as a
regular member PD. You then replaced the failed drive with a good one,
and the HSP went into copyback mode to move all its data back to the
good disk. That prevents reordering of the disk numbers in the array
and avoids a double rebuild.
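
To illustrate the point about the missing enum entry, here is a rough,
standalone sketch of how a state-to-name lookup with a "PSTATE 0x%04x"
fallback behaves once a copyback entry is present. It is not the actual
mfiutil source: the names pd_state, PD_STATE_* and pd_state_name() are
made up for this example, the non-copyback values are quoted from
memory of the driver header and should be checked against it, and
mapping 0x20 to "COPYBACK" is the guess discussed above.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the driver's enum mfi_pd_state.
 * The 0x20 entry is the assumed copyback state; the other values
 * are recalled from the mfi driver header and need verifying there.
 */
enum pd_state {
	PD_STATE_UNCONFIGURED_GOOD = 0x00,
	PD_STATE_UNCONFIGURED_BAD = 0x01,
	PD_STATE_HOT_SPARE = 0x02,
	PD_STATE_OFFLINE = 0x10,
	PD_STATE_FAILED = 0x11,
	PD_STATE_REBUILD = 0x14,
	PD_STATE_ONLINE = 0x18,
	PD_STATE_COPYBACK = 0x20	/* assumption: the state seen as 0x0020 */
};

/*
 * Map a firmware state code to a printable name.  Unknown codes fall
 * back to "PSTATE 0x....", the form seen in the "show drives" output.
 * The fallback uses a static buffer, so the result is only valid until
 * the next call with an unknown code.
 */
static const char *
pd_state_name(unsigned int state)
{
	static char buf[16];

	switch (state) {
	case PD_STATE_UNCONFIGURED_GOOD:
		return ("UNCONFIGURED GOOD");
	case PD_STATE_UNCONFIGURED_BAD:
		return ("UNCONFIGURED BAD");
	case PD_STATE_HOT_SPARE:
		return ("HOT SPARE");
	case PD_STATE_OFFLINE:
		return ("OFFLINE");
	case PD_STATE_FAILED:
		return ("FAILED");
	case PD_STATE_REBUILD:
		return ("REBUILD");
	case PD_STATE_ONLINE:
		return ("ONLINE");
	case PD_STATE_COPYBACK:
		return ("COPYBACK");
	default:
		snprintf(buf, sizeof(buf), "PSTATE 0x%04x", state);
		return (buf);
	}
}
```

With the copyback case removed, pd_state_name(0x20) would drop into the
default branch and produce the same "PSTATE 0x0020" text that showed up
for the drive in slot 4.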

-- 
wbr,
pluknet