Date:      Fri, 21 Jan 2000 13:34:35 +0530
From:      Greg Lehey <grog@lemis.com>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        freebsd-questions@FreeBSD.org, cjclark@home.com
Subject:   Re: Recoverving/reviving a 'stale' subdisk under vinum
Message-ID:  <20000121133435.U1123@mojave.worldwide.lemis.com>
In-Reply-To: <200001210635.BAA73206@server.baldwin.cx>; from jhb@FreeBSD.org on Fri, Jan 21, 2000 at 01:35:33AM -0500
References:  <20000121105518.N481@mojave.worldwide.lemis.com> <200001210635.BAA73206@server.baldwin.cx>

On Friday, 21 January 2000 at  1:35:33 -0500, John Baldwin wrote:
>
> On 21-Jan-00 Greg Lehey wrote:
>> On Thursday, 20 January 2000 at 19:15:43 -0500, Crist J. Clark wrote:
>>> On Thu, Jan 20, 2000 at 01:56:07PM -0500, John H. Baldwin wrote:
>>>> I've read the vinum(4) and vinum(8) manpages as well as the webpages at
>>>> www.lemis.com/~grog/vinum.html, and while they are very good as far as
>>>> setup and configuration info, I haven't been able to find a lot of info
>>>> about recovering.  I have a stale subdisk that I can't get to recover no
>>>> matter how many different start commands I try.  I've tried starting the
>>>> volume, the plex, and the subdisk itself with no success.
>>>>
>>>> # vinum list
>>>> Configuration summary
>>>>
>>>> Drives:         3 (4 configured)
>>>> Volumes:        1 (4 configured)
>>>> Plexes:         1 (8 configured)
>>>> Subdisks:       3 (16 configured)
>>>>
>>>> D vinumdrive0           State: up       Device /dev/da1s1e      Avail: 0/8683 MB (0%)
>>>> D vinumdrive1           State: up       Device /dev/da2s1e      Avail: 0/8683 MB (0%)
>>>> D vinumdrive2           State: up       Device /dev/da3s1e      Avail: 0/8683 MB (0%)
>>>>
>>>> V ftp_mirror            State: up       Plexes:       1 Size:         25 GB
>>>>
>>>> P ftp_mirror.p0       S State: corrupt  Subdisks:     3 Size:         25 GB
>>>>
>>>> S ftp_mirror.p0.s0      State: up       PO:        0  B Size:       8683 MB
>>>> S ftp_mirror.p0.s1      State: up       PO:      256 kB Size:       8683 MB
>>>> S ftp_mirror.p0.s2      State: stale    PO:      512 kB Size:       8683 MB
>>>>
>>>> # vinum start ftp_mirror.p0.s2
>>>> Can't start ftp_mirror.p0.s2: Device busy (16)
>>
>> Hmm.  That shouldn't happen.
>
> Well, that's comforting. :)

Hmm.  Looking at this more carefully, yes, you can't do anything
there.  You just don't have the information to recover the subdisk.
I'm still debating what to do in this case; there's no way to bring it
back to a guaranteed consistent state here, but you *can* use the
'setstate' command to fake it.
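
To illustrate why there is nothing to recover from: the listing shows a
single striped plex (the S flag) whose subdisks start at plex offsets 0,
256 kB and 512 kB, which suggests a 256 kB stripe across three subdisks
and no second plex holding a copy.  The sketch below is only an
illustration under that assumption, not Vinum's own code; the stripe
size is inferred from the listing and the function name is made up.  It
maps a byte offset in the plex to the subdisk that holds it.

```python
STRIPE_SIZE = 256 * 1024   # assumed: inferred from the plex offsets in the listing
SUBDISKS = ["ftp_mirror.p0.s0", "ftp_mirror.p0.s1", "ftp_mirror.p0.s2"]

def locate(plex_offset):
    """Return (subdisk name, byte offset within that subdisk)."""
    # Which stripe the offset falls in, and how far into that stripe.
    stripe_no, within = divmod(plex_offset, STRIPE_SIZE)
    # Stripes are dealt round-robin across the subdisks.
    row, column = divmod(stripe_no, len(SUBDISKS))
    return SUBDISKS[column], row * STRIPE_SIZE + within
```

Every third stripe lands only on s2, so with s2 stale a third of the
plex has no known-good copy anywhere.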

>>> You have to 'stop' everything first. (I might be overkilling here,
>>> but better safe...)
>>
>> No, that's not safe.  That would mean taking down the volume.
>
> Err, oops.  I already did this and it worked.  I've already fsck'd
> the volume and have it in use right now.
>
>> I haven't seen this before.  How about the information I ask for in
>> the web page?
>
> Ok, here's what I do have, but I did fix it using the above
> hackishness, so some of it may not apply.
>
> the output of 'vinum list' you already have above, here's some of
> vinum_history, although it doesn't include any of the return values,
> so I don't think it will be of much use:
>
> 20 Jan 2000 12:39:55.489661 *** vinum started ***
> 20 Jan 2000 12:39:55.540632 start
> 20 Jan 2000 12:39:55.820518 *** Created devices ***
> 20 Jan 2000 12:40:12.649217 *** vinum started ***
> 20 Jan 2000 12:40:13.502406 help
> 20 Jan 2000 12:40:25.188145 ls
> 20 Jan 2000 13:10:31.321216 start
> 20 Jan 2000 13:10:47.978917 start ftp_mirror.p0.s2
> 20 Jan 2000 13:10:50.980012 stop
>
> That is what I did when I first brought the machine back up.
>
> 20 Jan 2000 16:21:53.536302 *** vinum started ***
> 20 Jan 2000 16:21:53.537010 stop ftp_mirror.p0
> 20 Jan 2000 16:21:58.984393 *** vinum started ***

Hmm.  Interesting.  I don't seem to log a 'vinum stop'.

> 20 Jan 2000 16:21:58.985133 list
> 20 Jan 2000 16:22:06.561902 *** vinum started ***
> 20 Jan 2000 16:22:06.562622 stop ftp_mirror.p0.s2
> 20 Jan 2000 16:22:17.000952 *** vinum started ***
> 20 Jan 2000 16:22:17.005242 stop -f ftp_mirror.p0.s2
> 20 Jan 2000 16:22:21.145993 *** vinum started ***
> 20 Jan 2000 16:22:21.146744 list
> 20 Jan 2000 16:22:40.709634 *** vinum started ***
> 20 Jan 2000 16:22:40.710394 start ftp_mirror
> 20 Jan 2000 16:22:54.393075 *** vinum started ***
> 20 Jan 2000 16:22:54.393778 start ftp_mirror.p0.s0
> 20 Jan 2000 16:23:00.238272 *** vinum started ***
> 20 Jan 2000 16:23:00.239015 list
> 20 Jan 2000 16:23:09.552251 *** vinum started ***
> 20 Jan 2000 16:23:09.552963 start ftp_mirror.p0.s1
> 20 Jan 2000 16:23:16.193159 *** vinum started ***
> 20 Jan 2000 16:23:16.193896 start ftp_mirror.p0.s2
>
> That is how I "fixed" it.

I don't see the volume being stopped there.  Of course, avoiding a stop
matters less when the volume is only partially accessible anyway.
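
For anyone wanting to check a vinum_history excerpt like the one quoted
above for a particular command, here is a minimal sketch.  It assumes
the line format seen in this message (day, month, year, time with
microseconds, then the command) and English month abbreviations; the
function name is hypothetical.

```python
from datetime import datetime

def parse_history(text):
    """Return a list of (timestamp, command), skipping '*** ... ***' markers."""
    entries = []
    for line in text.splitlines():
        line = line.lstrip("> ").strip()      # drop mail quoting, if any
        parts = line.split(None, 4)           # day, month, year, time, rest
        if len(parts) < 5 or parts[4].startswith("***"):
            continue
        stamp = datetime.strptime(" ".join(parts[:4]), "%d %b %Y %H:%M:%S.%f")
        entries.append((stamp, parts[4]))
    return entries
```

Filtering the result for commands that begin with "stop" then shows
which objects were stopped and when.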

> However, the drive seems to have fallen over again (*sigh*) with the
> following kernel messages:
>
> Jan 20 23:28:38 raven /kernel: (da2:ahc1:0:1:0): SCB 0x96 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0xa
> Jan 20 23:28:38 raven /kernel: (da2:ahc1:0:1:0): Queuing a BDR SCB
> Jan 20 23:28:38 raven /kernel: (da2:ahc1:0:1:0): Bus Device Reset Message Sent
> Jan 20 23:28:38 raven /kernel: (da2:ahc1:0:1:0): no longer in timeout, status = 34b
> Jan 20 23:28:38 raven /kernel: ahc1: Bus Device Reset on A:1. 1 SCBs aborted

Yup, that looks like a hardware problem; possibly bus termination or
some such.  Vinum is good at finding suboptimal SCSI chains, since it
issues multiple requests in parallel.

> Note that I didn't get this message until after the drive had been
> booted for a while,

Right, that's relatively typical.

> the kernel found it fine during boot:

Greg
--
When replying to this message, please copy the original recipients.
For more information, see http://www.lemis.com/questions.html
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

