Date:      Thu, 03 Aug 2006 00:27:59 +0200
From:      Miroslav Lachman <000.fbsd@quip.cz>
To:        rick-freebsd@kiwi-computer.com
Cc:        freebsd-geom@freebsd.org
Subject:   Re: gmirror Cannot add disk ad5 to gm0 (error=22)
Message-ID:  <44D126EF.9070503@quip.cz>
In-Reply-To: <20060802210709.GA15310@megan.kiwi-computer.com>
References:  <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com>

Rick C. Petty wrote:

> On Wed, Aug 02, 2006 at 10:37:49PM +0200, Miroslav Lachman wrote:
> 
>>>Did you have SMART enabled in the BIOS?
>>
>>Yes, (as I remember - I have only remote access now) and have 
> 
> 
> Then I doubt the disk itself had any errors..  Likely a bad cable or
> controller, which I've typically seen manifested under heavier disk
> activity.
[...]
> Yup, disks disappear when they stop responding to "bus reset" commands.
> This seems to happen on various controllers after an unpredictable number
> of READ_DMA or WRITE_DMA timeout errors.  Theoretically, you could reinit
> the channel and see if the disk pops back up.

Reinit did not help; only a reboot did.

> One thing to note:  I
> recommend putting the disks on separate channels so if a reinit fails, you
> don't lose both disks.  I hate it when manufacturers put two SATA ports on
> the same ATA channel..  Cheap for them, problematic for you.

I don't understand hardware well, but the SATA controller is set to IDE
mode in the BIOS and the disks are on ATA channel 2 as ad4 (master) and
ad5 (slave). If the BIOS setting is changed to AHCI, dmesg shows two more
ATA channels: ad4 stays on ata2-master and the second disk becomes ad8 on
ata4-master (without changing cables / connections). Since I see the same
disappearing-disk problem in both AHCI and IDE modes, I decided to use IDE
mode, which seems a little faster for gmirror synchronization.
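
The device names in both modes are consistent with static ATA unit
numbering, where each channel reserves two unit numbers (master and
slave). A minimal sketch of that arithmetic, assuming static numbering
(the ATA_STATIC_ID kernel option) is in effect; the function name ad_unit
is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: map an ATA channel and position to an adN name,
# assuming static numbering (two unit numbers reserved per channel).
ad_unit() {
    channel=$1   # e.g. 2 for ata2
    slave=$2     # 0 = master, 1 = slave
    echo "ad$((channel * 2 + slave))"
}

ad_unit 2 0   # ata2-master -> ad4
ad_unit 2 1   # ata2-slave  -> ad5
ad_unit 4 0   # ata4-master -> ad8
```

This is why the second disk moves from ad5 to ad8 when it gets its own
channel, even though no cable was touched.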

Is there a big difference between the AHCI and IDE modes of the SATA
controller?

As I see in dmesg, the controller is an Intel ICH7 *SATA300* but the disks
are SATA150. Could this cause trouble?

>>>>Can anybody tell me, where is the problem / how can I found what is wrong?
>>>
>>>
>>>What's the output of "gmirror status" ??  I suspect on a reboot, gmirror
>>>will try to synchronize ad4 to ad5 (since ad5 was the first to drop).  Once
>>>that is complete, gmirror won't be DEGRADED anymore.
>>
>># gmirror status
>>      Name    Status  Components
>>mirror/gm0  DEGRADED  ad4
> 
> 
> Hmm, and is ad5 detected?  (rhetorical question, because I see that it was)
> 
> 
>>Gmirror is not synchronized after reboot:
>>
>>Aug  1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5 
>>detected.
>>Aug  1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0) 
>>broken, skipping.
> 
> 
> Looks like the disk was marked with bad metadata.
> 
> 
>>So disk is OK, but gmirror refused to use it?
> 
> 
> Yes.  I would first suggest trying "gmirror deactivate -v gm0 ad5" then
> trying to reactivate it.  Maybe that will flush out the wrong metadata.
> If that doesn't work, try booting in verbose mode and attaching the dmesg
> (in particular, when the mirror is being attached).

> Last resort (although not a horrible option), you can try removing ad5 from
> the mirror and relabelling (gmirror label, not bsdlabel) it.  If the remove
> fails, use a combination of forget and clear.  

gmirror forget and insert helped:

root@track ~/# gmirror deactivate -v gm0 ad5
No such provider: ad5.
root@track ~/# gmirror forget -v gm0
Done.
root@track ~/# gmirror insert -v gm0 ad5
Done.

root@track ~/# gmirror status
       Name    Status  Components
mirror/gm0  DEGRADED  ad4
                       ad5 (0%)
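
To check the rebuild from a script instead of by eye, the Status column
can be extracted from this output. A rough sketch, assuming the output
format shown above; the function name mirror_state is invented, and it
reads the text on stdin so it can be tried on saved output:

```shell
#!/bin/sh
# Hypothetical helper: read `gmirror status` output on stdin and print
# the Status column for the named mirror (e.g. DEGRADED or COMPLETE).
mirror_state() {
    awk -v name="mirror/$1" '$1 == name { print $2 }'
}

# Example on the output shown above; on a live system one would pipe
# `gmirror status` into the function instead.
mirror_state gm0 <<'EOF'
      Name    Status  Components
mirror/gm0  DEGRADED  ad4
                      ad5 (0%)
EOF
```

For the output above this prints DEGRADED; the component lines that carry
no mirror name are skipped.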

>>If disks are OK, what is wrong? What caused READ / WRITE timeouts? 
>>Broken SATA controler? FreeBSD ATA driver?
> 
> 
> Try replacing the cables, trying a different SATA controller.  I've seen
> these timeouts *a lot* and usually my gmirror/gvinum partitions all
> survive (after reboot at least).  There are a lot of threads on this and
> other mailing lists describing the timeout problems.

Yes, I have read many posts about similar problems. I have a similar
problem on 4 machines, so I think this is not a cable problem. Maybe a bad
controller in the whole ASUS RS120 series, or something like that. (4 of 4
identical machines have similar problems with the disk subsystem.)

Thank you.

Miroslav Lachman


