Date: Thu, 03 Aug 2006 00:27:59 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
To: rick-freebsd@kiwi-computer.com
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)
Message-ID: <44D126EF.9070503@quip.cz>
In-Reply-To: <20060802210709.GA15310@megan.kiwi-computer.com>
References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com>

Rick C. Petty wrote:
> On Wed, Aug 02, 2006 at 10:37:49PM +0200, Miroslav Lachman wrote:
>
>>>Did you have SMART enabled in the BIOS?
>>
>>Yes, (as I remember - I have only remote access now) and have
>
>
> Then I doubt the disk itself had any errors.. Likely a bad cable or
> controller, which I've typically seen manifested under heavier disk
> activity.

[...]

> Yup, disks disappear when they stop responding to "bus reset" commands.
> This seems to happen on various controllers after an unpredictable number
> of READ_DMA or WRITE_DMA timeout errors. Theoretically, you could reinit
> the channel and see if the disk pops back up.

Reinit did not help, only reboot.

> One thing to note: I
> recommend putting the disks on separate channels so if a reinit fails, you
> don't lose both disks. I hate it when manufacturers put two SATA ports on
> the same ATA channel.. Cheap for them, problematic for you.

I don't understand hardware much, but the SATA controller is set to IDE
mode in the BIOS and the disks are on ATA channel 2 as ad4 Master and ad5
Slave. If the BIOS setting is changed to AHCI, dmesg shows two more ATA
channels: ad4 is on ata2-master and the second disk becomes ad8 on
ata4-master (without changing cables / connections).
Since I see the same problem of disks disappearing in both AHCI and IDE
mode, I have decided to use IDE mode, which seems to me a little bit
faster in gmirror synchronization.
Is there a big difference between the AHCI and IDE modes of a SATA
controller?
As I see in dmesg, the controller is an Intel ICH7 *SATA300* but the disks
are SATA150. Could this cause trouble?

>>>>Can anybody tell me, where is the problem / how can I found what is wrong?
>>>
>>>
>>>What's the output of "gmirror status" ?? I suspect on a reboot, gmirror
>>>will try to synchronize ad4 to ad5 (since ad5 was the first to drop). Once
>>>that is complete, gmirror won't be DEGRADED anymore.
>>
>># gmirror status
>>      Name    Status  Components
>>mirror/gm0  DEGRADED  ad4
>
>
> Hmm, and is ad5 detected? (rhetorical question, because I see that it was)
>
>
>>Gmirror is not synchronized after reboot:
>>
>>Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5
>>detected.
>>Aug 1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0)
>>broken, skipping.
>
>
> Looks like the disk was marked with bad metadata.
>
>
>>So disk is OK, but gmirror refused to use it?
>
>
> Yes. I would first suggest trying "gmirror deactivate -v gm0 ad5" then
> trying to reactivate it. Maybe that will flush out the wrong metadata.
> If that doesn't work, try booting in verbose mode and attaching the dmesg
> (in particular, when the mirror is being attached).
> Last resort (although not a horrible option), you can try removing ad5 from
> the mirror and relabelling (gmirror label, not bsdlabel) it. If the remove
> fails, use a combination of forget and clear.

gmirror forget and insert helped:

root@track ~/# gmirror deactivate -v gm0 ad5
No such provider: ad5.
root@track ~/# gmirror forget -v gm0
Done.
root@track ~/# gmirror insert -v gm0 ad5
Done.
root@track ~/# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad4
                      ad5 (0%)

>>If disks are OK, what is wrong? What caused READ / WRITE timeouts?
>>Broken SATA controler? FreeBSD ATA driver?
>
>
> Try replacing the cables, trying a different SATA controller. I've seen
> these timeouts *a lot* and usually my gmirror/gvinum partitions all
> survive (after reboot at least). There are a lot of threads on this and
> other mailing lists describing the timeout problems.

Yes, I have read many posts about similar problems. I have a similar
problem on 4 machines, so I think this is not a cable problem. Maybe a bad
controller in the whole series of ASUS RS120, or something like this.
(4 of 4 identical machines have similar problems with the disk subsystem.)

Thank you.

Miroslav Lachman
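
Since the same symptom shows up on several machines, it may help to watch for it automatically. What follows is only a minimal sketch, not something from the thread: it assumes "gmirror status" prints the default three-column layout shown above (Name, Status, Components), and the function name degraded_mirrors is a made-up helper, not part of gmirror.

```shell
#!/bin/sh
# degraded_mirrors (hypothetical helper, not part of gmirror):
# reads "gmirror status" output on stdin and prints "name status" for
# every mirror whose status is anything other than COMPLETE.
# Assumption: mirror rows start with "mirror/"; the header row and the
# continuation rows that list extra components (e.g. "ad5 (0%)") do not.
degraded_mirrors() {
    awk '$1 ~ /^mirror\// && $2 != "COMPLETE" { print $1, $2 }'
}
```

On the affected host this could be run as "gmirror status | degraded_mirrors", for example from cron. Empty output would mean every mirror reports COMPLETE. A non-empty line is the cue to look at dmesg and, if the component was dropped, to repeat the forget / insert sequence that worked above.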
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?44D126EF.9070503>