Date:      Tue, 21 Aug 2007 17:53:19 +0200
From:      Johan Ström <johan@stromnet.se>
To:        Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc:        freebsd-stable@freebsd.org, freebsd-geom@freebsd.org
Subject:   Re: Crashed gmirror, single disk marked SYNC and wont boot...
Message-ID:  <441B87F4-5846-441B-B6B4-34694B483C73@stromnet.se>
In-Reply-To: <20070821143136.GD1132@garage.freebsd.pl>
References:  <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> <20070821143136.GD1132@garage.freebsd.pl>

On Aug 21, 2007, at 16:31 , Pawel Jakub Dawidek wrote:

> On Tue, Aug 21, 2007 at 02:15:08PM +0200, Johan Ström wrote:
>> Hi
>>
>> FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7:
>> Tue Feb 13 18:24:34 CET 2007     johan@elfi.stromnet.se:/usr/obj/usr/
>> src/sys/ROUTER.POLLING  i386
>>
>> (ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ,
>> IPSEC, also pfsync and carp)
>>
>> This weekend I had a disk fail on me in a machine running gmirror
>> gm0 with 2 providers (ad0 and ad6). The whole box froze with no
>> screen output, and on a hard reboot I got some LBA errors etc. from
>> ad0. After a few reboots it got up and running, though (I wasn't at
>> the screen and had to do it by phone, so I couldn't really debug very
>> well).
>> As soon as the box got up, I removed ad0 from the gmirror, so ad6 was
>> the only provider. Today I got a new disk to replace ad0.
>> Now remember, ad6 was the only disk in the mirror. I took the box down
>> cleanly and replaced the disk. ad0 was now gone and instead I had ad4
>> (ad4 and ad6 are SATA, ad0 was IDE). I changed things so I booted off
>> the old SATA disk (ad6).
>> Okay, there came the first problem: the boot manager gave me the usual
>> options, F1 FreeBSD, F5 Disk 2 (or whatever it said). If I pressed F1
>> I got the same prompt again; F5 did nothing at all. Funny! The
>> system refused to load the loader (or whatever the 1-9 menu thingy is
>> called), the kernel, or anything.
>> So I finally plugged the old ad0 disk into the machine to at least
>> get it booted, thinking it would come up on the gmirror. Nope:
>>
>> (got the new ad4 out here)
>> ad0: 38166MB <WDC WD400BB-00CAA1 17.07W17> at ata0-master UDMA100
>> ad6: 152627MB <SAMSUNG HD160JJ ZM100-41> at ata3-master SATA150
>> GEOM_MIRROR: Device gm0 created (id=4029378995).
>> GEOM_MIRROR: Device gm0: provider ad6 detected.
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> GEOM_MIRROR: Force device gm0 start due to timeout.
>> Trying to mount root from ufs:/dev/mirror/gm0s1a
>>
>> Manual root filesystem specification:
>>   <fstype>:<device>  Mount <device> using filesystem <fstype>
>>                        eg. ufs:da0s1a
>>   ?                  List valid disk boot devices
>>   <empty line>       Abort manual input
>>
>> mountroot>
>>
>> Okay... so why wouldn't it load my mirror from ad6 now? I had just done
>> a clean shutdown without problems. It didn't even recognize any slices
>> on ad6s1 (although ad6s1 itself was found)...
>
> It loaded your mirror just fine; you are confusing things. Gmirror
> started in degraded state, as one could expect, but it seems there is
> no 'a' partition on your gm0s1 slice (or the entire bsdlabel is gone).
> You could try to recreate it based on the bsdlabel from ad0 (if it
> should be the same), but I've no idea how it disappeared. Anyway,
> gmirror seems to work properly.
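One way to check Pawel's diagnosis is to look for an 'a' line in the bsdlabel output for the slice, since the root mount above asks for ufs:/dev/mirror/gm0s1a. The Python sketch below parses that text; it assumes the usual bsdlabel layout (letter, size, offset, fstype, ...), and the sample label and all of its numbers are hypothetical, not taken from this machine.

```python
from __future__ import annotations

import re

# Partition lines are assumed to look like "  a:  1048576  16  4.2BSD ...";
# only the first four columns (letter, size, offset, fstype) are used.
PART_RE = re.compile(r"^\s*([a-h]):\s+(\d+)\s+(\d+)\s+(\S+)")

# Hypothetical label text for illustration only (not read from gm0s1).
SAMPLE_LABEL = """\
# /dev/mirror/gm0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:  1048576       16    4.2BSD     2048 16384  8
  b:  2097152  1048592      swap
  c: 78156162        0    unused        0     0
"""


def parse_partitions(label_text: str) -> dict[str, tuple[int, int, str]]:
    """Map partition letter -> (size in sectors, offset in sectors, fstype)."""
    parts: dict[str, tuple[int, int, str]] = {}
    for line in label_text.splitlines():
        m = PART_RE.match(line)
        if m:  # header and comment lines do not match and are skipped
            letter, size, offset, fstype = m.groups()
            parts[letter] = (int(size), int(offset), fstype)
    return parts


def has_root_partition(label_text: str) -> bool:
    """True if an 'a' partition (what .../gm0s1a refers to) is present."""
    return "a" in parse_partitions(label_text)


print(has_root_partition(SAMPLE_LABEL))  # True
print(has_root_partition(""))            # False: no label at all
```

An empty or missing label gives no 'a' entry, which is the situation in which mountroot cannot find gm0s1a even though the mirror itself started.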

Okay. So it tries to load, finds no partition table, and then ignores
and unloads gm0?

>
>> Digging some more into gmirror, I ran gmirror dump ad6:
>>
>> Metadata on /dev/ad6:
>>      magic: GEOM::MIRROR
>>    version: 3
>>       name: gm0
>>        mid: 4029378995
>>        did: 449032193
>>        all: 3
>
> You have a 3-way mirror?

Uhm... I've never had more than 2 disks in this machine.

>
>>      genid: 0
>>     syncid: 5
>>   priority: 0
>>      slice: 4096
>>    balance: round-robin
>> mediasize: 20416757248
>> sectorsize: 512
>> syncoffset: 0
>>     mflags: NONE
>>     dflags: SYNCHRONIZING
>> hcprovider:
>>   provsize: 160041885696
>>   MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
>
> BTW, your provider size is 149GB and the mirror only uses 19GB, which
> means you mirrored a 149GB disk with a 19GB disk and you waste 130GB
> (it's unused).
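The figures Pawel quotes follow from the dump above: mediasize is the size of the mirror device and provsize the size of the ad6 provider, both in bytes. A small Python sketch that parses the dump text (assuming the "key: value" layout shown in this message) and converts the sizes to binary gigabytes:

```python
from __future__ import annotations

GIB = 1024 ** 3  # Pawel's "GB" figures match binary gigabytes (GiB)

# The `gmirror dump ad6` output quoted above, without the ">>" markers.
DUMP = """\
Metadata on /dev/ad6:
     magic: GEOM::MIRROR
   version: 3
      name: gm0
       mid: 4029378995
       did: 449032193
       all: 3
     genid: 0
    syncid: 5
  priority: 0
     slice: 4096
   balance: round-robin
 mediasize: 20416757248
sectorsize: 512
syncoffset: 0
    mflags: NONE
    dflags: SYNCHRONIZING
hcprovider:
  provsize: 160041885696
  MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
"""


def parse_gmirror_dump(text: str) -> dict[str, str]:
    """Turn the "key: value" lines of a gmirror dump into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("Metadata on"):
            continue  # header line, not a field
        # partition() splits at the first colon only, so
        # "magic: GEOM::MIRROR" keeps its full value.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields: dict[str, str]) -> dict[str, object]:
    media = int(fields["mediasize"])  # size of the mirror device, bytes
    prov = int(fields["provsize"])    # size of this provider (ad6), bytes
    return {
        "mirror_gib": round(media / GIB, 2),
        "provider_gib": round(prov / GIB, 2),
        "unused_gib": round((prov - media) / GIB, 2),
        # Substring test: how several flags would be joined is not shown here.
        "synchronizing": "SYNCHRONIZING" in fields["dflags"],
        # The "all" field, which Pawel reads as the number of mirror disks.
        "components": int(fields["all"]),
    }


print(summarize(parse_gmirror_dump(DUMP)))
```

Run as-is it reports about 19.01 GiB for the mirror, 149.05 GiB for the provider and 130.04 GiB unused, along with the SYNCHRONIZING flag and all = 3 that both come up in this thread.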

Yes, the ad0 disk was (is) only 40GB, so only the first 40GB of ad6
was in the mirror (the rest was in another slice with its own label,
although if I run fdisk on the disk it doesn't seem to be there at
all).
But hmm, 19?? It should be 40 (or somewhere around there, at least).
From the mounted ad0:
Filesystem           1K-blocks     Used     Avail Capacity  Mounted on
/dev/ad0s1a             507630    85142    381878    18%    /
/dev/ad0s1e             507630       20    467000     0%    /tmp
/dev/ad0s1f           10154158  1176410   8165416    13%    /usr
/dev/ad0s1d            1506190    80326   1305370     6%    /var
/dev/ad0s1g           24174212  6939804  15300472    31%    /var/squid
swapinfo:
/dev/ad0s1b       1022536        0  1022536     0%

~35GB...
I compared slice 1 on ad0 and on ad6; both have exactly the same size.
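Adding up the 1K-block column makes the gap concrete. df reports file system blocks rather than raw partition sizes, so the sum is only a lower bound on what the ad0s1 label describes; the sketch below redoes that arithmetic, counts the swap partition (ad0s1b) from the same slice, and puts the result next to the mediasize from the gmirror dump.

```python
# 1K-block sizes copied from the df and swapinfo output above (ad0s1).
FILESYSTEMS_KIB = {
    "/": 507630,
    "/tmp": 507630,
    "/usr": 10154158,
    "/var": 1506190,
    "/var/squid": 24174212,
}
SWAP_KIB = 1022536           # ad0s1b from swapinfo
MIRROR_BYTES = 20416757248   # "mediasize" from the gmirror dump of ad6

KIB_PER_GIB = 1024 ** 2


def kib_to_gib(kib: int) -> float:
    """Convert 1K blocks to binary gigabytes, rounded to 2 decimals."""
    return round(kib / KIB_PER_GIB, 2)


fs_total_kib = sum(FILESYSTEMS_KIB.values())
label_min_kib = fs_total_kib + SWAP_KIB   # lower bound for the slice contents
mirror_gib = round(MIRROR_BYTES / 1024 ** 3, 2)

print(f"file systems:        {kib_to_gib(fs_total_kib)} GiB")   # 35.14
print(f"file systems + swap: {kib_to_gib(label_min_kib)} GiB")  # 36.12
print(f"mirror (mediasize):  {mirror_gib} GiB")                 # 19.01
```

So the label copied from ad0s1 describes at least about 36 GiB of partitions, nearly twice the roughly 19 GiB the mirror metadata on ad6 claims.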

>
>> Some googling indicated that SYNCHRONIZING means that it's not
>> "complete" and won't mount? Is that correct? Why would it be in that
>> state, then? I just shut it down fine... And where the f*ck did my
>> slices go??
>
> SYNCHRONIZING means that this component was/is being synchronized. It
> seems that you removed/lost the master disk while it was synchronizing.
> It should work anyway.

Okay, that's odd. ad6 was the only disk in the mirror when I shut down
(shutdown -p now, and it powered off by itself), so it should have
been good.

>
> BTW, you are confusing things again. Your slice is just fine (ad6s1);
> you don't have partitions, AFAIU.

Seems I did, yes, thanks. Disks have slices (which in the Windows/DOS/Linux
world are called partitions), and slices have partitions. Check :)

>
> All in all, your partition table seems to be gone. If you created it
> on the gmirror before (gm0s1), you may still have the same partition
> table on the other half of the mirror. You can try to move it to ad6
> with bsdlabel and verify whether you can see file systems inside the
> partitions.

Okay, I tried that now. I saved the ad0s1 label and loaded it onto
ad6s1, so now I have the same partition table on ad6s1 as on ad0s1.
Trying to mount any of the partitions, though, gives me an incorrect
super block error... fsck cannot find any superblocks either.
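To see whether anything recognizable is left where fsck looks first, one can check for the UFS magic number by hand. A rough Python sketch, assuming the standard FreeBSD layout (primary UFS2 superblock at byte 65536, UFS1 at byte 8192, fs_magic at offset 1372 inside the superblock, little-endian as on i386). It only checks those two primary locations, not the alternate superblocks that fsck_ffs -b can use, and the device path in probe_device is whatever partition you point it at (reading /dev nodes needs root).

```python
from __future__ import annotations

import struct

SBLOCK_UFS1 = 8192        # primary UFS1 superblock offset, bytes
SBLOCK_UFS2 = 65536       # primary UFS2 superblock offset, bytes
FS_MAGIC_OFFSET = 1372    # offset of fs_magic inside struct fs
FS_UFS1_MAGIC = 0x011954
FS_UFS2_MAGIC = 0x19540119


def probe_superblock(data: bytes) -> str | None:
    """Return "UFS2" or "UFS1" if a primary superblock magic is found."""
    candidates = ((SBLOCK_UFS2, FS_UFS2_MAGIC, "UFS2"),
                  (SBLOCK_UFS1, FS_UFS1_MAGIC, "UFS1"))
    for base, magic, name in candidates:
        off = base + FS_MAGIC_OFFSET
        if len(data) >= off + 4:
            # Little-endian unsigned 32-bit read of fs_magic.
            (value,) = struct.unpack_from("<I", data, off)
            if value == magic:
                return name
    return None  # nothing recognizable at either primary location


def probe_device(path: str) -> str | None:
    """Read just enough of a partition (e.g. /dev/ad6s1a) to probe it."""
    with open(path, "rb") as f:
        # 73728 bytes: a multiple of 512, as raw disk devices expect.
        return probe_superblock(f.read(SBLOCK_UFS2 + 8192))
```

If this finds no magic at the start of any partition from the restored label, the offsets in that label most likely do not line up with where the file systems actually start on ad6s1.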

So... what do I do now? Just forget ad6 and start from scratch from
ad2? (As I said, the data isn't very old, really.)

I'm thinking about doing a complete reinstall on ad4+ad6, then. Can I
do that? fdisk both with one full-size partition on each, create a new
gmirror between ad6s1/ad4s1 (or should I use ad4/ad6?), create slices,
and use dump | restore (of course with apps shut down so no data is
changed, or at least nothing that I care about) to copy all files from
ad2 to the new mirror. What else do I need to do? bsdlabel -B on both
to write boot blocks? Is there anything else to think about?


Thanks for your help :)



