Date: Tue, 21 Aug 2007 17:53:19 +0200
From: Johan Ström <johan@stromnet.se>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc: freebsd-stable@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot...
Message-ID: <441B87F4-5846-441B-B6B4-34694B483C73@stromnet.se>
In-Reply-To: <20070821143136.GD1132@garage.freebsd.pl>
References: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> <20070821143136.GD1132@garage.freebsd.pl>

On Aug 21, 2007, at 16:31, Pawel Jakub Dawidek wrote:

> On Tue, Aug 21, 2007 at 02:15:08PM +0200, Johan Ström wrote:
>> Hi
>>
>> FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7:
>> Tue Feb 13 18:24:34 CET 2007 johan@elfi.stromnet.se:/usr/obj/usr/
>> src/sys/ROUTER.POLLING i386
>>
>> (ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ,
>> IPSEC, also pfsync and carp)
>>
>> This weekend I had a disk failing on me in a machine running gmirror
>> gm0 with 2 providers (ad0 and ad6). The whole box froze with no
>> screen output, and on hard reboot I got some LBA errors etc from ad0,
>> after a few reboots it got up and running though (I wasnt at the
>> screen, had do do it by phone so couldn't really debug very well).
>> As soon as the box got up, I removed ad0 from the gmirror, so ad6 was
>> the only provider. Today I got a new disk that would replace ad0..
>> Now remeber, ad6 was the only disk in the mirror. I took the box down
>> fine, replaced the disk. ad0 was now gone and instead I hade ad4 (ad4
>> +6 is SATA, ad0 was IDE). Changed so I booted of the old SATA..
>> Okay, there came the first problem; the boot loader gave me the usual
>> options F1 FreeBSD F5 Disk 2 (or whatever it said).. If I pressed F1
>> i got the same prompt again.. F5 nothing at all.. Funny!... The
>> system refused to load the loader (or whatever the 1-9 menu thingy is
>> called) kernel or anything..
>> So I finally plugged the old ad0 disk into the machine to at least
>> get it booted, thinking it would go up on the gmirror.. Nope..:
>>
>> (got the new ad4 out here)
>> ad0: 38166MB <WDC WD400BB-00CAA1 17.07W17> at ata0-master UDMA100
>> ad6: 152627MB <SAMSUNG HD160JJ ZM100-41> at ata3-master SATA150
>> GEOM_MIRROR: Device gm0 created (id=4029378995).
>> GEOM_MIRROR: Device gm0: provider ad6 detected.
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> GEOM_MIRROR: Force device gm0 start due to timeout.
>> Trying to mount root from ufs:/dev/mirror/gm0s1a
>>
>> Manual root filesystem specification:
>> <fstype>:<device> Mount <device> using filesystem <fstype>
>> eg. ufs:da0s1a
>> ? List valid disk boot devices
>> <empty line> Abort manual input
>>
>> mountroot>
>>
>> Okey... so why wouldnt it load my mirror from ad6 now?? I just did a
>> clean shutdown without problems.. It didnt even recognize any slices
>> on ad6s1 (altough the ad6s1 was found)...
>
> It loaded your mirror just fine, you confuse things. Gmirror started in
> degraded state, as one could expect, but it seems there is no 'a'
> partition on your gm0s1 slice (or entire bsdlabel is gone).
> You could try to recreate it based on bsdlabel from ad0 (if it should be
> the same), but I've no idea how it disapeared. Anyway, gmirror seems to
> work properly.

Okay.. So it tries to load, finds no partition table, and then ignores
and unloads gm0?

>> Some more digging into gmirror, I did a gmirror dump ad6:
>>
>> Metadata on /dev/ad6:
>> magic: GEOM::MIRROR
>> version: 3
>> name: gm0
>> mid: 4029378995
>> did: 449032193
>> all: 3
>
> You have 3-way mirror?

Uhm.. I never had more than 2 disks in this machine..

>> genid: 0
>> syncid: 5
>> priority: 0
>> slice: 4096
>> balance: round-robin
>> mediasize: 20416757248
>> sectorsize: 512
>> syncoffset: 0
>> mflags: NONE
>> dflags: SYNCHRONIZING
>> hcprovider:
>> provsize: 160041885696
>> MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
>
> BTW. Your provider size is 149GB and mirror only use 19GB, which means
> you mirrored 149GB disk with 19GB disk and you waste 130GB (it's
> unused).

Yes, the ad0 disk was (is) only 40GB, so only the first 40GB of that
disk was in the mirror (the rest was in another slice with its own
label.. although when I run fdisk on the disk it seems to not be there
at all..)
But hum, 19??.. It should be 40 (or somewhere around there at least)..

From ad0 mount:

Filesystem   1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a     507630   85142   381878    18%    /
/dev/ad0s1e     507630      20   467000     0%    /tmp
/dev/ad0s1f   10154158 1176410  8165416    13%    /usr
/dev/ad0s1d    1506190   80326  1305370     6%    /var
/dev/ad0s1g   24174212 6939804 15300472    31%    /var/squid

swapinfo:
/dev/ad0s1b    1022536       0  1022536     0%

~35GB...
I compared slice 1 on ad0 vs ad6, and both have the exact same size.

>> Some googling indicated that SYNCHRONIZING means that its not
>> "complete" and wont mount? Is that correct? Why would it be in that
>> state then, I just shut it down fine... And where the f*ck did my
>> slices go??..
>
> SYNCHRONIZING means that this component was/is being synchronized. It
> seems that you removed/lost the master disk, while it was synchronizing.
> It should work anyway.

Okay, that's odd.. ad6 was the only disk in the mirror when I shut down
(shutdown -p now, and it powered off by itself..) so it should have
been good..

> BTW. You confuse things again. Your slice is just fine (ad6s1), you
> don't have partitions, AFAIU.

Seems I did, yes, thanks. Disks have slices (which in the
Windows/DOS/Linux world are called partitions) which have partitions..
check :)

> All in all, your partition table seems to be gone. If you created it on
> gmirror before (gm0s1) you may still have the same partition table on
> the other half of the mirror. You can try to move it to ad6 with
> bsdlabel and verify if you can see file system inside partitions.

Okay, tried that now.. Saved the ad0s1 label and reloaded it onto
ad6s1.. Now I have the same partition table on ad6s1 as on ad0s1...
Trying to mount any of them, though, gives me "incorrect super block"...
fsck cannot find any superblocks either..

So.. What to do now then? Just forget ad6 and start from scratch from
ad2? (As I said, the data isn't very old really)...
I'm thinking about doing a complete reinstall on ad4+ad6 then.. Can I
do that? fdisk both with a full partition on both, create a new gmirror
between ad6s1/ad4s1 (or should I go on ad4/ad6?), create slices, use
dump | restore (of course with apps shut down so no data is changed..
or at least nothing that I care about) to copy all files from ad2 to
the new mirror.. What more do I need to do? bsdlabel -B on both to
write boot blocks? Is there anything else to think about?

Thanks for your help..:)
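
As a sanity check on the size figures discussed above (the mirror reported as 19GB against an expected ~40GB), here is a minimal Python sketch that converts the numbers to GiB. The figures are copied from the gmirror dump, df and swapinfo output in this message; the constant and function names are made up for illustration, and the script only does arithmetic, it does not inspect any disk.

```python
# Sizes in bytes, copied from the "gmirror dump ad6" output quoted above.
MEDIASIZE = 20416757248   # mediasize: size of the mirror device gm0
PROVSIZE = 160041885696   # provsize: size of the underlying provider ad6

# 1K-block counts copied from the df and swapinfo output for ad0s1.
AD0S1_KBLOCKS = {
    "ad0s1a": 507630,     # /
    "ad0s1b": 1022536,    # swap
    "ad0s1d": 1506190,    # /var
    "ad0s1e": 507630,     # /tmp
    "ad0s1f": 10154158,   # /usr
    "ad0s1g": 24174212,   # /var/squid
}

GIB = 1024 ** 3


def bytes_to_gib(nbytes):
    """Convert a byte count to GiB."""
    return nbytes / GIB


def kblocks_to_gib(kblocks):
    """Convert a count of 1K-blocks (1024 bytes each) to GiB."""
    return kblocks * 1024 / GIB


def total_kblocks(table):
    """Sum the 1K-block counts of all listed partitions."""
    return sum(table.values())


if __name__ == "__main__":
    print("mirror mediasize: %.2f GiB" % bytes_to_gib(MEDIASIZE))
    print("provider size:    %.2f GiB" % bytes_to_gib(PROVSIZE))
    print("ad0s1 partitions: %.2f GiB"
          % kblocks_to_gib(total_kblocks(AD0S1_KBLOCKS)))
```

This gives roughly 19 GiB for the mirror, 149 GiB for ad6, and 36 GiB for the ad0s1 partitions combined, so the partitions listed on ad0s1 add up to noticeably more than the mediasize stored in the metadata on ad6. The df figures are filesystem sizes, so the underlying partitions are probably somewhat larger still.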