From: Johan Ström
Date: Tue, 21 Aug 2007 17:53:19 +0200
To: Pawel Jakub Dawidek
Cc: freebsd-stable@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot...

On Aug 21, 2007, at 16:31, Pawel Jakub Dawidek wrote:

> On Tue, Aug 21, 2007 at 02:15:08PM +0200, Johan Ström wrote:
>> Hi
>>
>> FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7:
>> Tue Feb 13 18:24:34 CET 2007
>> johan@elfi.stromnet.se:/usr/obj/usr/src/sys/ROUTER.POLLING i386
>>
>> (ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ,
>> IPSEC, also pfsync and carp)
>>
>> This weekend I had a disk fail on me in a machine running gmirror
>> gm0 with 2 providers (ad0 and ad6). The whole box froze with no
>> screen output, and on hard reboot I got some LBA errors etc. from ad0;
>> after a few reboots it got up and running, though (I wasn't at the
>> screen and had to do it by phone, so I couldn't really debug very well).
>> As soon as the box was up, I removed ad0 from the gmirror, so ad6 was
>> the only provider. Today I got a new disk to replace ad0.
>> Now remember, ad6 was the only disk in the mirror. I took the box down
>> cleanly and replaced the disk. ad0 was now gone and instead I had ad4
>> (ad4 and ad6 are SATA, ad0 was IDE). I changed it so I booted off the
>> old SATA disk.
>> Okay, there came the first problem: the boot loader gave me the usual
>> options, F1 FreeBSD, F5 Disk 2 (or whatever it said). If I pressed F1
>> I got the same prompt again; F5 did nothing at all. Funny!... The
>> system refused to load the loader (or whatever the 1-9 menu thingy is
>> called), the kernel, or anything.
>>
>> So I finally plugged the old ad0 disk into the machine to at least
>> get it booted, thinking it would come up on the gmirror. Nope:
>>
>> (got the new ad4 out here)
>> ad0: 38166MB at ata0-master UDMA100
>> ad6: 152627MB at ata3-master SATA150
>> GEOM_MIRROR: Device gm0 created (id=4029378995).
>> GEOM_MIRROR: Device gm0: provider ad6 detected.
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> GEOM_MIRROR: Force device gm0 start due to timeout.
>> Trying to mount root from ufs:/dev/mirror/gm0s1a
>>
>> Manual root filesystem specification:
>>   <fstype>:<device>  Mount <device> using filesystem <fstype>
>>                        eg. ufs:da0s1a
>>   ?                  List valid disk boot devices
>>   <empty line>       Abort manual input
>>
>> mountroot>
>>
>> Okay... so why wouldn't it load my mirror from ad6 now?? I had just
>> done a clean shutdown without problems. It didn't even recognize any
>> slices on ad6s1 (although ad6s1 itself was found)...
>
> It loaded your mirror just fine, you confuse things. Gmirror started in
> degraded state, as one could expect, but it seems there is no 'a'
> partition on your gm0s1 slice (or the entire bsdlabel is gone).
> You could try to recreate it based on the bsdlabel from ad0 (if it should
> be the same), but I've no idea how it disappeared. Anyway, gmirror seems
> to work properly.

Okay. So it tries to load, finds no partition table, and ignores and
unloads gm0?

>
>> Some more digging into gmirror: I did a gmirror dump ad6:
>>
>> Metadata on /dev/ad6:
>> magic: GEOM::MIRROR
>> version: 3
>> name: gm0
>> mid: 4029378995
>> did: 449032193
>> all: 3
>
> You have a 3-way mirror?

Uhm... I've never had more than 2 disks in this machine.

>
>> genid: 0
>> syncid: 5
>> priority: 0
>> slice: 4096
>> balance: round-robin
>> mediasize: 20416757248
>> sectorsize: 512
>> syncoffset: 0
>> mflags: NONE
>> dflags: SYNCHRONIZING
>> hcprovider:
>> provsize: 160041885696
>> MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
>
> BTW. Your provider size is 149GB and the mirror only uses 19GB, which
> means you mirrored a 149GB disk with a 19GB disk and you waste 130GB
> (it's unused).

Yes, the ad0 disk was (is) only 40GB, so only the first 40GB of the 149GB
disk was in the mirror (the rest was in another slice with its own label,
although if I run fdisk on the disk it seems not to be there at all).
But hm, 19?? It should be 40 (or somewhere around there, at least).

Filesystems mounted from ad0:

Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s1a    507630    85142   381878      18%  /
/dev/ad0s1e    507630       20   467000       0%  /tmp
/dev/ad0s1f  10154158  1176410  8165416      13%  /usr
/dev/ad0s1d   1506190    80326  1305370       6%  /var
/dev/ad0s1g  24174212  6939804 15300472      31%  /var/squid

swapinfo:
/dev/ad0s1b   1022536        0  1022536       0%

That's ~35GB. I compared slice 1 on ad0 vs ad6, and both have exactly
the same size.

>
>> Some googling indicated that SYNCHRONIZING means that it's not
>> "complete" and won't mount? Is that correct? Why would it be in that
>> state then? I just shut it down fine... And where the f*ck did my
>> slices go??
>
> SYNCHRONIZING means that this component was/is being synchronized. It
> seems that you removed/lost the master disk while it was synchronizing.
> It should work anyway.

Okay, that's odd. ad6 was the only disk in the mirror when I shut down
(shutdown -p now, and it powered off by itself), so it should have been
good.

>
> BTW. You confuse things again. Your slice is just fine (ad6s1), you
> don't have partitions, AFAIU.

Seems I did, yes, thanks. Disks have slices (which in the
Windows/DOS/Linux world are called partitions), which contain
partitions. Check :)

>
> All in all, your partition table seems to be gone. If you created it on
> gmirror before (gm0s1) you may still have the same partition table on
> the other half of the mirror. You can try to move it to ad6 with
> bsdlabel and verify whether you can see the file systems inside the
> partitions.

Okay, tried that now.
I saved the ad0s1 label and reloaded it onto ad6s1, so now I have the
same partition table on ad6s1 as on ad0s1. Trying to mount any of the
partitions, though, gives me "incorrect super block", and fsck cannot
find any superblocks either.

So... what to do now? Just forget ad6 and start from scratch from ad2?
(As I said, the data isn't very old, really.)

I'm thinking about doing a complete reinstall on ad4+ad6 then. Can I do
that? fdisk both disks with a single full-size slice on each, create a
new gmirror between ad6s1/ad4s1 (or should I use ad4/ad6?), create
slices, use dump | restore (of course with apps shut down so no data is
changed, or at least nothing that I care about) to copy all files from
ad2 to the new mirror... What else do I need to do? bsdlabel -B on both
to write boot blocks? Is there anything else to think about?

Thanks for your help. :)
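For readers checking the numbers in this thread, here is a small Python sketch that redoes the size arithmetic. It parses the `key: value` lines of the `gmirror dump ad6` output quoted earlier and sums the 1K-block column of the df output for ad0s1. Assumptions: "GB" in the thread is read as GiB (2^30 bytes), and `parse_dump` and `gib` are hypothetical helpers, not part of gmirror or any FreeBSD library. It only reproduces the arithmetic; it does not explain why the recorded mediasize (about 19 GiB) is smaller than the ad0s1 slice (about 35 GiB of filesystems).

```python
"""Redo the size arithmetic from this thread (a sketch, not a gmirror tool).

Assumption: "GB" in the thread means GiB (2**30 bytes). The helper names
below are hypothetical and not part of any FreeBSD API.
"""

# Lines copied from the `gmirror dump ad6` output quoted in the thread.
DUMP = """\
magic: GEOM::MIRROR
version: 3
name: gm0
mid: 4029378995
did: 449032193
all: 3
genid: 0
syncid: 5
priority: 0
slice: 4096
balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
mflags: NONE
dflags: SYNCHRONIZING
hcprovider:
provsize: 160041885696
"""

# 1K-block sizes of the ad0s1 filesystems, from the df output in the thread.
DF_1K_BLOCKS = {
    "/": 507630,
    "/tmp": 507630,
    "/usr": 10154158,
    "/var": 1506190,
    "/var/squid": 24174212,
}
SWAP_1K_BLOCKS = 1022536  # ad0s1b, from swapinfo

GIB = 2 ** 30


def parse_dump(text: str) -> dict[str, str]:
    """Split each 'key: value' line on the first colon only,
    so values such as 'GEOM::MIRROR' stay intact."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def gib(nbytes: int) -> float:
    """Convert a byte count to GiB."""
    return nbytes / GIB


def main() -> None:
    meta = parse_dump(DUMP)
    mediasize = int(meta["mediasize"])  # mirror size recorded in metadata
    provsize = int(meta["provsize"])    # size of the ad6 provider
    print(f"mirror mediasize:  {gib(mediasize):7.2f} GiB")
    print(f"ad6 provsize:      {gib(provsize):7.2f} GiB")
    print(f"unused on ad6:     {gib(provsize - mediasize):7.2f} GiB")
    print(f"SYNCHRONIZING set: {'SYNCHRONIZING' in meta['dflags'].split()}")

    fs_bytes = sum(DF_1K_BLOCKS.values()) * 1024
    swap_bytes = SWAP_1K_BLOCKS * 1024
    print(f"ad0s1 filesystems: {gib(fs_bytes):7.2f} GiB")
    print(f"plus swap:         {gib(fs_bytes + swap_bytes):7.2f} GiB")


if __name__ == "__main__":
    main()
```

Running it shows about 19.01 GiB for mediasize, 149.05 GiB for provsize (so roughly 130 GiB of ad6 unused, in line with Pawel's estimate), and about 35.14 GiB of ad0s1 filesystems (36.12 GiB with swap).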