Message-ID: <506BE977.1060405@centrale-marseille.fr>
Date: Wed, 03 Oct 2012 09:29:59 +0200
From: geoffroy desvernay
To: Alexander Motin
Cc: freebsd-stable@FreeBSD.org, Andriy Gapon
References: <506AE944.3020806@centrale-marseille.fr> <506AF15D.1010707@FreeBSD.org> <506B0AE1.5050303@FreeBSD.org>
In-Reply-To: <506B0AE1.5050303@FreeBSD.org>
Subject: Re: ahcich reset -> cannot mount zfs root in 9.1-PRE

On 10/02/2012 17:40, Alexander Motin wrote:
> On 02.10.2012 16:51, Andriy Gapon wrote:
>> on 02/10/2012 16:16 geoffroy desvernay said the following:
>>> Hi all,
>>>
>>> Trying to upgrade a system from 9.0-RELEASE to 9.1-PRE from yesterday on
>>> my machine (GEOM+ZFS mirror setup on ada[01]p3), the new kernel becomes
>>> unable to mount root... The only way to recover is to boot from 9.0
>>> kernel.
>>> The disks were already named ada[01] in 9.0, so I suspect nothing
>>> there...
>>>
>>> I tried
>>> - disabling AHCI in bios (no change seen)
>>> - change cables, check PSU, test disks with smartctl
>>>
>>> Here are some bits (via serial console):
>>> ahci0: port
>>> 0xc000-0xc007,0xb000-0xb003,0xa000-0xa007,0x9000-0x9003,0x8000-0x800f
>>> mem 0xfe9ff800-0xfe9ffbff irq 22 at device 18.0 on pci0
>>> ahci0: AHCI v1.10 with 4 3Gbps ports, Port Multiplier supported
>>> ahci0: Caps: 64bit NCQ SNTF MPS AL CLO 3Gbps PM PMD SSC PSC 32cmd CCC
>>> 4ports
>>> ahcich0: at channel 0 on ahci0
>>> ahcich0: Caps: HPCP
>>> ahcich1: at channel 1 on ahci0
>>> ahcich1: Caps: HPCP
>>> ahcich2: at channel 2 on ahci0
>>> ahcich2: Caps: HPCP
>>> ahcich3: at channel 3 on ahci0
>>> ahcich3: Caps: HPCP
>>> ahcich0: AHCI reset...
>>> ahcich0: SATA connect time=100us status=00000123
>>> ahcich0: AHCI reset: device found
>>> ahcich0: AHCI reset: device ready after 0ms
>>>
>>> The difference with 9.0 is after that: here is 9.0's next lines: (same
>>> for ahcich1)
>>> (aprobe0:ahcich0:0:15:0): Command timed out
>>> (aprobe0:ahcich0:0:15:0): Error 5, Retries exhausted
>>> (aprobe0:ahcich0:0:0:0): SIGNATURE: 0000
>>>
>>> And 9.1-PRE's:
>>> (aprobe0:ahcich0:0:15:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
>>> (aprobe0:ahcich0:0:15:0): CAM status: Command timeout
>>> (aprobe0:ahcich0:0:15:0): Error 5, Retries exhausted
>>>
>>> In both cases ada[01] are detected and available, but with 9.1-PRE I
>>> see:
>>> GEOM_RAID: Promise: Disk ada0 state changed from NONE to SPARE.
>>> GEOM_RAID: Promise: Disk ada1 state changed from NONE to SPARE.
>>>
>>> (I see the same when I # kldload geom_raid # from running 9.0, doesn't
>>> breaks anything...)
>>>
>>> I attach the full boot log with 9.1-PRE (bios with NO-raid nor AHCI
>>> enabled, but this changes nothing in the output)
>>>
>>> I could test patches or try any command required to debug this… But for
>>> the moment I don't know where to search (and kernel code is far away
>>> from my current skills in debugging…)
>>
>> You probably need to clear RAID metadata on the disks as I think that
>> disabling
>> geom_raid is not possible in 9.1-PRE.
>> I think that Alexander can help you more here.
>
> The right way is to clear RAID metadata on disks. If it is possible to
> boot from any other source, you can just do `graid delete Promise` and
> then reboot.
>
> Alternatively it is possible to disable geom_raid module using recently
> added loader tunable kern.geom.raid.enable=0. After that your system
> should boot and run fine. I would still recommend you to erase metadata,
> but after setting that tunable it will be impossible to do it via graid
> tool, only with manual dd surgery. In case of Promise format metadata
> use up to 63 last sectors of the disk. You can identify respective
> sectors to erase by signature "Promise Technology, Inc." in the
> beginning of the sector.
>

I tried clearing the metadata, but it had no effect. The command seems
to work: the first 'geom raid delete Promise' returns 0, and the second
one complains with something like 'Promise array doesn't exist'. Still,
it didn't solve the problem.

But adding kern.geom.raid.enable=0 did ;)

I still haven't tried to locate the last sectors manually...

Thanks a lot!

-- 
*geoffroy desvernay*
C.R.I - Administration systèmes et réseaux
Ecole Centrale de Marseille
Tel: (+33|0)4 91 05 45 24
Fax: (+33|0)4 91 05 45 98
dgeo@centrale-marseille.fr
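
Editor's note on locating the last sectors: Alexander's hint (Promise metadata sits in up to the last 63 sectors, and each metadata sector begins with the string "Promise Technology, Inc.") could be checked with a read-only scan before any dd surgery. The following is only a minimal sketch, not a tested recovery tool. It assumes 512-byte sectors, and the function name `find_promise_sectors` and its parameters are hypothetical, introduced here for illustration. It never writes to the disk.

```python
import io

SECTOR_SIZE = 512    # assumption: 512-byte logical sectors
TAIL_SECTORS = 63    # from the thread: metadata uses up to the last 63 sectors
SIGNATURE = b"Promise Technology, Inc."  # signature quoted in the thread


def find_promise_sectors(f, total_bytes,
                         sector_size=SECTOR_SIZE, tail=TAIL_SECTORS):
    """Return the sector numbers, among the last `tail` sectors, whose
    contents begin with SIGNATURE. `f` is a seekable binary file object
    opened for reading; nothing is written."""
    total_sectors = total_bytes // sector_size
    first = max(0, total_sectors - tail)
    hits = []
    for lba in range(first, total_sectors):
        f.seek(lba * sector_size)
        block = f.read(sector_size)
        if block.startswith(SIGNATURE):
            hits.append(lba)
    return hits
```

The function takes the media size as an argument because querying the size of a raw disk device from Python is platform-dependent; on FreeBSD the size in bytes could be read from `diskinfo` output and passed in by hand. It works equally on an image file of the disk, which is the safer way to try it first. The returned sector numbers are only candidates for erasing; verifying them against a backup before writing anything remains the reader's responsibility.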