Date:      Wed, 03 Oct 2012 09:29:59 +0200
From:      geoffroy desvernay <dgeo@centrale-marseille.fr>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org, Andriy Gapon <avg@FreeBSD.org>
Subject:   Re: ahcich reset -> cannot mount  zfs root in 9.1-PRE
Message-ID:  <506BE977.1060405@centrale-marseille.fr>
In-Reply-To: <506B0AE1.5050303@FreeBSD.org>
References:  <506AE944.3020806@centrale-marseille.fr> <506AF15D.1010707@FreeBSD.org> <506B0AE1.5050303@FreeBSD.org>

On 10/02/2012 17:40, Alexander Motin wrote:
> On 02.10.2012 16:51, Andriy Gapon wrote:
>> on 02/10/2012 16:16 geoffroy desvernay said the following:
>>> Hi all,
>>>
>>> Trying to upgrade a system from 9.0-RELEASE to 9.1-PRE from yesterday on
>>> my machine (GEOM+ZFS mirror setup on ada[01]p3), the new kernel becomes
>>> unable to mount root... The only way to recover is to boot from 9.0
>>> kernel.
>>> The disks were already named ada[01] in 9.0, so I suspect nothing
>>> there...
>>>
>>> I tried
>>>   - disabling AHCI in bios (no change seen)
>>>   - change cables, check PSU, test disks with smartctl
>>>
>>> Here are some bits (via serial console):
>>> ahci0: <ATI IXP600 AHCI SATA controller> port
>>> 0xc000-0xc007,0xb000-0xb003,0xa000-0xa007,0x9000-0x9003,0x8000-0x800f
>>> mem 0xfe9ff800-0xfe9ffbff irq 22 at device 18.0 on pci0
>>> ahci0: AHCI v1.10 with 4 3Gbps ports, Port Multiplier supported
>>> ahci0: Caps: 64bit NCQ SNTF MPS AL CLO 3Gbps PM PMD SSC PSC 32cmd CCC
>>> 4ports
>>> ahcich0: <AHCI channel> at channel 0 on ahci0
>>> ahcich0: Caps: HPCP
>>> ahcich1: <AHCI channel> at channel 1 on ahci0
>>> ahcich1: Caps: HPCP
>>> ahcich2: <AHCI channel> at channel 2 on ahci0
>>> ahcich2: Caps: HPCP
>>> ahcich3: <AHCI channel> at channel 3 on ahci0
>>> ahcich3: Caps: HPCP
>>> ahcich0: AHCI reset...
>>> ahcich0: SATA connect time=100us status=00000123
>>> ahcich0: AHCI reset: device found
>>> ahcich0: AHCI reset: device ready after 0ms
>>>
>>> The difference from 9.0 comes after that. Here are 9.0's next lines
>>> (same for ahcich1):
>>> (aprobe0:ahcich0:0:15:0): Command timed out
>>> (aprobe0:ahcich0:0:15:0): Error 5, Retries exhausted
>>> (aprobe0:ahcich0:0:0:0): SIGNATURE: 0000
>>>
>>> And 9.1-PRE's:
>>> (aprobe0:ahcich0:0:15:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
>>> (aprobe0:ahcich0:0:15:0): CAM status: Command timeout
>>> (aprobe0:ahcich0:0:15:0): Error 5, Retries exhausted
>>>
>>> In both cases ada[01] are detected and available, but with 9.1-PRE I
>>> see:
>>> GEOM_RAID: Promise: Disk ada0 state changed from NONE to SPARE.
>>> GEOM_RAID: Promise: Disk ada1 state changed from NONE to SPARE.
>>>
>>> (I see the same when I run `kldload geom_raid` on a running 9.0, and
>>> it doesn't break anything...)
>>>
>>> I attach the full boot log with 9.1-PRE (bios with NO-raid nor AHCI
>>> enabled, but this changes nothing in the output)
>>>
>>> I could test patches or try any command required to debug this… But for
>>> the moment I don't know where to search (and kernel code is far away
>>> from my current skills in debugging…)
>>
>> You probably need to clear RAID metadata on the disks as I think that
>> disabling
>> geom_raid is not possible in 9.1-PRE.
>> I think that Alexander can help you more here.
> 
> The right way is to clear RAID metadata on disks. If it is possible to
> boot from any other source, you can just do `graid delete Promise` and
> then reboot.
> 
> Alternatively it is possible to disable geom_raid module using recently
> added loader tunable kern.geom.raid.enable=0. After that your system
> should boot and run fine. I would still recommend you to erase metadata,
> but after setting that tunable it will be impossible to do it via graid
> tool, only with manual dd surgery. The Promise format metadata uses up
> to the last 63 sectors of the disk. You can identify the respective
> sectors to erase by signature "Promise Technology, Inc." in the
> beginning of the sector.
> 
I tried clearing the metadata, but it had no effect. The command seems to
work: the first 'geom raid delete Promise' returns 0, and the second one
complains with something like 'Promise array doesn't exist'. Still, it
didn't solve the problem.

But adding kern.geom.raid.enable=0 did ;)

I haven't yet tried to locate the last sectors manually...
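
For reference, here is a minimal read-only sketch of how that search could
look, based on Alexander's description (signature "Promise Technology, Inc."
at the beginning of a sector, within the last 63 sectors). It assumes
512-byte sectors and that the media size in bytes is known, for example
from `diskinfo -v ada0`. The function names are hypothetical, and nothing
here writes to the disk:

```python
SECTOR_SIZE = 512    # assumption: 512-byte logical sectors
TAIL_SECTORS = 63    # per the thread: metadata uses up to the last 63 sectors
SIGNATURE = b"Promise Technology, Inc."


def find_promise_sectors(dev, media_size,
                         sector_size=SECTOR_SIZE, tail=TAIL_SECTORS):
    """Return sector numbers among the last `tail` sectors of `dev`
    whose contents begin with the Promise signature.

    `dev` is any seekable binary file object; `media_size` is in bytes.
    """
    total = media_size // sector_size
    first = max(0, total - tail)
    hits = []
    for lba in range(first, total):
        dev.seek(lba * sector_size)          # sector-aligned offset
        if dev.read(sector_size).startswith(SIGNATURE):
            hits.append(lba)
    return hits


def scan_device(path, media_size):
    """Open `path` read-only and unbuffered (hypothetical helper), then scan it."""
    with open(path, "rb", buffering=0) as dev:
        return find_promise_sectors(dev, media_size)
```

Calling scan_device("/dev/ada0", size_in_bytes) as root would then list the
candidate sectors. Erasing them with dd remains a separate, manual step.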

Thanks a lot!
-- 
*geoffroy desvernay*
C.R.I - Administration systèmes et réseaux
Ecole Centrale de Marseille
Tel: (+33|0)4 91 05 45 24
Fax: (+33|0)4 91 05 45 98
dgeo@centrale-marseille.fr