Date:      Tue, 12 Apr 2011 14:39:30 -0700
From:      Garrett Cooper <yanegomi@gmail.com>
To:        Alexander Motin <mav@freebsd.org>
Cc:        pyunyh@gmail.com, FreeBSD-Current <freebsd-current@freebsd.org>, David Naylor <naylor.b.david@gmail.com>
Subject:   Re: [regression] unable to boot: no GEOM devices found.
Message-ID:  <BANLkTikfxNRvyL+c5JCbix+QoaS+1V2wtw@mail.gmail.com>
In-Reply-To: <4DA4BF6A.7010806@FreeBSD.org>
References:  <mailpost.1302585106.8448174.20731.mailing.freebsd.current@FreeBSD.cs.nctu.edu.tw> <4DA3EE8F.8050306@FreeBSD.org> <201104122132.23809.naylor.b.david@gmail.com> <4DA4B247.6010901@FreeBSD.org> <20110412210354.GC1421@michelle.cdnetworks.com> <4DA4BF6A.7010806@FreeBSD.org>

On Tue, Apr 12, 2011 at 2:08 PM, Alexander Motin <mav@freebsd.org> wrote:
> YongHyeon PYUN wrote:
>> On Tue, Apr 12, 2011 at 11:12:55PM +0300, Alexander Motin wrote:
>>> David Naylor wrote:
>>>> On Tuesday 12 April 2011 08:17:51 Alexander Motin wrote:
>>>>> David Naylor wrote:
>>>>>> I am running -current and since a few days ago (at least 2011/04/11) I am
>>>>>> unable to boot.
>>>>>>
>>>>>> The boot process stops when it looks to find a bootable device.  The
>>>>>> prompt (when pressing '?') does not display any device and yielding one
>>>>>> second (or more) to the kernel (by pressing '.') does not improve the
>>>>>> situation.
>>>>>>
>>>>>> A known working date is 2011/02/20.
>>>>>>
>>>>>> I am running amd64 on an nVidia MCP51 chipset.
>>>>> MCP51... again...
>>>>>
>>>>>> I am willing to help any way I can.
>>>>> You could start by capturing and posting a verbose dmesg, in full or at
>>>>> least the parts related to disks.
>>>> I captured the dmesg output for both the old (working) kernel and the new
>>>> (bad) kernel.  See attached for the difference between the two.  If you need
>>>> the full dmesg please let me know.
>>>>
>>>> One thing I found is that the old kernel would not boot if I simply rebooted
>>>> from the bad kernel.  I had to do a hard power off before the old kernel would
>>>> work again.  Is some device state surviving between reboots?
>>> +ata2: reiniting channel ..
>>> +ata2: SATA connect time=0ms status=00000113
>>> +ata2: reset tp1 mask=01 ostat0=58 ostat1=00
>>> +ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
>>> +ata2: reset tp2 stat0=50 stat1=00 devices=0x1
>>> +ata2: reinit done ..
>>> +unknown: FAILURE - ATA_IDENTIFY timed out LBA=0
>>>
>>> Since all devices are detected but do not respond to commands, I would
>>> suppose that there is something wrong with ATA interrupts. There is a
>>> long history of interrupt problems with this chipset. I have already tried
>>> to debug one case where ATA wasn't generating interrupts at all,
>>> unfortunately without success: requests were executing but not
>>> generating interrupts, and it didn't look like an ATA driver problem.
>>>
>>> As for a possible candidate for the revision triggering your problem, I
>>> would look at this message:
>>> +pcib0: Enabling MSI window for HyperTransport slave at pci0:0:9:0
>>>
>>> At least it is recent (SVN revs 219737,219740 on 2011-03-18 by jhb) and
>>> it is interrupt related.
>>
>> Does the driver disable MSI for MCP51?
>
> ata(4) doesn't use MSI by default, and I doubt this controller supports
> them anyway. But if I am not mixing something up, there were very strange
> situations with MSI on that chipset, where enabling them on one device
> caused interrupt problems on another.
>
>> I think jhb's patch fixed one MSI issue affecting all MCP chipsets.
>
> I am not saying it is wrong. It could just have triggered something.

Could the OP try disabling MSI[X] to see whether or not the issue
still occurs then?
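
A minimal sketch of how that could be tested, assuming the stock PCI
loader tunables documented in pci(4), hw.pci.enable_msi and
hw.pci.enable_msix (these names do not appear elsewhere in this thread).
Since the bad kernel never gets as far as mounting root, the same two
values can probably also be entered for a single boot at the loader
prompt with "set hw.pci.enable_msi=0" and "set hw.pci.enable_msix=0"
followed by "boot". The persistent form would look something like:

```
# /boot/loader.conf
# Disable MSI and MSI-X system-wide so PCI devices fall back to legacy
# INTx interrupts. For diagnosis only; remove once the test is done.
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
```

Given the earlier report that the old kernel only worked again after a
hard power off, it is probably worth power cycling before the test boot
so that stale device state does not skew the result.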
-Garrett


