Date: Thu, 30 Sep 2010 17:43:01 +0200
From: David Naylor <naylor.b.david@gmail.com>
To: Alexander Motin <mav@freebsd.org>
Cc: freebsd-current@freebsd.org, Andriy Gapon <avg@icyb.net.ua>
Subject: Re: Safe-mode on amd64 broken
Message-ID: <201009301743.36007.naylor.b.david@gmail.com>
In-Reply-To: <4CA433B7.9010306@FreeBSD.org>
References: <201009291207.53146.naylor.b.david@gmail.com>
 <201009300755.46989.naylor.b.david@gmail.com>
 <4CA433B7.9010306@FreeBSD.org>

On Thursday 30 September 2010 08:52:39 Alexander Motin wrote:
> David Naylor wrote:
> > On Thursday 30 September 2010 07:23:34 Alexander Motin wrote:
> >> David Naylor wrote:
> >>> On Wednesday 29 September 2010 18:25:13 Alexander Motin wrote:
> >>>> David Naylor wrote:
> >>>>> On Wednesday 29 September 2010 16:19:08 Andriy Gapon wrote:
> >>>>>> What do you try to actually achieve?
> >>>>>
> >>>>> I was trying to boot a system and it was panicking due to stray
> >>>>> interrupts. It turned out to be caused by HPET. I found
> >>>>> `hint.hpet.0.clock=0' which fixed the problem.
> >>>>>
> >>>>> This means HPET does not work on any of my machines. The other one's
> >>>>> symptoms are hda losing interrupts after a period of up-time.
> >>>>
> >>>> What chipset do you use? Nvidia MCP5x? Could you send me your verbose
> >>>> dmesg?
> >>>
> >>> Yes, the one is a MCP51, the other is a ICH8M.
> >>>
> >>> The desktop is a Gigabyte N650SLI-DS4L. Its symptom is hda losing
> >>> interrupts after a period of time.
> >>
> >> There are too many reports about different lost interrupts problems on
> >> different controllers of MCP5x. I don't know the reason. Attached patch
> >> should disable using regular HPET interrupts on NVidia chipsets. I hope
> >> it will work as workaround. May be it is too aggressive, but better to
> >> be safe then sorry. I assume that legacy_route mode may still work fine
> >> there. It would be nice to test it.
> >
> > I assume you mean hint.hpet.0.legacy_route=1? I'll give that a try later
> > today on both machines.
>
> Make sure that both attimer and atrtc disabled, as mentioned in hpet(4).

legacy_route worked on the desktop but not on the laptop (boot stalled).

Here is vmstat using default settings for the desktop:

interrupt                          total       rate
irq1: atkbd0                          64          0
irq12: psm0                          756          3
irq14: ata0                         1255          5
irq16: vgapci0                     13576         54
irq17: dc0                          1546          6
irq18: hpet0                      456756       1834
irq20: atapci2                     11557         46
irq21: hdac0 ohci0                 17038         68
irq23: atapci1                     11534         46
Total                             514082       2064

I moved hpet to irq22 (allowed_irqs="0x400000") and that also worked for the
desktop.

> > Is your patch the same as hint.hpet.0.clock=0?
>
> By default - effectively yes. But it still allows to configure
> legacy_route, which is, for example, default for Linux.
>
> >>> The laptop is a Acer 2920. Its symptom for a GENERIC is a panic saying
> >>> stray interrupt (irq7), with a custom kernel booting stalls.
> >>
> >> This is strange, as my Acer with the same ICH8M works fine in all
> >> possible modes. Also IMHO stray interrupts are not a reason to panic.
> >> Could you show what it looks like?
> >
> > See http://markmail.org/message/smxnofrdmmkxyvnd for my previous email
> > that includes the backtrace from that panic. When I booted in i386 safe
> > mode the kernel reported stray interrupts on irq7. vmstat -i shows irq7
> > as "stray irq7".
>
> I am not sure "stray irq7" related here. Instead more suspicious looks
> probable irq20 interrupt sharing between HPET and uhci0 and the fact
> that system panicked during interrupt handler registration by uhci0. I
> can't be sure what IRQ was used by HPET there, as in only present dmesg
> it was disabled, but as soon as HPET registered early, I think it
> grabbed first possible - irq20. On my system HPET also uses irq20, but
> uhci0 lives on irq16 and so irq20 is not shared.

On the laptop uhci0 and ehci0 live on irq20.

> To collect more data you may try to hint HPET driver to avoid irq20 by
> setting hint.hpet.0.allowed_irqs=0x00e00000 or other values. I've tried
> same recipy to create sharing on my system, but still found no problem.

This fixes the problem for the laptop.
This also allows one-shot timing to work. Moving hpet to irq22 also
worked. Here is the vmstat -i using the above hint:

interrupt                          total       rate
irq1: atkbd0                         407          0
irq9: acpi0                         1857          2
irq12: psm0                         1005          1
irq14: ata0                         1870          2
irq18: uhci4                        2183          2
irq20: uhci0 ehci0                  2421          3
irq21: hpet0 uhci1                502330        667
irq23: uhci2 ehci1                     3          0
irq256: vgapci0                    25023         33
irq257: hdac0                        236          0
irq258: bge0                          79          0
irq259: ahci0                      27356         36
Total                             564770        750
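
For reference, the two allowed_irqs values used in this thread can be decoded with a few lines of Python. This is only a sketch: it assumes the hint is a plain 32-bit mask in which bit N permits IRQ N, which is consistent with 0x400000 placing hpet on irq22 above. The helper names irqs_from_mask and mask_from_irqs are made up for illustration and are not part of any FreeBSD tool.

```python
# Sketch only: assumes allowed_irqs is a 32-bit mask where bit N allows IRQ N.
# The function names below are hypothetical helpers, not FreeBSD interfaces.

def irqs_from_mask(mask):
    """Return the IRQ numbers whose bits are set in a 32-bit mask."""
    return [irq for irq in range(32) if mask & (1 << irq)]


def mask_from_irqs(irqs):
    """Build a mask with one bit set for each IRQ number given."""
    mask = 0
    for irq in irqs:
        mask |= 1 << irq
    return mask


if __name__ == "__main__":
    # The two masks mentioned in the thread.
    for mask in (0x400000, 0x00e00000):
        print(f"0x{mask:08x} -> IRQs {irqs_from_mask(mask)}")
```

Under that assumption, 0x00e00000 leaves irq21 through irq23 available and excludes irq20, which matches hpet0 showing up on irq21 in the laptop's vmstat -i output.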