Date: Sun, 29 Mar 2009 14:24:57 +0200 From: Dominic Fandrey <kamikaze@bsdforen.de> To: Alexander Motin <mav@FreeBSD.org> Cc: freebsd-stable@freebsd.org Subject: Re: powerd broken Message-ID: <49CF6899.2060002@bsdforen.de> In-Reply-To: <49CF60AB.4040709@bsdforen.de> References: <1238293386.00093672.1238281804@10.7.7.3> <49CF0803.1070505@FreeBSD.org> <49CF2F8D.6000905@bsdforen.de> <49CF4EB9.60108@FreeBSD.org> <49CF49F5.6010800@bsdforen.de> <49CF615A.6050304@FreeBSD.org> <49CF595A.30805@bsdforen.de> <49CF6B28.2080400@FreeBSD.org> <49CF60AB.4040709@bsdforen.de>
next in thread | previous in thread | raw e-mail | index | archive | help
Dominic Fandrey wrote: > Alexander Motin wrote: >> Dominic Fandrey wrote: >>> Alexander Motin wrote: >>>> It means that one of your CPUs spent most of it's time in interrupt >>>> processing and so far from idle. What does `top -P` shows you? Where >>>> have you seen that ~6% CPU load? >>> That is the load shown by the e17 CPU module. It's display has always >>> been in sync with top in the past, no longer though, it appears. >>> >>> # top -PIS >>> last pid: 68235; load averages: 0.09, 0.16, >>> 0.17 >>> up 0+05:17:29 13:05:10 >>> 137 processes: 4 running, 117 sleeping, 16 waiting >>> CPU 0: 1.1% user, 0.0% nice, 1.1% system, 61.4% interrupt, 36.3% idle >>> CPU 1: 9.0% user, 0.0% nice, 3.4% system, 0.0% interrupt, 87.6% idle >>> Mem: 419M Active, 415M Inact, 416M Wired, 3752K Cache, 183M Buf, 716M >>> Free >>> Swap: 4096M Total, 4096M Free >>> >>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU >>> COMMAND >>> 11 root 1 171 ki31 0K 16K RUN 1 286:42 93.46% >>> idle: cpu1 >>> 23 root 1 -80 - 0K 16K RUN 0 126:54 59.96% >>> irq16: hdac0 uhci+ >>> 12 root 1 171 ki31 0K 16K RUN 0 179:27 40.09% >>> idle: cpu0 >>> 1318 root 1 46 0 439M 312M select 1 12:55 1.76% Xorg >>> 4361 musicpd 4 44 0 91164K 14412K ucond 1 1:57 0.78% mpd >>> >>> Some things strike me as odd. The difference between the load reported >>> by powerd and top is still very significant and of course the high >>> interrupt load. >> powerd now reports/uses summary load of all CPUs (it can be bigger then >> 100%), while top shows average. >> >>> I've got a mouse with a 1khz report rate (the only connected USB >>> device), but unplugging it doesn't change the load. Neither does >>> stopping moused (I'm running the system without HAL). There also >>> is a fingerprint reader, but it is only detected by ugen. >> I would start from identifying all devices sharing that IRQ and trying >> to disable them (or unload their drivers) one by one. `systat -vm 1` >> will show you how much interrupts actually happens there per second. >> >> On most of modern systems you can make hdac0 to not share that IRQ by >> enabling MSI there with hint.hdac.0.msi=1 in loader.conf. >> > > I already did unplug all devices to no avail. Afterwards I tried > to 'kldunload -f' all usb devices, but most refused. However during > this I recognized that /boot/modules holds an old u3g.ko, back from > the time when it was not in stable. Apparently modules in > /boot/modules have preference over those in /boot/kernels. I deleted > the old u3g.ko and rebooted (because I couldn't get the system to > unload it). Now everything is fine, the interrupt load is gone. > > I'll monitor this and come back here if it turns out this has just > been coincidence. > > Thank you for all that help. > > Regards This has turned out to be an early call. I just happened to start skype_devel and the whole thing started over, so I think it's pretty certain, now, that this is a hdac issue. However after a reboot I tried to reproduce this and now skype and mpd are both running without causing an irq race. If the irq race reappears I will use the device hint you suggested to rule out that this is an interrupt sharing issue.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?49CF6899.2060002>