Date: Tue, 10 Aug 2004 01:26:36 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: rnejdl@ringofsaturn.com
Cc: dnelson@allantgroup.com
Subject: Re: Is anything being done re: the pcm timeout issue?
Message-ID: <200408100826.i7A8Qa8H013148@gw.catspoiler.org>
In-Reply-To: <44129.12.148.147.242.1092077172.squirrel@[12.148.147.242]>

On 9 Aug, Rusty Nejdl wrote:
> Conrad J. Sabatier said:
>>
>
>>> Sound works fine for me on my Dell laptop (mss driver). I do get a
>>> mutex-related panic on my desktop (sb16 driver), but haven't sent in
>>> the stack trace to anyone yet.
>>
>> My problems are occurring on an amd64 box (Athlon 64) with the nVidia
>> nForce3 chipset (snd_ich driver).
>>
>> Sound works fine for a while, then suddenly I get a pcm play timeout,
>> and game over. Have to reboot to get sound to work again.
>>
>> Others have reported similar problems, but I've seen no followups
>> indicating anything is being done about it.
>>
>
> I've been seeing a sound issue on 5.2.1-release and I wonder if it's
> related at all to what you are seeing. I have:
>
> hw.snd.maxautovchans: 4
> hw.snd.pcm0.vchans: 4
>
> And I have seen that these will eventually stop working one by one
> until I have none left. lsof and fstat don't show any programs using
> them, but nonetheless, programs like xmms and gaim can't use them
> anymore.

The vchan code is fairly broken. I was hoping to have some time to
work on this (and other problems in the top half of the sound code)
before 5.3, but it looks like the clock has just about run out.

> Do you have any more details on the pcm play timeout? Are you using
> vchans? What program are you using?

My suspicion is that there is either a problem in ich_intr() that is
causing it to stop receiving interrupts or to stop calling chn_intr(),
or there is enough interrupt latency to allow the DMA pointer to wrap
and fool chn_dmaupdate() into thinking no data was consumed. It is
possible that the ich_intr() problem is specific to amd64.
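
To illustrate the wrap hypothesis above, here is a minimal sketch of how a modular pointer difference can report zero progress. It is not the actual chn_dmaupdate() code: the function name dma_delta, its parameters, and the assumption that progress is computed as a difference modulo the buffer size are all hypothetical and used only for illustration.

```c
/*
 * Hypothetical helper (not from the FreeBSD sound code): number of bytes
 * the hardware appears to have consumed between two reads of the DMA
 * pointer, assuming the difference is taken modulo the buffer size.
 */
unsigned int
dma_delta(unsigned int old_ptr, unsigned int new_ptr, unsigned int bufsize)
{
	/* Adding bufsize before subtracting keeps the result non-negative. */
	return (new_ptr + bufsize - old_ptr) % bufsize;
}
```

With a 4096-byte buffer, dma_delta(1024, 1536, 4096) gives 512, as expected for an interrupt that is serviced promptly. If the interrupt is serviced so late that the hardware has gone around the whole buffer and the pointer reads 1024 again, dma_delta(1024, 1024, 4096) gives 0, which is indistinguishable from "no data was consumed".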

I previously sent out these suggestions on how to debug the problem:

------ Forwarded message ------
From: Don Lewis <truckman@freebsd.org>
Subject: Re: Questionable code in sys/dev/sound/pcm/channel.c
Date: Tue, 27 Jul 2004 15:15:06 -0700 (PDT)
To: mat@cnd.mcgill.ca
Cc: freebsd-current@freebsd.org

On 27 Jul, Mathew Kanner wrote:
> On Jul 26, John-Mark Gurney wrote:
>> Conrad J. Sabatier wrote this message on Mon, Jul 26, 2004 at 16:35 -0500:
>> > Why the formulaic calculation of timeout, if it's simply going to be
>> > unconditionally set to 1 immediately afterwards anyway? What's going on
>> > here?
>>
>> Well, if you look at the annotations, that absolute set of timeout was
>> added in rev 1.65 by cg with the comment:
>> tweaks to reduce latency/pauses in output
>>
>
>
> I think this has been raised on the mailing list before.
> IIRC, the logic for this is to check frequently for dead channels but
> CG is the authority.

My suspicion is that this change was made to reduce the consequences of
lost wakeups from the interrupt routine. This would have been more of a
problem when tsleep() was used in chn_sleep() and shouldn't be needed
now that the top and bottom halves of the code use the channel lock and
chn_sleep() uses msleep() to atomically release the lock and wait for
the wakeup from the interrupt code. That said, setting timeout to 1
shouldn't hurt anything and will just waste a bit of CPU time.

>> > Also, at the end of the function:
>> >
>> >     if (count <= 0) {
>> >         c->flags |= CHN_F_DEAD;
>> >         printf("%s: play interrupt timeout, channel dead\n", c->name);
>> >     }
>> >
>> >     return ret;
>> > }
>>
>> that was changed in rev1.52 (by cg also), and previously was just a check
>> for count == 0..
>>
>> So, I'd recommend a message off to cg and ask why he made these changes...

The original version of the code always set timeout to 1 and looped on
(count > 0), so count could never go negative.
When the code was changed to set count to something larger than 1, count
could go negative if (hz % timeout != 0), so the condition for setting
CHN_F_DEAD had to be modified accordingly.

My suspicion is that there is sometimes enough latency in executing the
interrupt routine that the hardware DMA pointer is wrapping and
chn_dmaupdate() is calculating delta as zero. This would cause
chn_wrfeed() not to consume any data from the software buffer (and skip
the wakeup()), which might be enough to cause the chn_write() to time
out while waiting for space to become available in the software buffer.

It would be interesting to enable the debug code in chn_dmaupdate(), and
add (delta == 0) as a condition to trigger the device_printf(). The
bigger question is what is the cause of the latency ...

------ Forwarded message ------
From: Don Lewis <truckman@freebsd.org>
Subject: Re: Questionable code in sys/dev/sound/pcm/channel.c
Date: Tue, 27 Jul 2004 15:21:57 -0700 (PDT)
To: conrads@cox.net
Cc: freebsd-current@freebsd.org

On 27 Jul, Conrad J. Sabatier wrote:
>
> On 26-Jul-2004 Conrad J. Sabatier wrote:
>>
>> On 26-Jul-2004 Conrad J. Sabatier wrote:
>>> I'm a little perplexed at the following bit of logic in chn_write()
>>> (which is where the "interrupt timeout, channel dead" messages are
>>> being generated).
>
> [snip]
>
>>> Also, at the end of the function:
>>>
>>>     if (count <= 0) {
>>>         c->flags |= CHN_F_DEAD;
>>>         printf("%s: play interrupt timeout, channel dead\n",
>>>             c->name);
>>>     }
>>>
>>>     return ret;
>>> }
>>>
>>> Could it be that the conditional test is wrong here? Perhaps
>>> we should be using (count < 0) instead?
>>
>> I'm now running a kernel built with this last conditional test
>> changed to "if (count < 0)" and sound is still working OK. Have yet
>> to see if this eliminates the interrupt timeout messages.
>
> Well, that was a failure. :-) Didn't see any timeout error messages,
> but the device still died eventually, nonetheless. I've since changed
> back to the original code.

That's an interesting data point. At this point I'd start looking at
the driver code for your sound hardware. I suspect that the driver
interrupt code is either no longer seeing interrupts, or it is no longer
calling chn_intr().
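
As an aside on the (count <= 0) versus (count == 0) test discussed earlier in this message, the following sketch models why the remaining budget can skip past zero. It assumes, without this thread quoting that part of chn_write(), that count starts at hz and is reduced by timeout after every sleep that times out; the function name final_count_after_timeouts is hypothetical.

```c
/*
 * Hypothetical model of the wait loop (an assumption, not the real
 * chn_write()): count starts at hz and drops by timeout each time a
 * sleep times out. Returns the value of count when the loop exits.
 */
int
final_count_after_timeouts(int hz, int timeout)
{
	int count = hz;

	/* Every iteration stands for one sleep that timed out. */
	while (count > 0)
		count -= timeout;
	return count;
}
```

With hz = 100 and timeout = 1 the loop exits with count at exactly 0, so a test for (count == 0) would be enough. With hz = 100 and timeout = 3 it exits with count at -2, which only a (count <= 0) test catches.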