Date: Mon, 26 Jul 2004 15:33:42 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: conrads@cox.net
Cc: freebsd-current@FreeBSD.org
Subject: Re: Questionable code in sys/dev/sound/pcm/channel.c
Message-ID: <200407262233.i6QMXghe058450@gw.catspoiler.org>
In-Reply-To: <XFMail.20040726171524.conrads@cox.net>

On 26 Jul, Conrad J. Sabatier wrote:
>
> On 26-Jul-2004 Don Lewis wrote:
>> On 26 Jul, Conrad J. Sabatier wrote:
>>> I'm a little perplexed at the following bit of logic in chn_write()
>>> (which is where the "interrupt timeout, channel dead" messages are
>>> being generated).
>>>
>>> Within an else branch within the main while loop, we have:
>>>
>>> else {
>>>         timeout = (hz * sndbuf_getblksz(bs)) /
>>>             (sndbuf_getspd(bs) * sndbuf_getbps(bs));
>>>         if (timeout < 1)
>>>                 timeout = 1;
>>>         timeout = 1;
>>>
>>> Why the formulaic calculation of timeout, if it's simply going to be
>>> unconditionally set to 1 immediately afterwards anyway?  What's
>>> going on here?
>>
>> Hmn, looks bogus to me.  I think the intention is to round timeout up
>> to 1 if the result of the formula is zero.  The final assignment
>> statement looks bogus to me.  Maybe a too short timeout is the
>> source of this problem.
>>
>> It looks like this assignment appeared in rev 1.65.
>
> Hmm, your guess is as good as (or probably better than) mine.  :-)
> A little more in the way of comments certainly wouldn't hurt.
>
>>> Also, at the end of the function:
>>>
>>>         if (count <= 0) {
>>>                 c->flags |= CHN_F_DEAD;
>>>                 printf("%s: play interrupt timeout, channel dead\n",
>>>                     c->name);
>>>         }
>>>
>>>         return ret;
>>> }
>>>
>>> Could it be that the conditional test is wrong here?  Perhaps
>>> we should be using (count < 0) instead?
>>>
>>> I don't know.  I'm having no small difficulty understanding this
>>> code, but these two items caught my attention.
>>
>> I ran into the same problem when I was looking at the code a few days
>> ago.
>>
>> BTW, the trace output that was posted showed write() returning 0
>> immediately before the failure occurred.
>
> Are you referring to the truss output I posted a few days ago?  The
> thing of it is, though, that the original "channel dead" message had
> already occurred in a previous run of madplay (which wasn't traced), so
> it's really hard to say if there's any useful info to be obtained from
> tracing a later run, after the pcm device was already "broken".

I think that was it.  The truss output looked like things were working
for a while before it croaked.  I saw a bunch of writes succeed, then a
write returned 0, and then it looked like it died.

> So far, I still haven't gotten the error with the new kernel I'm
> testing.  I wouldn't say absolutely that that single patch (of the
> final conditional test) is "the fix", but it may help in the meantime.

I just looked at the code some more.  With timeout hardwired to 1,
count can never go negative.  The code initializes count to hz, and
then decrements it whenever chn_sleep() returns EWOULDBLOCK, and
re-initializes count to hz if chn_sleep() returns zero.  With timeout
hardwired to 1, count should only be able to decrement to zero if
chn_sleep() returns EWOULDBLOCK hz times in a row, which means that
nothing could be stuffed into the buffer for one second, which seems
like a long time ...

I suspect that with your change the write() call is returning a 0 and
the player software is doing a retry that succeeds (or this might be
audible as a skip).
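
To make the countdown argument concrete, here is a rough userland sketch
of the two pieces discussed above.  It is not the code from channel.c:
block_timeout() and sleeps_until_dead() are hypothetical helpers, the
sleep_results[] array merely stands in for successive chn_sleep() return
values, and the real chn_write() loop does more than this model shows.

```c
#include <assert.h>
#include <errno.h>

/*
 * Timeout in ticks for one block, following the quoted formula, with
 * the result rounded up to 1 (and without the extra "timeout = 1;").
 */
int
block_timeout(int hz, int blksz, int spd, int bps)
{
	int timeout;

	assert(spd > 0 && bps > 0);
	timeout = (hz * blksz) / (spd * bps);
	if (timeout < 1)
		timeout = 1;
	return (timeout);
}

/*
 * Hypothetical model of the countdown described above: count starts at
 * hz, drops by one for every EWOULDBLOCK, and is reset to hz whenever
 * the sleep returns 0.  Returns how many sleeps it took for count to
 * reach zero (the "channel dead" condition), or -1 if it never did.
 */
int
sleeps_until_dead(int hz, const int *sleep_results, int n)
{
	int count = hz;
	int i;

	for (i = 0; i < n; i++) {
		if (sleep_results[i] == EWOULDBLOCK)
			count--;
		else if (sleep_results[i] == 0)
			count = hz;
		if (count <= 0)
			return (i + 1);
	}
	return (-1);
}
```

In this model count only ever moves down one step at a time and the
check comes right after each step, so it stops at zero and never goes
negative, and a single successful sleep anywhere in the run puts it back
to hz.  As an illustration with made-up numbers (hz = 100, a 4096-byte
block, 44100 samples/sec, 4 bytes per sample), block_timeout() gives 2
ticks rather than the hardwired 1.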