Message-Id: <200407262233.i6QMXghe058450@gw.catspoiler.org>
Date: Mon, 26 Jul 2004 15:33:42 -0700 (PDT)
From: Don Lewis
To: conrads@cox.net
cc: freebsd-current@FreeBSD.org
Subject: Re: Questionable code in sys/dev/sound/pcm/channel.c

On 26 Jul, Conrad J. Sabatier wrote:
>
> On 26-Jul-2004 Don Lewis wrote:
>> On 26 Jul, Conrad J. Sabatier wrote:
>>> I'm a little perplexed at the following bit of logic in chn_write()
>>> (which is where the "interrupt timeout, channel dead" messages are
>>> being generated).
>>>
>>> Within an else branch within the main while loop, we have:
>>>
>>>         else {
>>>                 timeout = (hz * sndbuf_getblksz(bs)) /
>>>                     (sndbuf_getspd(bs) * sndbuf_getbps(bs));
>>>                 if (timeout < 1)
>>>                         timeout = 1;
>>>                 timeout = 1;
>>>
>>> Why the formulaic calculation of timeout, if it's simply going to be
>>> unconditionally set to 1 immediately afterwards anyway?  What's
>>> going on here?
>>
>> Hmn, looks bogus to me.
>> I think the intention is to round timeout up
>> to 1 if the result of the formula is zero.  The final assignment
>> statement looks bogus to me.  Maybe a too short timeout is the
>> source of this problem.
>>
>> It looks like this assignment appeared in rev 1.65.
>
> Hmm, your guess is as good as (or probably better than) mine.  :-)
> A little more in the way of comments certainly wouldn't hurt.
>
>>> Also, at the end of the function:
>>>
>>>         if (count <= 0) {
>>>                 c->flags |= CHN_F_DEAD;
>>>                 printf("%s: play interrupt timeout, channel dead\n",
>>>                     c->name);
>>>         }
>>>
>>>         return ret;
>>> }
>>>
>>> Could it be that the conditional test is wrong here?  Perhaps
>>> we should be using (count < 0) instead?
>>>
>>> I don't know.  I'm having no small difficulty understanding this
>>> code, but these two items caught my attention.
>>
>> I ran into the same problem when I was looking at the code a few days
>> ago.
>>
>> BTW, the trace output that was posted showed write() returning 0
>> immediately before the failure occurred.
>
> Are you referring to the truss output I posted a few days ago?  The
> thing of it is, though, that the original "channel dead" message had
> already occurred in a previous run of madplay (which wasn't traced), so
> it's really hard to say if there's any useful info to be obtained from
> tracing a later run, after the pcm device was already "broken".

I think that was it.  The truss output looked like things were working
for a while before it croaked.  I saw a bunch of writes succeed, then a
write returned 0, and then it looked like it died.

> So far, I still haven't gotten the error with the new kernel I'm
> testing.  I wouldn't say absolutely that that single patch (of the
> final conditional test) is "the fix", but it may help in the meantime.

I just looked at the code some more.  With timeout hardwired to 1, count
can never go negative.
The code initializes count to hz, and then decrements it whenever
chn_sleep() returns EWOULDBLOCK, and re-initializes count to hz if
chn_sleep() returns zero.  With timeout hardwired to 1, count should
only be able to decrement to zero if chn_sleep() returns EWOULDBLOCK hz
times in a row, which means that nothing could be stuffed into the
buffer for one second, which seems like a long time ...

I suspect that with your change the write() call is returning a 0 and
the player software is doing a retry that succeeds (or this might be
audible as a skip).