Date: Sat, 16 Feb 2002 14:21:57 -0500 From: Bill Wells <bill@twwells.com> To: FreeBSD-gnats-submit@freebsd.org Subject: kern/35004: [PATCH] Fix for pcm driver lockups Message-ID: <E16cAPF-0002Qk-00@bill.twwells.com>
next in thread | raw e-mail | index | archive | help
>Number: 35004
>Category: kern
>Synopsis: [PATCH] Fix for pcm driver lockups
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Feb 16 11:30:01 PST 2002
>Closed-Date:
>Last-Modified:
>Originator: Bill Wells
>Release: FreeBSD 4.5-STABLE i386
>Organization:
>Environment:
System: FreeBSD bill.twwells.com 4.5-STABLE FreeBSD 4.5-STABLE #0: Sat Feb 16 04:01:38 EST 2002 toor@bill.twwells.com:/usr/obj/usr/src/sys/BILL i386
>Description:
There is a race condition in pcm that will cause it to
lock up in certain circumstances. When the driver is
locked up, all further attempts at accessing it result in
"Device busy". (NB: The -current fixes don't correct
this.)
>How-To-Repeat:
Run this:
while :; do cp beep.au /dev/audio; done
And while it is running, enter this:
cp beep.au /dev/audio
(beep.au can be anything, but the shorter it is the
better.)
>Fix:
The appended patches (against stable) fix this problem and
a few others as well. Before the patches (use patch -l and
you'll probably have to fix the spacing by hand), I'll
explain what they're about.
The first patch, against sound.c, simply adds some
newlines to certain debugging messages. The second patch
involves a number of changes to resolve the lockup problem
and other problems.
First, this bit of code in dsp_open (and its equivalent
for wrch), sets up the channel. Its main problem is that
the reference count is not updated for a newly opened
channel but is instead updated if the channel had been
previously opened. (E.g., an open to read will not
increment the count but a following call to write will
cause the *read* count to be incremented.) These reference
count problems have the effect of preventing kldload from
unloading the driver under certain circumstances.
if (rdch) {
if (flags & FREAD) {
chn_reset(rdch, fmt);
if (flags & O_NONBLOCK)
rdch->flags |= CHN_F_NBIO;
} else {
CHN_LOCK(rdch);
pcm_chnref(rdch, 1);
}
CHN_UNLOCK(rdch);
}
Here's the replacement code. This code is much simpler
because it takes advantage of a couple of facts. First, if
FREAD is set, rdch (i_dev->si_drv1) *must* have previously
been null and *must* be non-null now. So, no need to test
it. Also, the channel that was allocated is already
locked, so no need to do that again.
if (flags & FREAD) {
chn_reset(rdch, fmt);
if (flags & O_NONBLOCK)
rdch->flags |= CHN_F_NBIO;
pcm_chnref(rdch, 1);
CHN_UNLOCK(rdch);
}
The remaining changes are in dsp_close. This bit of code
(and the equivalent wrch code) leaves the channel locked
if the reference count doesn't go to zero. That means that
the channel remains locked when dsp_close exits; I don't
think that's intended. The unlock goes afer the "if", not
inside it.
if (rdch) {
CHN_LOCK(rdch);
if (pcm_chnref(rdch, -1) > 0) {
CHN_UNLOCK(rdch);
exit = 1;
}
}
Note this comment that I've included in the patch. If the
answer is "no", then all is good and the comment can go
away. Otherwise, it's necessary to someday fix the problem
and the comment should be retained in the code.
/* XXX And what happens if one of the channels had 2 references and
the other has but one? The latter won't get reset. Can that
happen? */
In the original code, these twe lines are executed if both
of the reference counts go to zero but not if either is
nonzero. If the reference counts are supposed to indicate
the number of references from si_drv?'s, that's wrong;
these need to be nulled out regardless of what the
reference counts do.
i_dev->si_drv1 = NULL;
i_dev->si_drv2 = NULL;
So, one fix involving these lines is to add them to the
"if (exit)" code. This ensures that the reference counts
correspond to the references.
However, that's not the only problem with these two lines.
In the original code, the fields are cleared before the
abort and flush. In my patch, they are cleared after it.
This misplacement is the actual cause of the lockups.
*** sound.c.orig Fri Feb 15 16:44:01 2002
--- sound.c Sat Feb 16 03:28:30 2002
***************
*** 416,434 ****
snd_mtxlock(d->lock);
if (d->inprog) {
! device_printf(dev, "unregister: operation in progress");
snd_mtxunlock(d->lock);
return EBUSY;
}
SLIST_FOREACH(sce, &d->channels, link) {
if (sce->channel->refcount > 0) {
! device_printf(dev, "unregister: channel busy");
snd_mtxunlock(d->lock);
return EBUSY;
}
}
if (mixer_uninit(dev)) {
! device_printf(dev, "unregister: mixer busy");
snd_mtxunlock(d->lock);
return EBUSY;
}
--- 416,434 ----
snd_mtxlock(d->lock);
if (d->inprog) {
! device_printf(dev, "unregister: operation in progress\n");
snd_mtxunlock(d->lock);
return EBUSY;
}
SLIST_FOREACH(sce, &d->channels, link) {
if (sce->channel->refcount > 0) {
! device_printf(dev, "unregister: channel busy\n");
snd_mtxunlock(d->lock);
return EBUSY;
}
}
if (mixer_uninit(dev)) {
! device_printf(dev, "unregister: mixer busy\n");
snd_mtxunlock(d->lock);
return EBUSY;
}
*** dsp.c.orig Fri Feb 15 16:28:58 2002
--- dsp.c Sat Feb 16 03:33:40 2002
***************
*** 240,265 ****
/* finished with snddev, new channels still locked */
/* bump refcounts, reset and unlock any channels that we just opened */
- if (rdch) {
if (flags & FREAD) {
chn_reset(rdch, fmt);
if (flags & O_NONBLOCK)
rdch->flags |= CHN_F_NBIO;
- } else {
- CHN_LOCK(rdch);
pcm_chnref(rdch, 1);
- }
CHN_UNLOCK(rdch);
}
- if (wrch) {
if (flags & FWRITE) {
chn_reset(wrch, fmt);
if (flags & O_NONBLOCK)
wrch->flags |= CHN_F_NBIO;
- } else {
- CHN_LOCK(wrch);
pcm_chnref(wrch, 1);
- }
CHN_UNLOCK(wrch);
}
splx(s);
--- 240,257 ----
***************
*** 286,316 ****
if (rdch) {
CHN_LOCK(rdch);
if (pcm_chnref(rdch, -1) > 0) {
- CHN_UNLOCK(rdch);
exit = 1;
}
}
if (wrch) {
CHN_LOCK(wrch);
if (pcm_chnref(wrch, -1) > 0) {
- CHN_UNLOCK(wrch);
exit = 1;
}
}
if (exit) {
snd_mtxunlock(d->lock);
splx(s);
return 0;
}
-
/* both refcounts are zero, abort and release */
if (d->fakechan)
d->fakechan->flags = 0;
- i_dev->si_drv1 = NULL;
- i_dev->si_drv2 = NULL;
-
d->flags &= ~SD_F_TRANSIENT;
snd_mtxunlock(d->lock);
--- 278,310 ----
if (rdch) {
CHN_LOCK(rdch);
if (pcm_chnref(rdch, -1) > 0) {
exit = 1;
}
+ CHN_UNLOCK(rdch);
}
if (wrch) {
CHN_LOCK(wrch);
if (pcm_chnref(wrch, -1) > 0) {
exit = 1;
}
+ CHN_UNLOCK(wrch);
}
+ /* XXX And what happens if one of the channels had 2 references and
+ the other has but one? The latter won't get reset. Can that
+ happen? */
+
if (exit) {
+ i_dev->si_drv1 = NULL;
+ i_dev->si_drv2 = NULL;
snd_mtxunlock(d->lock);
splx(s);
return 0;
}
/* both refcounts are zero, abort and release */
if (d->fakechan)
d->fakechan->flags = 0;
d->flags &= ~SD_F_TRANSIENT;
snd_mtxunlock(d->lock);
***************
*** 326,331 ****
--- 320,327 ----
chn_reset(wrch, 0);
pcm_chnrelease(wrch);
}
+ i_dev->si_drv1 = NULL;
+ i_dev->si_drv2 = NULL;
splx(s);
return 0;
>Release-Note:
>Audit-Trail:
>Unformatted:
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E16cAPF-0002Qk-00>
