Date:      Mon, 09 Dec 2002 10:51:31 +0200
From:      Diomidis Spinellis <dds@aueb.gr>
To:        freebsd-questions@FreeBSD.ORG
Subject:   Re: Sound driver hangs when writing to the system console
Message-ID:  <3DF45993.E5C13094@aueb.gr>
References:  <3DEFD3EB.F7773AB4@aueb.gr>

Diomidis Spinellis wrote:
> For a couple of months I've been trying to crack a puzzling problem.
> After upgrading from 4.1 to 4.6 (and now to 4.7) my on-board sound card
> would pause or die after playing for approximately 20 minutes with a
> message:
> 
> pcm0:play:0: play interrupt timeout, channel dead
> 
> I have now managed to isolate the timing dependency: the problem comes
> from writing to the system console (which a cron job apparently did
> every 20 minutes).  So a command like:
> 
> cat /usr/share/dict/words >/dev/console
> 
> will immediately break the sound driver with the same message.  I have
> disabled PnP in the BIOS and locked the card to two different IRQs.
> I have also disabled the VGA IRQ and even removed the VGA card entirely
> (the box works as an appliance), but none of these measures helped.
> 
> The sound hardware is an on-board WSS-compatible CS4231-based device on
> an Intel Triton motherboard (Pentium 150MHz) running the latest BIOS
> update.  The box has no local storage and boots remotely from another
> FreeBSD box through Etherboot.
[...]

Solved.  Detection and resolution procedure follows:

More experiments revealed that essentially any activity with heavy
context switching (e.g. cat /etc/* >/dev/null) would create the same
problem.  Commenting out the DBG macro in sound/pcm/channel.c and
enabling a few commented-out driver_printf calls produced additional
messages of the type:

pcm0: hwptr went backwards 708 -> 628

Repeated and extensive searches in Google news showed that this problem
was not uncommon and was related to interrupt latency.  This makes
sense, since the specific system is a 150MHz Pentium-class machine and
the problem was indeed occurring when the system was burdened with
context switching.

I "fixed" the problem by changing MSS_DEFAULT_BUFSZ in sound/isa/mss.c
from 4096 to 65536.  Apparently this could also be done by setting
hw.snd.pcm0.buffersize=65536.  However, as I understand it, this is a
read-only value, so it cannot be set with sysctl in /etc/sysctl.conf;
it can only be passed to the kernel by setting it in
/boot/loader.conf.local.  Unfortunately my diskless system boots the
kernel directly using Etherboot and cannot use the boot loader, so I
had to change mss.c.  The diskless boot procedure also means I cannot
use top/vmstat/systat, since the system does not handle the nlist
correctly when the kernel is not loaded using loader.
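
To put the buffer size change in perspective, here is a rough sketch
(not taken from the driver) of how much playback time each buffer size
holds.  The sample format is an assumption on my part (44.1 kHz, 16-bit,
stereo), and buffer_duration_ms is just a hypothetical helper name; the
driver may also split the buffer into smaller blocks, so treat the
figures as an upper bound on the interrupt latency the buffer can absorb.

```python
def buffer_duration_ms(buffer_bytes, sample_rate_hz=44100,
                       channels=2, bytes_per_sample=2):
    """Return how long a playback buffer lasts, in milliseconds.

    The default format (44.1 kHz, 16-bit, stereo) is an assumption,
    not something read from the sound driver.
    """
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return 1000.0 * buffer_bytes / bytes_per_second


if __name__ == "__main__":
    # Compare the old and new MSS_DEFAULT_BUFSZ values.
    for size in (4096, 65536):
        print(f"{size:6d} bytes -> {buffer_duration_ms(size):6.1f} ms")
```

Under those assumptions the 4096-byte buffer holds roughly 23 ms of
audio and the 65536-byte buffer roughly 372 ms, i.e. sixteen times more
slack before the hardware runs dry.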

Unfortunately, even with the new buffer size, the original problem
(20-second pauses every 20 minutes) persisted.  Running tcpdump on the
traffic from a different host revealed an interesting phenomenon.  During
the pauses the NFS client would continuously retransmit an NFS read
request, and at the end the server would send an ARP who-has message.
At the same time a message "nfsd send error 64" would get logged (64 is
EHOSTDOWN).  Some digging through

nfs/nfs_socket.c    nfs_send     (source of the message)
kern/uipc_socket.c  sosend
net/if_ethersubr.c  ether_output (returns EHOSTDOWN)
netinet/if_ether.c  arp_resolve  (sets RTF_REJECT triggering EHOSTDOWN)

gave me a very strong suspect: ARP.  The ARP timeout/retry values
exactly match the problems I was experiencing:

static int arpt_keep = (20*60); /* once resolved, good for 20 more minutes */
static int arpt_down = 20;      /* once declared down, don't send for 20 sec */

Changing the values with sysctl

sysctl net.link.ether.inet.max_age=60
sysctl net.link.ether.inet.host_down_time=5

confirmed my suspicion and also made the problem much easier to
reproduce (otherwise I would have to wait twenty minutes for it to
occur).  The exact details of this problem and a proposed kernel patch
are now documented in the FreeBSD problem report kern/46116
http://www.freebsd.org/cgi/query-pr.cgi?pr=46116
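
As a sanity check on the timing, here is a toy model (my own
simplification, not the kernel logic) of when the pauses should occur.
It assumes that every time the ARP entry expires the re-resolution
fails, so the host stays marked down for the full down time before the
entry is resolved again.  The function name pause_windows and the
observation horizon are hypothetical.

```python
def pause_windows(keep_s, down_s, horizon_s):
    """Return (start, end) pause intervals in seconds, up to horizon_s.

    Toy model: the entry is resolved at t=0, stays valid for keep_s
    seconds, then the host is treated as down for down_s seconds, after
    which the cycle repeats.  This is an assumption for illustration.
    """
    windows = []
    start = keep_s
    while start < horizon_s:
        windows.append((start, start + down_s))
        start += down_s + keep_s
    return windows


if __name__ == "__main__":
    # Default values: arpt_keep = 20*60, arpt_down = 20, over one hour.
    print(pause_windows(20 * 60, 20, 3600))
    # Shortened values set with sysctl: max_age=60, host_down_time=5.
    print(pause_windows(60, 5, 300))
```

With the defaults the model gives a 20-second pause after each 20
minutes of traffic, matching what I saw; with the shortened values the
pauses shrink to 5 seconds and recur about once a minute, which is why
the problem became so much quicker to reproduce.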

Diomidis
