From: Diomidis Spinellis
Date: Mon, 09 Dec 2002 10:51:31 +0200
To: freebsd-questions@FreeBSD.ORG
Subject: Re: Sound driver hangs when writing to the system console
Message-ID: <3DF45993.E5C13094@aueb.gr>

Diomidis Spinellis wrote:

> For a couple of months I've been trying to crack a puzzling problem.
> After upgrading from 4.1 to 4.6 (and now to 4.7) my on-board sound card
> would pause or die after playing for approximately 20 minutes with a
> message:
>
> pcm0:play:0: play interrupt timeout, channel dead
>
> I have now managed to isolate the timing dependency: the problem comes
> from writing to the system console (which a cron job apparently did
> every 20 minutes). So a command like:
>
> cat /usr/share/dict/words >/dev/console
>
> will immediately break the sound driver with the same message. I have
> disabled PnP from the BIOS and locked the card to two different IRQs.
> I have also disabled the VGA IRQ and even removed the VGA card entirely
> (the box works as an appliance), but none of these measures helped.
>
> The sound hardware is an on-board WSS-compatible CS4231-based device on
> an Intel Triton motherboard (Pentium 150MHz) running the latest BIOS
> update. The box has no local storage, booting remotely from another
> FreeBSD box through Etherboot.
[...]

Solved. Detection and resolution procedure follows:

More experiments revealed that essentially any activity with heavy
context switching (e.g. cat /etc/* >/dev/null) would create the same
problem. Uncommenting the DBG macro in sound/pcm/channel.c and a few
commented-out driver_printf calls resulted in additional messages of
the type:

    pcm0: hwptr went backwards 708 -> 628

Repeated and extensive searches in Google news showed that this problem
was not uncommon and was related to interrupt latency. This makes
sense, since the specific system is a 150MHz Pentium-class machine and
the problem was indeed occurring when the system was burdened with
context switching.

I "fixed" the problem by changing MSS_DEFAULT_BUFSZ in sound/isa/mss.c
from 4096 to 65536. Apparently this could also be done by setting
hw.snd.pcm0.buffersize=65536. However, as I understand it, this is a
read-only value, so it cannot be set using sysctl in /etc/sysctl.conf,
but can only be passed to the kernel by setting it in
/boot/loader.conf.local. Unfortunately my diskless system boots the
kernel directly using Etherboot and cannot use the boot loader, so I
had to change mss.c. The diskless boot procedure also means I cannot
use top/vmstat/systat, since the system does not correctly handle the
nlist when the kernel is not loaded using loader.

Unfortunately, even with the new buffer size, the original problem
(20 sec pauses every 20 minutes) persisted. Tcpdumping the traffic from
a different host revealed an interesting phenomenon.
During the pauses the NFS client would continuously retransmit an NFS
read request, and at the end the server would send an ARP who-has
message. At the same time a message "nfsd send error 64" would get
logged (64 is EHOSTDOWN). Some digging through

    nfs/nfs_socket.c     nfs_send (source of the message)
    kern/uipc_socket.c   sosend
    net/if_ethersubr.c   ether_output (returns EHOSTDOWN)
    netinet/if_ether.c   arpresolve (sets RTF_REJECT, triggering EHOSTDOWN)

gave me a very strong suspect: ARP. The ARP timeout/retry values
exactly match the problems I was experiencing:

    static int arpt_keep = (20*60); /* once resolved, good for 20 more minutes */
    static int arpt_down = 20;      /* once declared down, don't send for 20 sec */

Changing the values with sysctl

    sysctl net.link.ether.inet.max_age=60
    sysctl net.link.ether.inet.host_down_time=5

confirmed my suspicion and also made the problem a lot easier to
reproduce (otherwise I would have to wait twenty minutes for it to
occur).

The exact details of this problem and a proposed kernel patch are now
documented in the FreeBSD problem report kern/46116
http://www.freebsd.org/cgi/query-pr.cgi?pr=46116

Diomidis