Date: Sun, 26 Jul 1998 12:08:36 -0700
From: Mitch Lichtenberg <mitch@pa.dec.com>
To: "'current@freebsd.org'" <current@FreeBSD.ORG>
Subject: Hard hangs of -current under heavy load - how to debug?
Message-ID: <c=US%a=_%p=DEC%l=SRC-EXCHANGE-980726190836Z-4659@src-exchange.pa.dec.com>
I've been experiencing some random hangs on -current releases over the past few months (I'm currently at 3.0-19980723, but I've seen this since last December). The systems operate under heavy load for about 24 hours, then one or two of them randomly hang. The hangs are hard: no console messages, no dumps or traps, and I can't escape to the debugger. It looks like interrupts are disabled.

Generally, how do you debug a hang like this? Are there any generic techniques or kernel options that I can enable to help me figure this one out? My next step is to hook up a button to the NMI line to see if I can get into DDB that way, but perhaps there's something easier I can do in the meantime, or maybe there are known problems with my configuration that someone can point out to me.

----

Workload / system description, for those who are interested:

I've got a network of ten identical machines. They netboot from a "master" machine (I did a netboot driver for the DEC DC21143 Ethernet chip, if anyone's interested). The workload is a distributed storage application I'm working on, which generates a huge amount of UDP traffic and disk I/O. When the tests are running, the net and disk are running flat out, near maximum throughput. The application is basically I/O bound; I seldom see more than 15% CPU utilization.

At present, some PCs are servers (lots of disk and net traffic), and some are clients (only net traffic). Both the clients and servers are affected by this problem, so I'm tempted to believe the disk is OK, but servers do crash more often than clients. The "master" machine, identical to the others, has never crashed.

Could there be anything screwy about the hardware interrupt mechanism, or are there known problems with the VIA VP2/97 chipset?
(see http://www.research.digital.com/SRC/personal/Ed_Lee/Petal/petal.html if you'd like to know more about the project)

Basic configuration:

  Motherboard: FIC PA-2007 (VIA VP2/97 chipset, for ECC)
  Processor:   Cyrix 6x86MX
  Memory:      64MB
  Disk:        Four IBM Deskstar 8.4GB, UltraDMA, all masters
               (Promise Ultra33 IDE controller for drives 3 and 4)
  Network:     DEC DE500-BA (DC21143) 100Mb/s, connected to a
               Prominet fast ethernet switch

The machines boot via netboot.

Thanks!

Mitch Lichtenberg
COMPAQ Systems Research Center (yes, formerly Digital Equipment Corp.)
Palo Alto, CA.
mitch@pa.dec.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message