From owner-freebsd-smp Mon Sep 11 12:36:23 2000
Date: Mon, 11 Sep 2000 20:16:07 +0100
To: John Baldwin
From: Bob Bishop
Subject: Re: SMPng box wedges repeatably
Cc: smp@FreeBSD.ORG

At 11:07 -0700 11/9/00, John Baldwin wrote:
>Bob Bishop wrote:
>> Hi,
>>
>> Seems that I can repeatably wedge an MP box running SMPng (yesterday's
>> -current) under the following conditions. It's doing a buildworld -j8
>> getting its sources via NFS from box B, its /usr/obj from box C and also
>> NFS-serving /usr/obj for box C which is also buildworlding; so there's
>> quite a lot of NFS activity. It's also running a couple of dnetcs.
>>
>> Symptoms are as if the scheduler is wedged: I can ping the box and get
>> into DDB, but nothing is happening in userland. According to ps in DDB the
>> dnetcs are runnable as are half a dozen shells presumably just spawned by
>> make. Everything else is waiting, a few in ffsvgt or inode, a bunch of
>> shells in wait, a bunch of makes in select, ...
>
>What disk controller do you have? This is a known problem, but when we
>looked at it, it so far has only happened with ahc SCSI controllers. When
>it hangs, it seems the ahc driver is waiting for an interrupt that never
>comes. As a result, any process that accesses the disk blocks forever.
>It is triggered by heavy load situations such as you describe.

It's a 2940UW, so yes, ahc.

>> Anyone want more information from this corpse before I turn off its
>> life-support?
>
>Check to see what wait channels processes are sleeping on. If a lot are
>sleeping on biowait, it is probably the same problem.

Not a one in biowait. Nearest I can offer is a couple of nfsd in biowr.

As I said, I seem to have a repeatable scenario here so if you want
anything else tried I can probably oblige. Can't offer you a crash dump
this time around but I'll try next time.

--
Bob Bishop              (0118) 977 4017  international code +44 118
rb@gid.co.uk        fax (0118) 989 4254