From owner-freebsd-smp  Mon Sep 11 12:36:23 2000
Date: Mon, 11 Sep 2000 20:16:07 +0100
To: John Baldwin <jhb@pike.osd.bsdi.com>
From: Bob Bishop <rb@gid.co.uk>
Subject: Re: SMPng box wedges repeatably
Cc: smp@FreeBSD.ORG

At 11:07 -0700 11/9/00, John Baldwin wrote:
>Bob Bishop wrote:
>> Hi,
>>
>> Seems that I can repeatably wedge an MP box running SMPng (yesterday's
>> -current) under the following conditions. It's doing a buildworld -j8
>> getting its sources via NFS from box B, its /usr/obj from box C and also
>> NFS-serving /usr/obj for box C which is also buildworlding; so there's
>> quite a lot of NFS activity. It's also running a couple of dnetcs.
>>
>> Symptoms are as if the scheduler is wedged:  I can ping the box and get
>> into DDB, but nothing is happening in userland. According to ps in DDB the
>> dnetcs are runnable as are half a dozen shells presumably just spawned by
>> make. Everything else is waiting, a few in ffsvgt or inode, a bunch of
>> shells in wait, a bunch of makes in select, ...
>
>What disk controller do you have?  This is a known problem, but so far we
>have only seen it with ahc SCSI controllers.  When it hangs, the ahc driver
>seems to be waiting for an interrupt that never comes.  As a result, any
>process that accesses the disk blocks forever.  It is triggered by heavy
>load such as you describe.

It's a 2940UW, so yes, ahc.

>> Anyone want more information from this corpse before I turn off its
>> life-support?
>
>Check to see what wait channels processes are sleeping on.  If a lot are
>sleeping on biowait, it is probably the same problem.

Not a one in biowait. Nearest I can offer is a couple of nfsd in biowr.

As I said, I seem to have a repeatable scenario here so if you want
anything else tried I can probably oblige. Can't offer you a crash dump
this time around but I'll try next time.


--
Bob Bishop              (0118) 977 4017  international code +44 118
rb@gid.co.uk        fax (0118) 989 4254
