From owner-freebsd-current@FreeBSD.ORG Mon Jan 24 10:18:42 2005
Message-Id: <6.2.0.14.2.20050124181259.03419040@202.179.0.80>
Date: Mon, 24 Jan 2005 18:20:25 +0800
To: Robert Watson
From: Ganbold
References: <6.2.0.14.2.20050124113106.03402770@202.179.0.80>
cc: freebsd-current@freebsd.org
Subject: Re: fxp0: device timed out problem

At 05:53 PM 1/24/2005, you wrote:

>On Mon, 24 Jan 2005, Ganbold wrote:
>
>Any luck with disabling ACPI?  In particular, are the interrupt
>assignments substantially different between booting with ACPI and without?
>You can probably just diff -u the old dmesg.boot and the new one...

I didn't try disabling ACPI.

> > >Usually a device timed out error is related to interrupts from the device
> > >not being delivered, being delivered improperly, etc.  Does your dmesg
> > >contain any references to interrupt storms?  Once the above message has
> > >printed, do you see any further interrupts on the fxp interrupt source
> > >when checking intermittently with "systat -vmstat 1" or "vmstat -i"?
> >
> > I couldn't check the system by issuing those commands. Following is the
> > dmesg output with debug.mpsafenet disabled:
>
>Couldn't as in, not possible for administrative reasons, because you
>couldn't log in once the failure occurred so couldn't get the output, or
>because they don't work, or...?  Just want to make sure I understand if
>this is an administrative issue or symptomatic.

Sorry for my poor explanation. Actually I didn't try these commands.

> > I didn't do much investigation on those servers that time. However
> > without debug.mpsafenet, servers are working fine for more than 3 weeks.
>
>That is certainly suggestive -- I wonder if we're looking at a locking bug
>in fxp0 involving serialization with the hardware.  However, it's not
>conclusive, I think -- when running MPSAFE, the timing is quite different
>on UP as well as SMP hardware, which could trigger other existing bugs.
>The big open question, I think, is whether an interrupt delivery problem
>is involved.

Probably I have to enable debug.mpsafenet in one of the servers and
experiment disabling ACPI and checking interrupt source when device times
out. I will let you know.

thanks,
Ganbold

>Robert N M Watson
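
The interrupt check discussed in this thread (looking at "vmstat -i" intermittently after the timeout message to see whether the fxp source still receives interrupts) could be sketched roughly as below. This is only an illustration, not something from the thread: the function names `parse_vmstat_i` and `interrupt_delta` are hypothetical, and the sample text assumes the usual three-column "interrupt / total / rate" layout of `vmstat -i`, with made-up counts.

```python
import re

def parse_vmstat_i(text):
    """Return {interrupt source: total count} from `vmstat -i` style text.

    Assumes each data line looks like "<source name>  <total>  <rate>".
    The header line has no numeric columns and is skipped automatically.
    """
    counts = {}
    for line in text.splitlines():
        m = re.match(r"^(.*\S)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

def interrupt_delta(before, after, device="fxp0"):
    """Number of interrupts counted for `device` between two snapshots."""
    def total_for(text):
        return sum(n for src, n in parse_vmstat_i(text).items() if device in src)
    return total_for(after) - total_for(before)

# Hypothetical snapshots taken a few seconds apart after a timeout message.
BEFORE = """interrupt                          total       rate
irq1: atkbd0                         120          0
irq11: fxp0                        48210         12
Total                              48330         12
"""
AFTER = """interrupt                          total       rate
irq1: atkbd0                         124          0
irq11: fxp0                        48210         12
Total                              48334         12
"""

if __name__ == "__main__":
    delta = interrupt_delta(BEFORE, AFTER)
    if delta == 0:
        print("no new fxp0 interrupts between snapshots")
    else:
        print(f"fxp0 interrupts since last snapshot: {delta}")
```

In practice the two text snapshots would come from running `vmstat -i` twice on the affected server. A delta of zero while traffic is expected would be consistent with the interrupt delivery problem raised above, though it would not prove it.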