Date:      Mon, 24 Jan 2005 18:20:25 +0800
From:      Ganbold <ganbold@micom.mng.net>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: fxp0: device timed out problem
Message-ID:  <6.2.0.14.2.20050124181259.03419040@202.179.0.80>
In-Reply-To: <Pine.NEB.3.96L.1050124093419.63183A-100000@fledge.watson.org>
References:  <6.2.0.14.2.20050124113106.03402770@202.179.0.80> <Pine.NEB.3.96L.1050124093419.63183A-100000@fledge.watson.org>

At 05:53 PM 1/24/2005, you wrote:
>On Mon, 24 Jan 2005, Ganbold wrote:
>
>Any luck with disabling ACPI?  In particular, are the interrupt
>assignments substantially different between booting with ACPI and without?
>You can probably just diff -u the old dmesg.boot and the new one...

I didn't try disabling ACPI.
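
For reference, the comparison suggested above (diffing the dmesg from an ACPI boot against one from a non-ACPI boot) could be narrowed to the interrupt assignments with something like the sketch below. It is only an illustration: the file labels "dmesg.acpi" and "dmesg.noacpi" and the helper names are hypothetical, and it assumes that IRQ assignments show up in dmesg as "irq N" on the device probe lines.

```python
import difflib
import re

# Assumption: device probe lines in dmesg mention their interrupt as "irq <number>".
IRQ_RE = re.compile(r"\birq\s+\d+", re.IGNORECASE)


def irq_lines(dmesg_text):
    """Return only the dmesg lines that mention an IRQ assignment."""
    return [line for line in dmesg_text.splitlines() if IRQ_RE.search(line)]


def irq_diff(acpi_text, noacpi_text):
    """Unified diff of the IRQ-related lines from two dmesg captures.

    The labels 'dmesg.acpi' and 'dmesg.noacpi' are hypothetical names
    used only for the diff header.
    """
    return list(
        difflib.unified_diff(
            irq_lines(acpi_text),
            irq_lines(noacpi_text),
            fromfile="dmesg.acpi",
            tofile="dmesg.noacpi",
            lineterm="",
        )
    )
```

Passing the contents of the two saved dmesg.boot files to irq_diff() and printing the result line by line gives roughly what "diff -u" would show, restricted to lines that carry an IRQ number. An empty list means the interrupt assignments did not change between the two boots.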

> > >Usually a device timed out error is related to interrupts from the device
> > >not being delivered, being delivered improperly, etc.  Does your dmesg
> > >contain any references to interrupt storms?  Once the above message has
> > >printed, do you see any further interrupts on the fxp interrupt source
> > >when checking intermittently with "systat -vmstat 1" or "vmstat -i"?
> >
> > I couldn't check the system by issuing those commands.  Following is the
> > dmesg output with debug.mpsafenet disabled:
>
>Couldn't as in, not possible for administrative reasons, because you
>couldn't log in once the failure occurred so couldn't get the output, or
>because they don't work, or...?  Just want to make sure I understand if
>this is an administrative issue or symptomatic.

Sorry for my poor explanation. Actually I didn't try these commands.
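
For reference, the check asked about above (whether the fxp interrupt source still advances after the timeout message) could be scripted roughly as follows. This is a sketch under assumptions: it expects "vmstat -i" output in the form "irqN: device   total   rate", the function names are hypothetical, and the two text samples would be captured by hand a few seconds apart.

```python
import re

# Assumption: each data line of "vmstat -i" ends with two integer columns
# (total, rate) preceded by the interrupt source, e.g. "irq11: fxp0".
LINE_RE = re.compile(r"^(?P<source>.+?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")


def parse_vmstat_i(text):
    """Map each interrupt source to its total count; non-matching lines are skipped."""
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            totals[match.group("source").strip()] = int(match.group("total"))
    return totals


def device_delta(before_text, after_text, device="fxp0"):
    """Return how far each interrupt source naming `device` advanced between samples."""
    before = parse_vmstat_i(before_text)
    after = parse_vmstat_i(after_text)
    return {
        source: after[source] - before.get(source, 0)
        for source in after
        if device in source.split()
    }
```

A delta of zero for the fxp0 source while the interface is supposed to be passing traffic would suggest that interrupts are no longer being delivered. A growing count would point away from an interrupt delivery problem.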

> > I didn't do much investigation on those servers at that time. However,
> > without debug.mpsafenet, the servers have been working fine for more
> > than 3 weeks.
>
>That is certainly suggestive -- I wonder if we're looking at a locking bug
>in fxp0 involving serialization with the hardware.  However, it's not
>conclusive, I think -- when running MPSAFE, the timing is quite different
>on UP as well as SMP hardware, which could trigger other existing bugs.
>The big open question, I think, is whether an interrupt delivery problem
>is involved.

I will probably have to enable debug.mpsafenet on one of the servers, then
experiment with disabling ACPI and check the interrupt source when the
device times out. I will let you know.

thanks,

Ganbold


>Robert N M Watson
>
>
>_______________________________________________
>freebsd-current@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-current
>To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"


