Date:      Thu, 10 Mar 2005 23:11:12 -0800 (PST)
From:      Doug White <dwhite@gumbysoft.com>
To:        Alan Jay <alan_jay_uk@yahoo.co.uk>
Cc:        freebsd-amd64@freebsd.org
Subject:   RE: BroadcomBCM5704C 10/100/1000 on TyanThunder K8S pro S2882 twin[Alan Jay]  Operteron
Message-ID:  <20050310230725.D64217@carver.gumbysoft.com>
In-Reply-To: <20050310130029.6887154821@buxton.digitalspy.co.uk>
References:  <20050310130029.6887154821@buxton.digitalspy.co.uk>

On Thu, 10 Mar 2005, Alan Jay wrote:

>
> > From: Doug White <dwhite@gumbysoft.com>
> >
> > On Mon, 7 Mar 2005, Alan Jay wrote:
> >
> > > Well after upgrading to the latest -STABLE via cvsup and makeworld
> > > makekernel etc we have been doing some more tests over the weekend.
> >
> > When did you run this cvsup?
>
> [Alan Jay] March 2nd.

Since it's been a week, you might give this another spin.

> > > One of our databases ran fine all weekend so we took the plunge on
> > > Sunday to try our big heavily accessed database.
> > >
> > > It ran fine until 7.45 Monday morning - when I checked at 7.30am it
> > > was using around 6 of the 8Gb of RAM the server then logged:
> > >
> > > Mar  7 07:42:47 flappy kernel: bge1: discard frame w/o leading ethernet header (len 4294967292 pkt len 4294967292)
> >
> > Hm, that's -4 as an unsigned 32-bit value.  That message is printed by
> > ether_input() if it gets handed a bum mbuf.
> >
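
As a quick sanity check on that length (a sketch only, not code from the
driver): 4294967292 is 2^32 - 4, so read as a signed 32-bit number it is
-4.  The helper names below are made up for illustration, and the
arithmetic assumes a shell with 64-bit integers.

```sh
#!/bin/sh
# Reinterpret an unsigned 32-bit value as signed (two's complement).
# Assumes the shell does arithmetic in at least 64 bits.
to_signed32() {
    v=$1
    if [ "$v" -ge 2147483648 ]; then
        v=$((v - 4294967296))
    fi
    echo "$v"
}

# Wrap a possibly negative integer into the unsigned 32-bit range.
to_unsigned32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

to_signed32 4294967292    # the length from the bge1 log line
to_unsigned32 -4          # and the reverse direction
```

A length that small and negative suggests something was subtracted from a
zero or near-zero value before the mbuf reached ether_input().
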
> > > Followed by:
> > >
> > > Mar  7 07:42:47 flappy kernel: Fatal trap 12: pag
> >
> > Unfortunately this is not useful. We need the entire panic message and
> > ideally a backtrace and crashdump.  Can you connect a serial console to
> > this system and log the output?
>
> [Alan Jay] We have done that but the serial terminal is attached to a terminal
> concentrator and it seems to timeout before logging any useful information.
> When we succeeded there was nothing on the serial console in the way of a
> panic message.  Sorry not sure how to do a backtrace or crashdump?

See the section on kernel debugging in the Developer's Handbook. You
activate crashdumps by nominating a partition that is at least as large as
memory with the 'dumpdev' rc.conf variable (they can also be enabled at
runtime with the 'dumpon' command).  Once the machine panics and writes
the crashdump, savecore will run automatically on the ensuing reboot and
extract it. With the crashdump in hand you can use kgdb and a debugging
kernel image to figure out what happened.
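
The steps above might look roughly like this.  It is a sketch under
assumptions: the device name /dev/da0s1b, the /var/crash directory and the
kernel.debug path are placeholders, and the debugging kernel is assumed to
have been built with 'makeoptions DEBUG=-g' in the kernel config.

```sh
# /etc/rc.conf fragment.  The device name is a placeholder; use your own
# swap partition, which must be at least as large as RAM (8 GB here).
dumpdev="/dev/da0s1b"
dumpdir="/var/crash"      # where savecore writes vmcore.N

# To enable dumps right away without rebooting (as root):
#   dumpon -v /dev/da0s1b
#
# After a panic, savecore normally runs from rc on the next boot.
# To run it by hand:
#   savecore /var/crash /dev/da0s1b
#
# Then open the dump with the debugging kernel (hypothetical path, it
# depends on your kernel config name) and ask for a backtrace:
#   kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug /var/crash/vmcore.0
#   (kgdb) bt
```

The backtrace, together with the full panic message, is the part worth
posting to the list.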

> > > Subsequently to that it has crashed a number of times and on a couple of
> > > occasions has reported:
> > >
> > > kernel: fxp0: can't map mbuf (error 12)
> >
> > Error 12 is ENOMEM and that's coming from bus_dmamap_load_mbuf().  That can
> > be returned if you're running out of space for bounce buffers, or kmem in
> > general.  scottl has been working on busdma issues in HEAD and recently
> > committed a fix for i386 for bounce page allocation issues.
> >
> > kmem depletion would be more insidious.  Have you been getting other
> > messages that indicate failure to allocate memory or error 12?
>
> [Alan Jay] I had seen them before on the console several times.

Hm, then kmem depletion may be in play. Unfortunately, I've not tuned kmem
on amd64, so I don't know whether the variables used on i386 also apply.
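
Independent of any tuning, the stock tools can show whether kernel memory
or mbufs are running short before the next crash.  This is a rough sketch,
not a tuning recipe; it only reads counters and would need to be run
periodically so the numbers can be compared over time.

```sh
# Confirm what errno 12 means on this system (expect ENOMEM):
grep -w ENOMEM /usr/include/sys/errno.h

# mbuf and mbuf cluster usage, including denied requests:
netstat -m

# Kernel malloc usage by type, then UMA zone usage:
vmstat -m
vmstat -z
```

Steadily growing usage or a rising count of denied requests in the hours
before a reboot would point toward memory exhaustion.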



>
> > > By the way over the weekend the latest -STABLE which is marked
> > > 5.4-PRERELEASE 2 seemed much better than 5.3 had and the initial
> > > problems took much longer to appear.  Though once the problems started
> > > to appear, they repeated themselves rebooting every 1-2hrs until we
> > > removed the tests data.
> >
> > That behavior sounds a lot like thermal issues.  It takes a while to warm
> > up to the critical point and once it hits that point it really starts to
> > malfunction.  Unless the test run starts out slow or something.
>
> [Alan Jay] Unlikely as the servers have been on 24hrs a day since we got them
> in a rack at a data centre so the temperature should be reasonably consistent.

Right, but a failed fan keeps that nice cool air from getting to the
burning hot parts. :)

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org


