Date:      Fri, 11 Mar 2005 11:37:08 -0000
From:      "Alan Jay" <alan_jay_uk@yahoo.co.uk>
To:        "'Doug White'" <dwhite@gumbysoft.com>
Cc:        freebsd-amd64@freebsd.org
Subject:   RE: Broadcom BCM5704C 10/100/1000 on Tyan Thunder K8S Pro S2882 twin Opteron
Message-ID:  <20050311113709.97A8A54821@buxton.digitalspy.co.uk>
In-Reply-To: <20050310230725.D64217@carver.gumbysoft.com>

> -----Original Message-----
> From: Doug White [mailto:dwhite@gumbysoft.com]
> On Thu, 10 Mar 2005, Alan Jay wrote:
> 
> > > From: Doug White <dwhite@gumbysoft.com>
> > >
> > > On Mon, 7 Mar 2005, Alan Jay wrote:
> > >
> > > > Well, after upgrading to the latest -STABLE via cvsup, make world,
> > > > make kernel, etc., we have been doing some more tests over the weekend.
> > >
> > > When did you run this cvsup?
> >
> > [Alan Jay] March 2nd.
> 
> Since it's been a week, you might give this another spin.

[Alan Jay] OK, will do so; things are moving that fast, are they? Being a
newbie at this kind of thing: for a short interval like this, do I need to do
a make world and make kernel and follow the full list of steps, or can I get
away with just a new kernel?
 
> > > > around 6 of the 8GB of RAM the server then logged:
> > > >
> > > > Mar  7 07:42:47 flappy kernel: bge1: discard frame w/o leading ethernet header
> > > > (len 4294967292 pkt len 4294967292)
> > >
> > > Hm, unsigned -1. That message is printed by ether_input() if it gets
> > > handed a bum mbuf.
> > >
> > > > Followed by:
> > > >
> > > > Mar  7 07:42:47 flappy kernel: Fatal trap 12: pag
> > >
> > > Unfortunately this is not useful. We need the entire panic message and
> > > ideally a backtrace and crashdump. Can you connect a serial console to
> > > this system and log the output?
> >
> > [Alan Jay] We have done that, but the serial terminal is attached to a
> > terminal concentrator and it seems to time out before logging any useful
> > information. When we did succeed, there was nothing on the serial console
> > in the way of a panic message. Sorry, I'm not sure how to get a backtrace
> > or crashdump.
> 
> See the section on kernel debugging in the Developer's Handbook. You
> activate crashdumps by nominating a partition that is at least as large as
> memory with the 'dumpdev' rc.conf variable (dumps can also be enabled at
> runtime with the 'dumpon' command). Once the machine panics and writes the
> crashdump, savecore will run automatically on the ensuing reboot and
> extract it. With the crashdump in hand, you can use kgdb and a debugging
> kernel image to figure out what happened.

[Alan Jay] Thanks, I will set this up and look at the Developer's Handbook.
 
> > > > Since then it has crashed a number of times, and on a couple of
> > > > occasions has reported:
> > > >
> > > > kernel: fxp0: can't map mbuf (error 12)
> > >
> > > Error 12 is ENOMEM, and that's coming from bus_dmamap_load_mbuf(). That
> > > can be returned if you're running out of space for bounce buffers, or
> > > kmem in general. scottl has been working on busdma issues in HEAD and
> > > recently committed a fix for i386 for bounce page allocation issues.
> > >
> > > kmem depletion would be more insidious. Have you been getting other
> > > messages that indicate a failure to allocate memory, or error 12?
> >
> > [Alan Jay] I had seen them before on the console several times.
> 
> Hm, then kmem depletion may be in play. Unfortunately I've not tuned kmem
> on amd64, so I don't know whether the same variables as on i386 apply.

[Alan Jay] OK thanks.
 
> > > > By the way, over the weekend the latest -STABLE, which is marked
> > > > 5.4-PRERELEASE 2, seemed much better than 5.3 had been, and the initial
> > > > problems took much longer to appear. But once the problems started to
> > > > appear, they kept repeating, with the server rebooting every 1-2 hours
> > > > until we removed the test data.
> > >
> > > That behavior sounds a lot like thermal issues. It takes a while to warm
> > > up to the critical point, and once it hits that point it really starts
> > > to malfunction. Unless the test run starts out slow or something.
> >
> > [Alan Jay] Unlikely, as the servers have been on 24 hours a day since we
> > got them into a rack at a data centre, so the temperature should be
> > reasonably consistent.
> 
> Right, but a failed fan keeps that nice cool air from getting to the
> burning hot parts. :)

[Alan Jay] Indeed, that is true, and I will check the fans when I am next in.
But it is relatively low on my list of potential problems, especially as we
have seen similar problems on both servers, and it only happens when a certain
test is run; all the others are fine. Still, I never rule anything out.

Thanks for all the input it has been very useful.

Alan


