Date:      Mon, 24 Sep 2007 11:35:29 +0200
From:      Kris Kennaway <kris@FreeBSD.org>
To:        Benjie Chen <benjie@addgene.org>
Cc:        freebsd-hackers@freebsd.org, freebsd-hardware@freebsd.org
Subject:   Re: Kernel panic on PowerEdge 1950 under certain stress load
Message-ID:  <46F784E1.1080000@FreeBSD.org>
In-Reply-To: <c53be070709211526j2178ebb7ia6ea39e1a5df303c@mail.gmail.com>
References:  <c53be070709211526j2178ebb7ia6ea39e1a5df303c@mail.gmail.com>

Benjie Chen wrote:
> Hi FreeBSD hackers and engineers,
> 
> I am experiencing a kernel panic that occurs when my new PowerEdge 1950
> FreeBSD 6.2 setup is under a certain stress load. I've emailed a few people
> on the list who have given me useful comments, some of which I am still
> following up. But I wanted to send a general cry for help to see if there
> is more knowledge out there about this problem.
> 
> FreeBSD 6.2 on a PowerEdge 1950, RAID1 setup with the mfi driver (PERC5i),
> 4GB RAM. I am currently running i386, not amd64, for various reasons.
> 
> I've run exhaustive memory, disk, and network tests and cannot reproduce
> the kernel panic with them. I worked with Dell support to run memory tests
> one DIMM at a time and could not find any problem. Even with only one DIMM
> installed at a time, I could still get the kernel panic under my workload.
> 
> My workload heavily hits a web site running on the machine, and the web
> service in turn makes MySQL requests. On the side, I am running a bunch of
> scripts that mostly read from the MySQL database but occasionally write to
> it. The load is not memory intensive (there is usually still about 1GB of
> free memory) but it is fairly disk intensive. I don't get disk errors. The
> panic happens anywhere from 10 minutes to 4 or 5 hours into the test.
> Again, still no disk errors. I turned off soft updates; it still happens.
> 
> The kernel panic is at 0xC066C731, which nm shows is in mtx_lock_spin:
>  c066c7b4 T _mtx_lock_spin
>  c066c85c T _mtx_unlock_sleep
> 
> So this could explain why the independent stress tests do not cause a
> panic: they may not generate enough concurrency to trigger the problem.
> 
> There are a few other complaints about kernel panics at the same IP on the
> web (search for 0xc066c731). I was wondering if anyone has dealt with this
> before and whether there are any workarounds.

The IP is meaningless; it changes each time you compile your kernel.
Unfortunately, even knowing which symbol it is in is nearly meaningless,
because that doesn't provide enough information (only that your panic
involved a spin mutex somehow).  Please read the chapter on kernel
debugging in the FreeBSD Developers' Handbook and file a PR containing
enough information for a developer to investigate the problem.

Kris
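For readers wondering how an IP like the one above gets attributed to a function from nm output: the enclosing function is the symbol with the greatest address that does not exceed the IP. Below is a minimal Python sketch of that lookup, assuming nm-style "address type name" lines sorted or unsorted (as produced by `nm -n` on the exact kernel that panicked). `parse_nm` and `enclosing_symbol` are hypothetical helper names for illustration only. As Kris points out, the answer is only valid for that specific kernel build, and it tells you where the CPU was, not why it panicked.

```python
from bisect import bisect_right


def parse_nm(text):
    """Parse nm-style lines ("<hex addr> <type> <name>") into a sorted list.

    Assumes the three-column format printed by `nm`; lines with no address
    (e.g. undefined symbols shown as "U name") and blank lines are skipped.
    """
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, _sym_type, name = parts
        syms.append((int(addr, 16), name))
    syms.sort()  # sort by address so a binary search can be used
    return syms


def enclosing_symbol(syms, pc):
    """Return the name of the symbol with the greatest address <= pc.

    Returns None if pc lies below every listed symbol. Note this is only a
    nearest-preceding-symbol heuristic: a gap in the listing (e.g. a missing
    static function) will attribute pc to the wrong name.
    """
    addrs = [addr for addr, _ in syms]
    i = bisect_right(addrs, pc)
    if i == 0:
        return None
    return syms[i - 1][1]


if __name__ == "__main__":
    nm_output = (
        "c066c7b4 T _mtx_lock_spin\n"
        "c066c85c T _mtx_unlock_sleep\n"
    )
    syms = parse_nm(nm_output)
    print(enclosing_symbol(syms, 0xC066C800))  # prints "_mtx_lock_spin"
```

A symbolic debugger such as kgdb working from a crash dump gives far more context (a full backtrace and the lock involved), which is the kind of information the handbook chapter describes collecting for a PR.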



