Date:      Fri, 12 Mar 2010 20:38:31 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Kai Gallasch <gallasch@free.de>
Cc:        freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject:   Re: proliant server lockups with freebsd-amd64-stable (2010-03-10)
Message-ID:  <4B9A8A27.8050608@FreeBSD.org>
In-Reply-To: <20100312115028.GG1819@garage.freebsd.pl>
References:  <20100311133916.42ba69b0@orwell.free.de> <20100312115028.GG1819@garage.freebsd.pl>

Pawel Jakub Dawidek wrote:
> On Thu, Mar 11, 2010 at 01:39:16PM +0100, Kai Gallasch wrote:
>> I have some trouble with an Opteron server locking up spontaneously. It loses
>> all network connectivity and even through the console I can get no shell.
>>
>> Lockups occur mostly under disk load (periodic daily, bacula backup
>> running, make buildworld/buildkernel) and I can provoke them easily.
> [...]
>>     4     0     0     0  LL     *cissmtx  0xffffff04ed820c00 [g_down]
> [...]
>> 100046                   L      *cissmtx  0xffffff04ed820c00 [irq257: ciss0]
> [...]
> 
> I was analyzing a similar problem as a potential ZFS bug. It turned out to
> be a bug in ciss(4), and I believe mav@ (CCed) has a fix for that.

My patch has been in 8-STABLE since r204873 of 2010-03-08. Make
sure you have it.

In this case the trap stopped the process in ciss_get_request(), which
is indeed called with the cissmtx lock held. But there is nowhere to
sleep or loop there, so maybe it was just caught there by chance. With
the bugs I was fixing, there was a chance of looping indefinitely between
ciss and CAM on a resource constraint. That increases the chance of such
a situation being caught.
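
To illustrate the kind of loop described above, here is a minimal toy
model in C. It is not the ciss(4) or CAM code: every name in it
(toy_driver, toy_get_request, submit_requeue_immediately,
submit_or_freeze, RETRY_CAP) is hypothetical, and the "freeze until a
completion arrives" variant is only one common way to break such a loop,
not necessarily what r204873 does.

```c
#include <assert.h>
#include <stdbool.h>

#define RETRY_CAP 1000  /* hypothetical cap so the demo loop terminates */

/* Toy stand-in for driver state; not the real ciss softc. */
struct toy_driver {
    int  free_slots;  /* request structures currently available */
    bool frozen;      /* set when the upper layer must stop submitting */
};

/* Toy analogue of allocating a request: fails when the pool is empty. */
static bool toy_get_request(struct toy_driver *d)
{
    if (d->free_slots == 0)
        return false;
    d->free_slots--;
    return true;
}

/* Toy completion path: returns one request to the pool and thaws the queue. */
static void toy_complete(struct toy_driver *d)
{
    d->free_slots++;
    d->frozen = false;
}

/*
 * Problem pattern: on shortage the I/O is bounced back and retried at once.
 * The completion path cannot run while we spin, so nothing is ever freed.
 * Returns the number of attempts made (RETRY_CAP means "no progress").
 */
static int submit_requeue_immediately(struct toy_driver *d)
{
    int attempts = 0;

    while (attempts < RETRY_CAP) {
        attempts++;
        if (toy_get_request(d))
            break;
    }
    return attempts;
}

/*
 * Alternative pattern: on shortage, mark the queue frozen and return,
 * so the caller stops retrying until toy_complete() frees a slot.
 * Returns true if the request was accepted.
 */
static bool submit_or_freeze(struct toy_driver *d)
{
    if (d->frozen)
        return false;
    if (!toy_get_request(d)) {
        d->frozen = true;
        return false;
    }
    return true;
}
```

With an empty pool, submit_requeue_immediately() only stops because of
the artificial cap, while submit_or_freeze() returns at once and succeeds
again after toy_complete() has run.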

You may also try looking at what's going on with `top -HS` and `systat -vm 1`.

-- 
Alexander Motin