Date: Fri, 12 Mar 2010 20:38:31 +0200
From: Alexander Motin <mav@FreeBSD.org>
To: Kai Gallasch <gallasch@free.de>
Cc: freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: proliant server lockups with freebsd-amd64-stable (2010-03-10)
Message-ID: <4B9A8A27.8050608@FreeBSD.org>
In-Reply-To: <20100312115028.GG1819@garage.freebsd.pl>
References: <20100311133916.42ba69b0@orwell.free.de> <20100312115028.GG1819@garage.freebsd.pl>

Pawel Jakub Dawidek wrote:
> On Thu, Mar 11, 2010 at 01:39:16PM +0100, Kai Gallasch wrote:
>> I have some trouble with an opteron server locking up spontaneously. It loses
>> all network connectivity and even through console I can get no shell.
>>
>> Lockups occur mostly under disk load (periodic daily, bacula backup
>> running, make buildworld/buildkernel) and I can provoke them easily.
> [...]
>> 4 0 0 0 LL *cissmtx 0xffffff04ed820c00 [g_down]
> [...]
>> 100046 L *cissmtx 0xffffff04ed820c00 [irq257: ciss0]
> [...]
>
> I was analyzing a similar problem as a potential ZFS bug. It turned out to
> be a bug in ciss(4) and I believe mav@ (CCed) has a fix for that.

My patch has been in 8-STABLE since r204873 of 2010-03-08. Make sure you
have it.

In this case the trap stopped the process in ciss_get_request(), which is
indeed called with the cissmtx lock held. But there is no place to sleep or
loop there, so maybe the trap just caught it there by chance. With the bugs
I was fixing, there was a chance of looping indefinitely between ciss and
CAM under a resource constraint. That increases the chance of such a
situation being caught.

You may also try looking at what's going on with `top -HS` and
`systat -vm 1`.

-- 
Alexander Motin