Date: Thu, 02 Jun 2011 18:44:50 +0600
From: "Eugene M. Zheganin" <eugene@zhegan.in>
To: freebsd-scsi@freebsd.org
Subject: lsi1064e
Message-ID: <4DE785C2.6080205@zhegan.in>

Hi.

I'm using FreeBSD 8.2 on IBM System x 3250 servers, which come with an onboard LSI 1064e controller. I'm using 'em with geom_mirror and zfs (I have about a dozen of these).

Some time ago I noticed a weird thing on a server with gmirror: one drive died and the server hung until it was rebooted. This week I was examining some zfs-related freezes (I guess it's about ARC size, but someone on IRC told me that disk timeouts can be the reason too), and I was experimenting on my test server (which is waiting to be put into production). There I noticed some wrong (at least I think it's wrong) behaviour: keeping in mind that last time I got a freeze when a drive died, I pulled out one of the two drives in a zfs mirrored pool. I got an immediate freeze: all disk operations froze, but the system was alive. I entered the kernel debugger and saw a bunch of processes in D state, including some of the zfs threads.

I updated the LSI 1064e firmware (the latest 1.30.xx found on the IBM site) and the BIOS, but nothing helps. When one of the disks is pulled out (there's no need to do that in production, but I guess the exact same thing happens when a drive dies along with all of its electric circuits), the system waits indefinitely, until the drive is pushed back in or the server is rebooted. Then (if the drive is pushed back in) the mpt driver realises that either the drive was reset or the device was lost (I don't know what this depends on).

Funny thing: after the drive is pulled out and pushed back in, and a camcontrol rescan is issued, you can pull it out again, and this time (and any time after that) the system will detect that the drive is gone quite fast, and no disk operations will freeze.

You can imagine that this behaviour is not what anyone expects when a drive dies. So I want to ask: can this perhaps be tuned, so that the system keeps running and detects that the drive has failed within some short time, like 3-15 seconds? Or is this a bug and I need to write a PR?

Thanks.
Eugene.