From owner-freebsd-fs@FreeBSD.ORG Tue Oct 23 01:55:46 2012
Date: Tue, 23 Oct 2012 01:55:46 +0000
From: John
To: Dennis Glatting
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS HBAs + LSI chip sets (Was: ZFS hang (system #2))
Message-ID: <20121023015546.GA60182@FreeBSD.org>
In-Reply-To: <1350948545.86715.147.camel@btw.pki2.com>
List-Id: Filesystems

----- Dennis Glatting's Original Message -----
> On Mon, 2012-10-22 at 09:31 -0700, Freddie Cash wrote:
> > On Mon, Oct 22, 2012 at 6:47 AM, Freddie Cash wrote:
> > > I'll double-check when I get to work, but I'm pretty sure it's
> > > 10.something.
> >
> > mpt(4) on alpha has firmware 1.5.20.0.
> >
> > mps(4) on beta has firmware 09.00.00.00, driver 14.00.00.01-fbsd.
> >
> > mps(4) on omega has firmware 10.00.02.00, driver 14.00.00.01-fbsd.
> >
> > Hope that helps.
>
> Because one of the RAID1 OS disks failed (System #1), I replaced both
> disks and downgraded to stable/8. Two hours ago I submitted a job.
>
> I noticed on boot smartd issued warnings about disk firmware, which I'll
> update this coming weekend, unless the system hangs before then.
>
> I first want to see if that system will also hang under 8.3.
> I have noticed a looping "ls" of the target ZFS directory is MUCH
> snappier under 8.3 than 9.x.
>
> My CentOS 6.3 ZFS-on-Linux system (System #3) is crunching along (24
> hours now). This system under stable/9 would previously spontaneously
> reboot whenever I sent a ZFS data set to it.
>
> System #2 is hung (stable/9).

Hi Folks,

I just caught up on this thread and thought I'd toss out some info.

I have a number of systems running 9-stable (with some local patches),
none running 8. The basic architecture is:

http://people.freebsd.org/~jwd/zfsnfsserver.jpg

LSI SAS 9201-16e 6Gb/s 16-Port SATA+SAS Host Bus Adapter

All cards are up to date on firmware:

mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
mps2: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd

All drives are geom multipath configured. Currently, these systems are
used almost exclusively for iSCSI.

I have seen no lockups that I can track down to the driver. I have seen
one lockup, which I did post about (and received no feedback), where I
believe an active I/O from istgt is interrupted by an ABRT from the
client, which causes a lock-up. This one is hard to replicate and is on
the to-do list.

It is worth noting that a few drives were replaced early on due to
various I/O problems, and one with what might be considered a lockup.
As has been noted elsewhere, watching gstat can be informative. Also
make sure cables are firmly plugged in.. seems obvious, I know..

I did recently commit a small patch to current to handle a case where,
if the system has greater than 255 disks, the 255th disk is
hidden/masked by the mps initiator id that is statically coded into the
driver.

I think it might be good to document a bit better the type of mount and
the test job/test stream running when/if you see a lockup. I am not
currently using NFS, so there is an entire code path I am not
exercising.

Servers are 12 processor, 96GB RAM. The highest cpu load I've seen on
the systems is about 800%.
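For anyone setting up a similar layout: a geom multipath configuration
like the one described above is typically created along these lines.
This is a sketch only, not John's actual commands -- the device names
(da10/da42 as two paths to the same SAS drive) and the label "disk01"
are hypothetical:

```shell
# Load the multipath GEOM class now, and make it persistent across
# reboots via /boot/loader.conf (geom_multipath_load="YES").
kldload geom_multipath

# Write multipath metadata to the disk's last sector via one path;
# gmultipath then automatically attaches any other provider (here
# da42) that presents the same metadata.
gmultipath label -v disk01 /dev/da10 /dev/da42

# The aggregated device appears as /dev/multipath/disk01 and can be
# given to zpool; verify path state with:
gmultipath status
```

The zpool is then built on /dev/multipath/* devices rather than the raw
da* nodes, so a cable or expander path failure does not surface as a
disk failure to ZFS.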
All networking is 10G via Chelsio cards - configured to use isr
maxthreads 6 with a defaultqlimit of 4096. I have seen no problems in
this area.

Hope this helps a bit. Happy to answer questions.

Cheers,
John

ps: With all that's been said above, it's worth noting that a correctly
configured client makes a huge difference.
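For reference, the isr tuning mentioned above corresponds to the
standard net.isr loader tunables. A sketch of the matching
/boot/loader.conf entries, with the values taken from the description
above (whether these were set in loader.conf or elsewhere is an
assumption):

```shell
# /boot/loader.conf -- netisr tuning for 10G networking (FreeBSD 9.x)
net.isr.maxthreads=6        # number of netisr worker threads
net.isr.defaultqlimit=4096  # default per-protocol queue depth
```

Both are boot-time tunables; current values can be inspected with
`sysctl net.isr` on a running system.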