Date: Fri, 26 Oct 2012 08:49:20 -0700
From: Dennis Glatting <freebsd@penx.com>
To: John <jwd@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS HBAs + LSI chip sets (Was: ZFS hang (system #2))
Message-ID: <1351266560.49566.12.camel@btw.pki2.com>
In-Reply-To: <20121023015546.GA60182@FreeBSD.org>
References: <50825598.3070505@FreeBSD.org>
 <1350744349.88577.10.camel@btw.pki2.com>
 <1350765093.86715.69.camel@btw.pki2.com>
 <508322EC.4080700@FreeBSD.org>
 <1350778257.86715.106.camel@btw.pki2.com>
 <CAOjFWZ7G+aLPiPQTaUOE5oJY3So0cWYKvU86y4BZ2MQL+bqGMA@mail.gmail.com>
 <5084F6D5.5080400@digsys.bg>
 <CAOjFWZ4FX2TrZ9Ns_uJ19=gXRxRqig3XQKV8Dz1bg-EqEHte_A@mail.gmail.com>
 <CAOjFWZ7setVxnES-Nuye+Yye025yvkF3Lhn3UujP8k4LurhjDA@mail.gmail.com>
 <1350948545.86715.147.camel@btw.pki2.com>
 <20121023015546.GA60182@FreeBSD.org>
On Tue, 2012-10-23 at 01:55 +0000, John wrote:
> ----- Dennis Glatting's Original Message -----
> > On Mon, 2012-10-22 at 09:31 -0700, Freddie Cash wrote:
> > > On Mon, Oct 22, 2012 at 6:47 AM, Freddie Cash <fjwcash@gmail.com> wrote:
> > > > I'll double-check when I get to work, but I'm pretty sure it's
> > > > 10.something.
> > >
> > > mpt(4) on alpha has firmware 1.5.20.0.
> > >
> > > mps(4) on beta has firmware 09.00.00.00, driver 14.00.00.01-fbsd.
> > >
> > > mps(4) on omega has firmware 10.00.02.00, driver 14.00.00.01-fbsd.
> > >
> > > Hope that helps.
> >
> > Because one of the RAID1 OS disks failed (System #1), I replaced both
> > disks and downgraded to stable/8. Two hours ago I submitted a job.
> >
> > I noticed on boot smartd issued warnings about disk firmware, which I'll
> > update this coming weekend, unless the system hangs before then.
> >
> > I first want to see if that system will also hang under 8.3. I have
> > noticed a looping "ls" of the target ZFS directory is MUCH snappier
> > under 8.3 than 9.x.
> >
> > My CentOS 6.3 ZFS-on-Linux system (System #3) is crunching along (24
> > hours now). This system under stable/9 would previously spontaneously
> > reboot whenever I sent a ZFS data set to it.
> >
> > System #2 is hung (stable/9).
>
> Hi Folks,
>
> I just caught up on this thread and thought I'd toss out some info.
>
> I have a number of systems running 9-stable (with some local patches),
> none running 8.
>
> The basic architecture is: http://people.freebsd.org/~jwd/zfsnfsserver.jpg
>
>    LSI SAS 9201-16e 6Gb/s 16-Port SATA+SAS Host Bus Adapter
>
> All cards are up to date on firmware:
>
>    mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
>    mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
>    mps2: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
>
> All drives are geom multipath configured.
>
> Currently, these systems are used almost exclusively for iSCSI.
>
> I have seen no lockups that I can track down to the driver. I have seen
> one lockup, which I did post about (and received no feedback), where I
> believe an active I/O from istgt is interrupted by an ABRT from the
> client, which causes a lock-up. This one is hard to replicate and is on
> the to-do list.
>
> It is worth noting that a few drives were replaced early on due to
> various I/O problems, and one with what might be considered a lockup.
> As has been noted elsewhere, watching gstat can be informative. Also
> make sure cables are firmly plugged in. Seems obvious, I know.
>
> I did recently commit a small patch to current to handle a case where,
> if the system has more than 255 disks, the 255th disk is hidden/masked
> by the mps initiator id that is statically coded into the driver.
>
> I think it might be good to document a bit better the type of mount and
> test job/test stream running when/if you see a lockup. I am not
> currently using NFS, so there is an entire code path I am not
> exercising.
>
> Servers are 12 processor, 96GB RAM. The highest CPU load I've seen on
> the systems is about 800%.
>
> All networking is 10G via Chelsio cards, configured to use isr
> maxthreads 6 with a defaultqlimit of 4096. I have seen no problems in
> this area.
>
> Hope this helps a bit. Happy to answer questions.
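For anyone comparing firmware/driver pairs like the ones quoted above, a
quick way to read them back on a running FreeBSD box is sketched below.
The dmesg pattern matches the boot probe lines John lists; the sysctl
node name is from the 9.x-era mps(4) driver and is an assumption here,
so verify it exists on your system before relying on it.

   # Boot-time probe lines carry the firmware/driver pair:
   dmesg | grep -E '^mps[0-9]+: Firmware'

   # mps(4) also exposes the firmware version as a sysctl
   # (node name assumed; confirm with: sysctl -a | grep mps):
   sysctl dev.mps.0.firmware_version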
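As a sketch of the geom multipath setup and the gstat check mentioned
above (the names disk01, da0, and da8 are hypothetical, not taken from
John's configuration):

   # Label a multipath device across two paths to the same disk:
   gmultipath label -v disk01 /dev/da0 /dev/da8

   # Confirm both paths are seen and active:
   gmultipath status

   # Watch per-device I/O; a sick drive often shows up pegged at
   # 100% busy or with operations stuck in its queue:
   gstat -f 'multipath|da[0-9]+$'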
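The netisr settings John describes would normally be loader tunables
along these lines; the values are his, while the tunable names are the
standard net.isr ones, assumed rather than copied from his config:

   # /boot/loader.conf
   net.isr.maxthreads=6
   net.isr.defaultqlimit=4096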
I realized this morning that I neglected to ask a question: How big are
your files? Mine are anywhere up to 12T/ea.

From one of my servers:

bd3# ls -lh
total 7400750995
drwxr-xr-x  3 root  wheel    12B Oct 26 08:14 ./
drwxr-xr-x  7 root  wheel     7B Aug 14 10:50 ../
drwxr-xr-x  2 root  wheel     2B Oct 25 21:37 Kore/
-rw-r--r--  1 root  wheel    12T Sep  8 10:24 Merged.0.txt
-rw-r--r--  1 root  wheel   1.1T Jul 18 07:30 Merged.2.cleansed.print.txt.gz
-rw-r--r--  1 root  wheel   1.2T Jul 18 04:13 Merged.3.cleansed.print.txt.gz
-rw-r--r--  1 root  wheel   985G Sep  7 17:25 Merged.KoreLogic.1.txt.bz2
-rw-r--r--  1 root  wheel   1.1T Sep 16 00:02 Merged.KoreLogic.3.txt.bz2
-rw-r--r--  1 root  wheel   670G Jul 27 10:01 Merged.outpost9.cleansed.print.txt.bz2
-rw-r--r--  1 root  wheel   639G Aug 30 06:47 Merged.packet.storm.1.print.cleansed.txt.bz2
-rw-r--r--  1 root  wheel   733G Jul 21 03:49 Merged.wordlist.0.cleansed.print.txt.bz2

Trying to work with the 12T file eventually hangs that system.

> Cheers,
> John
>
> ps: With all that's been said above, it's worth noting that a correctly
> configured client makes a huge difference.
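A minimal sketch of the two workloads discussed in this thread, for
anyone trying to reproduce the hang; the pool path /bd3/data is
hypothetical, and Merged.0.txt is the 12T file from the listing above:

   # The looping "ls" Dennis compared between 8.3 and 9.x:
   while :; do time ls -lh /bd3/data > /dev/null; sleep 1; done

   # A long sequential read of the 12T file; watching gstat in a
   # second terminal shows whether I/O stalls when the hang begins:
   dd if=/bd3/data/Merged.0.txt of=/dev/null bs=1m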