From owner-freebsd-fs@FreeBSD.ORG Tue Oct 23 04:40:07 2012
From: Gezeala M. Bacuño II <gezeala@gmail.com>
Date: Mon, 22 Oct 2012 21:39:26 -0700
Subject: Re: ZFS HBAs + LSI chip sets (Was: ZFS hang (system #2))
To: John
Cc: freebsd-fs@freebsd.org
In-Reply-To: <20121023015546.GA60182@FreeBSD.org>
On Mon, Oct 22, 2012 at 6:55 PM, John wrote:
> ----- Dennis Glatting's Original Message -----
>> On Mon, 2012-10-22 at 09:31 -0700, Freddie Cash wrote:
>> > On Mon, Oct 22, 2012 at 6:47 AM, Freddie Cash wrote:
>> > > I'll double-check when I get to work, but I'm pretty sure it's 10.something.
>> >
>> > mpt(4) on alpha has firmware 1.5.20.0.
>> >
>> > mps(4) on beta has firmware 09.00.00.00, driver 14.00.00.01-fbsd.
>> >
>> > mps(4) on omega has firmware 10.00.02.00, driver 14.00.00.01-fbsd.
>> >
>> > Hope that helps.
>>
>> Because one of the RAID1 OS disks failed (System #1), I replaced both
>> disks and downgraded to stable/8. Two hours ago I submitted a job.
>>
>> I noticed on boot that smartd issued warnings about disk firmware, which
>> I'll update this coming weekend, unless the system hangs before then.
>>
>> I first want to see whether that system will also hang under 8.3. I have
>> noticed that a looping "ls" of the target ZFS directory is MUCH snappier
>> under 8.3 than under 9.x.
>>
>> My CentOS 6.3 ZFS-on-Linux system (System #3) is crunching along (24
>> hours now). Under stable/9 this system would previously reboot
>> spontaneously whenever I sent a ZFS data set to it.
>>
>> System #2 is hung (stable/9).
>
> Hi Folks,
>
> I just caught up on this thread and thought I'd toss out some info.
>
> I have a number of systems running 9-stable (with some local patches),
> none running 8.
>
> The basic architecture is: http://people.freebsd.org/~jwd/zfsnfsserver.jpg
>
> LSI SAS 9201-16e 6Gb/s 16-Port SATA+SAS Host Bus Adapter
>
> All cards are up to date on firmware:
>
> mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
> mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
> mps2: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
>
> All drives are geom multipath configured.
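[For readers unfamiliar with geom multipath, a minimal sketch of the kind of setup John describes. The disk device names are hypothetical placeholders: they stand for the same physical disk seen once through each SAS path.]

```shell
# Hedged sketch of labeling a dual-pathed disk with gmultipath on FreeBSD.
# /dev/da10 and /dev/da42 are hypothetical names for the two paths to one disk.
gmultipath label -v disk01 /dev/da10 /dev/da42

# Verify that both paths attached to the new multipath node.
gmultipath status

# The resulting /dev/multipath/disk01 device is what you would hand to zpool,
# so a path failure is absorbed by gmultipath instead of faulting the vdev.
```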
>
> Currently, these systems are used almost exclusively for iSCSI.
>
> I have seen no lockups that I can track down to the driver. I have seen
> one lockup, which I did post about (and received no feedback on), where I
> believe an active I/O from istgt is interrupted by an ABRT from the
> client, which causes a lock-up. This one is hard to replicate and is on
> the to-do list.
>
> It is worth noting that a few drives were replaced early on
> due to various I/O problems, and one with what might be considered a
> lockup. As has been noted elsewhere, watching gstat can be informative.
> Also make sure cables are firmly plugged in.. Seems obvious, I know..
>
> I did recently commit a small patch to current to handle a case
> where, if the system has more than 255 disks, the 255th disk
> is hidden/masked by the mps initiator id that is statically coded into
> the driver.
>
> I think it might be good to document a bit better the type of
> mount and the test job/test stream running when/if you see a lockup.
> I am not currently using NFS, so there is an entire code path I
> am not exercising.
>
> Servers are 12-processor machines with 96GB RAM. The highest CPU load
> I've seen on the systems is about 800%.
>
> All networking is 10G via Chelsio cards, configured to use an isr
> maxthreads of 6 with a defaultqlimit of 4096. I have seen no problems in
> this area.
>
> Hope this helps a bit. Happy to answer questions.
>
> Cheers,
> John
>
> ps: With all that's been said above, it's worth noting that a correctly
> configured client makes a huge difference.
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

Hello,

I remember seeing your diagram while looking up multipath. Have you used
this device (or a similar one):
http://www.lsi.com/channel/products/storagecomponents/Pages/LSISAS6160Switch.aspx ?
If yes, have you set up multipath with it? Thanks.
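[Archive note: the netisr settings John mentions are normally expressed as boot-time tunables. A hedged sketch, assuming the FreeBSD 9.x netisr(4) tunable names; the values are simply the ones quoted in the thread:]

```shell
# /boot/loader.conf fragment (sketch, not John's actual config).
net.isr.maxthreads=6        # number of netisr dispatch threads
net.isr.defaultqlimit=4096  # default per-protocol netisr queue depth
```

Current values can be inspected at runtime with `sysctl net.isr`.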