From: "O. Hartmann"
To: Borja Marcos
Cc: Jason Zhang, freebsd-performance@freebsd.org, freebsd-current@freebsd.org,
    freebsd-stable@freebsd.org, freebsd-hardware@freebsd.org
Date: Mon, 1 Aug 2016 15:12:03 +0200
Subject: Re: mfi driver performance too bad on LSI MegaRAID SAS 9260-8i
Message-ID: <20160801151203.14a7a67d@freyja.zeit4.iv.bundesimmobilien.de>
In-Reply-To: <1519EC23-0DBC-4139-96F6-250EF872A14B@sarenet.es>
Organization: FU Berlin

On Mon, 1 Aug 2016 11:48:30 +0200
Borja Marcos wrote:

Hello. First, thanks for responding so quickly.

> > On 01 Aug 2016, at 08:45, O. Hartmann wrote:
> >
> > On Wed, 22 Jun 2016 08:58:08 +0200
> > Borja Marcos wrote:
> >
> >> There is an option you can use (I do it all the time!) to make the card
> >> behave as a plain HBA so that the disks are handled by the "da" driver.
> >>
> >> Add this to /boot/loader.conf
> >>
> >> hw.mfi.allow_cam_disk_passthrough=1
> >> mfip_load="YES"
> >>
> >> And do the tests accessing the disks as "da". To avoid confusions, it's
> >> better to make sure the disks are not part of a "jbod" or logical volume
> >> configuration.
> >>
> >> Borja.
> > [...]
> >
> > How is this supposed to work when ALL disks (including boot device) are
> > settled with the mfi (in our case, it is a Fujitsu CP400i, based upon
> > LSI3008 and detected within FreeBSD 11-BETA and 12-CURRENT) controller
> > itself?
> >
> > I did not find any solution to force the CP400i into a mode making itself
> > acting as a HBA (we intend to use all drives with ZFS and let FreeBSD
> > kernel/ZFS control everything).
>
> Have you tried that particular option?

I have indeed used the "JBOD" function of the PRAID CP400i controller. My
posting was prompted by the suspicion that this mode is, as mentioned in many
posts about RAID controllers and ZFS, the reason for the poor performance.
As far as I can see, that has sadly been confirmed.

>
> With kinda recent LSI based cards you have three options:
>
> - The most usual and definitely NOT RECOMMENDED option is to define a logical
> volume per disk which actually LSI Logic called before JBOD mode. It's not
> recommended at all if you want to run ZFS.

This is the only way to expose each disk as it is to the OS with the PRAID
CP400i built into our RX1330-M2 server (Xeon Skylake based). I ordered that
specific box with an HBA-capable controller. Searching the net reveals that
there is another one, called PSAS CP400i, which is also based on the LSI/Avago
SAS3008 and for which the possibility to expose drives as-is is explicitly
mentioned. I do not know whether this is a software feature, as I suspect, or
something that is hardwired into the controller.

>
> - Recent cards, I think I saw this first on the LSI3008, have a JBOD mode
> that exposes the drives as "mfisyspd" devices. I don't recommend it either,
> because the syspd drives are a sort of limited version of a disk device. With
> SSDs, especially, you don't have access to the TRIM command.

They expose the drives as "mfidX" if set up as JBOD.

>
> - The third option is to make the driver expose the SAS devices like a HBA
> would do, so that they are visible to the CAM layer, and disks are handled by
> the stock "da" driver, which is the ideal solution.

I didn't find any switch that lets me put the PRAID CP400i into a simple HBA
mode.

>
> However, this third option might not be available in some custom firmware
> versions for certain manufacturers? I don't know. And I would hesitate to
> make the conversion on a production machine unless you have a complete and
> reliable full backup of all the data in case you need to rebuild it.

The boxes are empty and ready for installation, so I am not worried about
that. What worries me more is the possibility that Fujitsu has crippled these
options in software, if that is what happened. I do not want to blame them
before I have double-checked.

>
> In order to do it you need a couple of things. You need to set the variable
> hw.mfi.allow_cam_disk_passthrough=1 and to load the mfip.ko module.
>
> When booting installation media, enter command mode and use these commands:
>
> -----
> set hw.mfi.allow_cam_disk_passthrough=1
> load mfip
> boot
> -----

Well, I am well aware of this problem and its solution (now), but I run into
a chicken-and-egg problem, literally. As long as I boot off the installation
medium, I have a kernel that honours the setting. But the boot medium is
supposed to be an SSD attached to the PRAID CP400i controller itself! So I
will never be able to boot the installed system without giving up the
full-speed ZFS configuration that I expect to get in HBA mode, but not in any
of the forced RAID modes offered by the controller.

I will check with Fujitsu for a solution. Maybe the PRAID CP400i is somehow
capable of being a PSAS CP400i as well, even if the currently installed
firmware does not expose this.

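For anyone trying the passthrough settings quoted above, one way to see
whether the disks really ended up behind the "da" driver is to look for daN
names in the output of camcontrol devlist. The following is only a minimal
sketch: list_da_devices is a hypothetical helper name, and it assumes each
devlist line ends with a parenthesised peripheral list such as "(pass0,da0)".

```shell
#!/bin/sh
# list_da_devices: read `camcontrol devlist` style output on stdin and
# print the names of all daN peripherals, one per line.
# Hypothetical helper; the assumed input format is "... (name,name)".
list_da_devices() {
    # 1. Keep only the text inside the last pair of parentheses on a line.
    # 2. Split that list on commas.
    # 3. Keep only names of the form da<number>.
    sed -n 's/.*(\([^)]*\))[[:space:]]*$/\1/p' |
        tr ',' '\n' |
        grep -E '^da[0-9]+$'
}

# Intended use on the FreeBSD box itself (not executed here):
#   kenv hw.mfi.allow_cam_disk_passthrough   # shows the tunable if the loader set it
#   camcontrol devlist | list_da_devices     # prints da0, da1, ... if CAM sees the disks
```

If nothing is printed, the disks are presumably still only visible through
the mfi driver (for example as mfidX).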
Kind regards,

Oliver

>
> Remember that after installation you need to update /boot/loader.conf in the
> system you just installed with the following contents:
>
> hw.mfi.allow_cam_disk_passthrough=1
> mfip_load="YES"
>
> A note regarding CAM and MFI visibility: On some old firmware versions for
> the LSI2008 I've even seen the disks available both as "mfi" and "da"
> drivers. If possible, you should try to set them up as "unconfigured good" on
> the RAID firmware. Use the RAID firmware set up or maybe mfiutil(8)
>
> Also, make sure you don't create any logical volumes on the disks you want
> exposed to CAM. You should delete the logical volumes so that the MFI
> firmware doesn't do anything with them.
>
> AND BEWARE: Doing these changes to a system in production with valuable data
> is dangerous. Make sure you have a full and sound backup before making these
> changes.
>
> As a worst case, the card could expose the devices both as "syspd" and CAM
> (i.e., "da" drives) but as long as you don't touch the syspd devices the card
> won't do anything to them as far as I know. It could be a serious problem,
> however, if you access a drive part of a logical volume through CAM, as RAID
> cards tend do to "patrol reads" and other stuff on them.
>
> Provided it's safe to do what I recommended, try it and follow up by email.
>
> Borja.
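
The /boot/loader.conf update mentioned in the quoted text can be scripted so
that repeated runs do not pile up duplicate lines. This is only a sketch under
stated assumptions: ensure_loader_setting is a hypothetical helper name, not a
FreeBSD tool, and it always writes values in double quotes, which is the
variable="value" form used for loader.conf entries.

```shell
#!/bin/sh
# ensure_loader_setting FILE KEY VALUE
# Hypothetical helper: make FILE contain exactly one line KEY="VALUE",
# replacing any earlier assignment of KEY and leaving other lines alone.
ensure_loader_setting() {
    file=$1
    key=$2
    value=$3

    # Create the file if it does not exist yet.
    [ -f "$file" ] || : > "$file" || return 1
    tmp=$(mktemp) || return 1

    # Copy every line that does not start with "KEY=" (literal match, so the
    # dots in names like hw.mfi.* are not treated as wildcards).
    awk -v k="${key}=" 'index($0, k) != 1' "$file" > "$tmp" ||
        { rm -f "$tmp"; return 1; }

    # Append the desired assignment, then write the result back in place
    # so the original file keeps its owner and permissions.
    printf '%s="%s"\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Intended use on the installed system, as root (not executed here):
#   ensure_loader_setting /boot/loader.conf hw.mfi.allow_cam_disk_passthrough 1
#   ensure_loader_setting /boot/loader.conf mfip_load YES
```

As the quoted warning says, only do this on a machine whose data is fully
backed up, and try it on a copy of loader.conf first.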