Date: Tue, 25 Mar 2014 19:13:53 +0000
From: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
To: Borja Marcos <borjam@sarenet.es>
Cc: "scottl@netflix.com" <scottl@netflix.com>, "Radford, Adam" <Adam.Radford@lsi.com>, "sean_bruno@yahoo.com" <sean_bruno@yahoo.com>, "Mankani, Krishnaraddi" <Krishnaraddi.Mankani@lsi.com>, "dwhite@ixsystems.com" <dwhite@ixsystems.com>, "Maloy, Joe" <Joe.Maloy@lsi.com>, "jpaetzel@freebsd.org" <jpaetzel@freebsd.org>, "freebsd-scsi@freebsd.org" <freebsd-scsi@freebsd.org>, Doug Ambrisko <ambrisko@cisco.com>, "Kenneth D. Merry" <ken@kdm.org>, "McConnell, Stephen" <Stephen.McConnell@lsi.com>
Subject: RE: LSI - MR-Fusion controller driver <mrsas> patch and man page
Message-ID: <45cbdd9366aa4a19997d4ca306d0cdcc@BN1PR07MB247.namprd07.prod.outlook.com>
In-Reply-To: <B13F319C-09B9-48A3-B082-A0936D714F12@sarenet.es>
References: <e59396595152456dbcde63d48f70aa8f@BN1PR07MB247.namprd07.prod.outlook.com> <20140107181139.GC2080@cisco.com> <20140124185356.GA28724@ambrisko.com> <20140124190047.GA34975@ambrisko.com> <9c3fd2b15e9b4c2cb967519a3b7f98ad@BN1PR07MB247.namprd07.prod.outlook.com> <20140318143738.GA65955@cisco.com> <20140320235534.GA92797@cisco.com> <C698AC2A-06A7-4408-9790-20B69FAF31C6@sarenet.es> <20140321160954.GB99545@cisco.com> <5C32A3C7-B28B-4E69-9DF0-EE53181085F7@sarenet.es> <20140324174519.GA30345@cisco.com> <AE6FAC75-D2EA-457F-8654-2ECD57EDAA13@sarenet.es> <8bd5b88321704b49baaf4538c6941292@BN1PR07MB247.namprd07.prod.outlook.com> <B13F319C-09B9-48A3-B082-A0936D714F12@sarenet.es>

Borja:

I have read your comments. First of all, thanks for explaining with so many technical details. I definitely want to take this as feedback, and I will work internally to find out how best we can handle it. As of now, you cannot use the mrsas driver for mfi-style pass-through.

I have observed that most of the benefits you mention for pass-through matter mainly to some manufacturing divisions, and we provide a temporary driver drop for that specific need. That driver exposes unconfigured drives to the OS so that they can upgrade drive firmware without a lot of manual work.

Let me explain one fundamental problem with pass-through drives. Say you have 4 drives and all of them are exposed to the OS as pass-through drives. The user now cannot tell, using LSI-provided configuration utilities such as storcli/MegaCli, whether those drives are in use. From the point of view of the LSI config utilities, each one is still an unconfigured drive and valid for creating a RAID volume. So managing physical disks becomes a big issue if unconfigured drives are exposed to, and used by, the user.

You can run the MR controller in JBOD mode, where all drives are converted to JBOD by default and are visible to the OS.

Also, the LSI controller supports only the T10 thin provisioning standards. For JBOD drives, thin provisioning commands go to the actual drive; for volumes, thin provisioning is disabled by setting values in VPD page 0xB0.

<mrsas> maps volumes on bus 0 and syspd devices on bus 1, so you can easily tell RAID from JBOD.

LSI developed the CAM-based HBA device driver <mps>, and that was done under the guidance of key FreeBSD folks.
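The bus split and the VPD page 0xB0 behavior described above can be illustrated with a minimal C sketch. It assumes only what the message states (volumes on bus 0, syspd/JBOD devices on bus 1) plus the SBC-3 Block Limits page layout, in which bytes 20 to 23 hold MAXIMUM UNMAP LBA COUNT and bytes 24 to 27 hold MAXIMUM UNMAP BLOCK DESCRIPTOR COUNT. The message does not say which fields the firmware sets for volumes, so treating a zero count as "thin provisioning disabled" is an assumption, and every name below is hypothetical rather than taken from the mrsas driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not taken from the mrsas driver. */
enum demo_dev_kind { DEV_RAID_VOLUME, DEV_JBOD, DEV_UNKNOWN };

/* Bus numbering as described in the message: 0 = volumes, 1 = syspd/JBOD. */
static enum demo_dev_kind
classify_by_bus(unsigned int bus)
{
	switch (bus) {
	case 0:
		return DEV_RAID_VOLUME;
	case 1:
		return DEV_JBOD;
	default:
		return DEV_UNKNOWN;
	}
}

/* Multi-byte fields in SCSI VPD pages are big-endian. */
static uint32_t
be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Inspect a Block Limits VPD page (0xB0).
 * Returns 1 if both UNMAP limits are non-zero, 0 if either is zero
 * (assumed here to mean "thin provisioning disabled"), and -1 if the
 * buffer is too short or is not page 0xB0.
 */
static int
vpd_b0_allows_unmap(const uint8_t *page, size_t len)
{
	if (len < 28 || page[1] != 0xB0)
		return -1;
	uint32_t max_unmap_lba = be32(&page[20]);  /* MAXIMUM UNMAP LBA COUNT */
	uint32_t max_unmap_desc = be32(&page[24]); /* MAXIMUM UNMAP BLOCK DESCRIPTOR COUNT */
	return (max_unmap_lba != 0 && max_unmap_desc != 0) ? 1 : 0;
}
```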
Our first goal is to bring the <mrsas> driver up to par with all the latest features that the Linux <megaraid_sas> driver supports, and to use a CAM-based interface, as the <mps> driver does.

We will add new features as and when they are requested, and prioritize them.

Doug: I still have to look at your query regarding the difference between Thunderbolt and Invader.

~ Kashyap

> -----Original Message-----
> From: Borja Marcos [mailto:borjam@sarenet.es]
> Sent: Tuesday, March 25, 2014 8:01 PM
> To: Desai, Kashyap
> Cc: Doug Ambrisko; scottl@netflix.com; Radford, Adam; Kenneth D. Merry;
> sean_bruno@yahoo.com; Mankani, Krishnaraddi; dwhite@ixsystems.com;
> Maloy, Joe; jpaetzel@freebsd.org; freebsd-scsi@freebsd.org; McConnell,
> Stephen
> Subject: Re: LSI - MR-Fusion controller driver <mrsas> patch and man page
>
>
> On Mar 25, 2014, at 12:42 PM, Desai, Kashyap wrote:
>
> > Borja:
> >
> > The <mrsas> driver will attach RAID volumes and JBOD (SysPD) devices to the CAM layer. It is not good to expose hidden RAID volumes, or what we call pass-through devices here, to the OS, for many reasons. Other than for management tasks like SMART monitoring, we cannot/should not do file system I/O on pass-through devices.
>
> Of course it's not a good idea to expose drives that are part of a logical volume. But unconfigured drives should be exposed. Read on, please ;)
>
> > With <mfi> it might be true that users always do file system I/O on the <mfiX> device and consider /dev/daX a pass-through device... With <mrsas>, all devices will be seen as <daX>. You cannot identify with the LSI config utilities which one is a pass-through device and which one is a configured device.
>
> Exposing devices as "da" should not be a mere "esthetic" decision. The "da" driver has some stuff intended for direct access to disks, but not for logical volumes created by other devices such as advanced RAID cards. For example, the "da" device can issue TRIM commands, and it reads device serial numbers (which can now be used by GEOM to identify disks), etc.
> Disks are more complicated now with that "advanced format" thing, so I think it's very important for disks to be directly accessible if you want/need it. Of course, other features might be introduced in the future: features that may be added to the "da" driver but which will probably be useless for a logical device, or even outright inappropriate.
>
> I would suggest that you offer a choice and, most critically, a _clear_ _choice_, as you have different kinds of customers. Some will want/need logical volumes and advanced RAID stuff, others won't. On some machines I have, I am actually doing *both* things at the same time: I may have a RAID-card-based mirror for certain tasks, maybe with a UFS filesystem on it, while relying on pass-through for the rest of the devices, on which I use ZFS.
>
> I think you should use a specific name for the logical devices, as the mfi driver does. If I see an "mfid" device name, it's clear that it's a logical device, not a "bare metal" hard disk, and that its behavior and features depend mainly on the logical-device magic in the card.
>
> And you should offer a perfectly transparent pass-through option, maybe restricted to disks not configured as "RAID" ones (to avoid accidents); I mean what you now call "syspd" mode. These disks, ideally, should not be assigned to a special logical-volume driver like "mfisyspd" (or its equivalent), but to the "da" driver, so that all of the features I expect from a bare-metal hard disk would work. SMART, access to mode pages, detecting sector sizes, serial numbers, whatever, would work without hiccups.
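The point about detecting sector sizes on "advanced format" disks, and about TRIM, comes down to data the "da" driver can read directly from the drive. As a minimal C sketch (struct and function names are illustrative, not FreeBSD's), the following decodes the READ CAPACITY(16) parameter data defined in SBC-3: bytes 8 to 11 give the logical block length, the low four bits of byte 13 give the logical-blocks-per-physical-block exponent, and bit 7 of byte 14 is LBPME (logical block provisioning enabled). A logical volume behind RAID firmware may report values synthesized by the firmware rather than the disk's own.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of READ CAPACITY(16) parameter data (SBC-3 layout). */
struct rc16_info {
	uint64_t last_lba;       /* bytes 0-7: RETURNED LOGICAL BLOCK ADDRESS */
	uint32_t logical_bytes;  /* bytes 8-11: LOGICAL BLOCK LENGTH IN BYTES */
	uint32_t physical_bytes; /* logical length << LOGICAL BLOCKS PER PHYSICAL BLOCK EXPONENT */
	int      lbpme;          /* byte 14, bit 7: logical block provisioning enabled */
};

/* Read an n-byte big-endian integer. */
static uint64_t
be_n(const uint8_t *p, int n)
{
	uint64_t v = 0;
	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, -1 if fewer than the 16 bytes used here are present. */
static int
decode_rc16(const uint8_t *buf, size_t len, struct rc16_info *out)
{
	if (len < 16)
		return -1;
	out->last_lba = be_n(&buf[0], 8);
	out->logical_bytes = (uint32_t)be_n(&buf[8], 4);
	unsigned exp = buf[13] & 0x0F;
	out->physical_bytes = out->logical_bytes << exp;
	out->lbpme = (buf[14] & 0x80) != 0;
	return 0;
}
```

For a typical 512e drive (512-byte logical sectors, 4 KiB physical sectors) the exponent is 3, so the physical sector size works out to 512 << 3 = 4096 bytes.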
>
> Doing it the current "syspd" way means that any new feature added to disks must be added to the card firmware and to the "syspd" portion of the driver, whereas keeping clear access to the SAS (or SATA-on-SAS) devices, with no other manipulation, would give the "da" driver immediate access to those features with no need to add support to the card firmware and driver.
>
>
> > It is not a complex code change if a pass-through device is required for <mrsas>; it is just that exposing devices as pass-through is of no use and more error prone.
>
> It is certainly error prone if you are using logical devices. But if you are not using them (my case, and there are many others in this situation), the lack of a well-supported pass-through device can be error prone.
>
> From a mere engineering point of view, it's a bad idea to add unnecessary software layers. Advanced RAID card features are a lifesaver for "classic" filesystems such as UFS/FFS, EXTwhateverFS, NTFS, etc., but can get in the way of other filesystems such as ZFS. ZFS intends to perform the functions of a RAID device itself.
>
> > None of the LSI drivers does this, including <mps> and <mrsas> in FreeBSD, plus <megaraid_sas> and <mpt2sas>/<mpt3sas> in Linux.
>
> I've been using pass-through disks on Adaptec RAID cards (aac) and LSI Logic cards (mps and mfi), with different levels of success, for years. It can be tricky, but ZFS works best with direct access to the disks.
>
> > Can you describe what functionality you think is missing if there is no pass-through device?
>
> Of course. Some of the functionality I would miss by not using a pass-through device:
>
> - Inability to support problematic disks with "quirks". The "da" driver offers a flexible mechanism for that.
> If not using the da driver, I lose that ability, and you will agree with me that getting a manufacturer (LSI) to update a card's firmware is much harder than doing it myself if needed.
>
> - Inability to support future/special features without a firmware update for the card. An example is the diversity of block sizes in SSDs or, more recently, TRIM for SSDs. ZFS on FreeBSD now supports TRIM, and it's important for performance and drive health. How does "syspd" handle it currently?
>
> - Again, I will insist that additional software layers are a bad idea.
>
> - Also, one of the "features" of LSI cards represents a serious operational issue: the persistent assignment of target numbers to disk serial numbers, keeping a table of target-to-serial-number mappings in NVRAM. There were some recent messages on this list regarding that problem, and it seems to happen even when using pass-through devices.
>
> In the past I have had problems with ZFS and the "old" way of creating "pseudo JBOD" devices on LSI cards by creating a RAID 0 logical volume for each disk. For example, hot swapping a broken disk can be more error prone if, apart from just extracting a disk and adding a new one, I need to run a certain tool to have it effectively recognised by the card firmware. It adds unnecessary complexity. Moreover, in some cases (I can't recall the exact details, as it happened several years ago) it requires a reboot, which defeats the purpose of hot-swappable disks in the first place.
>
> Please don't underestimate the operational impact of all this. An operator swapping a disk at 3 am should not need to do any complex check to determine which disk to extract. Nor should he/she need additional actions such as "mfiutil online this", activate that or, of course, a reboot, to have it recognised.
> ZFS (and, I presume, other advanced filesystems) has its own commands for that, which include their own sanity checks that do their best to avoid trouble.
>
> > Are you doing ZFS (file system I/O) on pass-through devices?
>
> Indeed I am. And I know there are many successful setups doing the same.
>
> > If yes, then why can't you create JBOD/SysPD for that purpose?
>
> It's explained above, but I will summarize:
>
> - Plain, simple, good engineering practice (avoiding unneeded software layers).
> - Access to special/future features on disks.
> - Better observability (monitoring, etc.).
> - Simpler operational procedures, which means safer systems operations and better reliability.
>
> Let me be brutally honest here and, please, take no offense, but take it as feedback from a customer. Right now, advanced RAID cards can be more of a liability than a desirable feature. Look at all the places where people repurpose RAID cards as simple HBAs, doing all sorts of unsupported voodoo.
>
> Ideally this shouldn't happen, but we are somewhat forced into it by server manufacturers. At least at some point, for example, Dell refused to sell "IT mode" LSI2008 cards for internal devices, selling them only with external SAS connectors. So many people just repurpose the internal "IR firmware" cards to "IT mode" so that they can be simple HBAs, even though they still pose a problem with that target-to-serial-number feature in NVRAM. I have an IBM server here with an onboard Invader card which, obviously, has many more features.
>
> By defining some design guidelines for your hardware, firmware, and drivers, however, you can get to a win-win solution. If a card can fulfill both roles perfectly (advanced RAID features and plain HBA), it will no longer be a liability. The same hardware will be appropriate for many purposes, and it will be even better for the purchasing departments of us, your final customers.
> No need to keep track of several SKUs depending on the intended purpose: the same card is usable for, say, NTFS and ZFS, depending only on configuration.
>
> And the design guidelines I am suggesting are simple:
>
> - A fully functioning pass-through mode with a minimal surprise component, with the simplest, most transparent possible access from the CAM layer to the SAS/SATA commands, so that those true pass-through devices get assigned to the right drivers such as "ses", "da", "sa", etc. This should be a core feature, not an add-on to somewhat ease monitoring.
>
> - Making that transparent pass-through mode clearly distinguishable from the logical volume magic, so that the device name reflects its nature and purpose. "mfid" (or "mrsasd", or whatever you like) would be the logical devices, avoiding attaching them to the standard CAM drivers.
>
> You could just repurpose the "syspd" configuration in the newer cards/firmware versions so that drives marked as "syspd" become perfectly transparent pass-throughs.
>
> Please consider it; I am sure you will have many happy customers.
>
> (And I hope you endured reading this message until the end!!)
>
> Thank you!
>
> Borja.
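The first guideline, that true pass-through devices be assigned to drivers such as "ses", "da" and "sa", relies on the peripheral device type each device reports in byte 0 of its standard INQUIRY data (per SPC: peripheral qualifier in bits 7 to 5, device type in bits 4 to 0). The following simplified C sketch shows that mapping; the function name is hypothetical, it is not CAM's actual probe logic, and it ignores the "pass" driver, which in FreeBSD attaches to every device.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified mapping from the INQUIRY peripheral device
 * type (SPC) to the FreeBSD CAM peripheral driver that would normally
 * claim it. Returns NULL when no driver from this short list applies.
 */
static const char *
driver_for_inquiry(const uint8_t *inq, size_t len)
{
	if (len < 1)
		return NULL;
	unsigned qualifier = (inq[0] >> 5) & 0x07; /* bits 7-5 */
	unsigned type = inq[0] & 0x1F;             /* bits 4-0 */
	if (qualifier != 0) /* no device currently connected at this LUN */
		return NULL;
	switch (type) {
	case 0x00: /* direct access block device (disk) */
	case 0x0E: /* simplified direct access (RBC) */
		return "da";
	case 0x01: /* sequential access (tape) */
		return "sa";
	case 0x05: /* CD/DVD */
		return "cd";
	case 0x08: /* medium changer */
		return "ch";
	case 0x0D: /* enclosure services */
		return "ses";
	default:
		return NULL;
	}
}
```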