From: Borja Marcos
Date: Tue, 25 Mar 2014 15:31:25 +0100
Subject: Re: LSI - MR-Fusion controller driver patch and man page
To: "Desai, Kashyap"
Cc: "scottl@netflix.com", "Radford, Adam", "sean_bruno@yahoo.com", "Mankani, Krishnaraddi", "dwhite@ixsystems.com", "Maloy, Joe", "jpaetzel@freebsd.org", "freebsd-scsi@freebsd.org", Doug Ambrisko, "Kenneth D.
Merry", "McConnell, Stephen"

On Mar 25, 2014, at 12:42 PM, Desai, Kashyap wrote:

> Borja:
>
> driver will attach Raid volume and JBOD (SysPD) to the CAM layer. It is not good to expose hidden raid volume or what we called as pass-through device here to the OS for many reason.. Other than management things like SMART monitor, we cannot/should not do file system IO on pass-through devices.

Of course it's not a good idea to expose drives that are part of a logical volume. But unconfigured drives should be exposed. Read on, please ;)

> With it might be true that user always do file system IO on deivce and consider /dev/daX as pass-through device... With all device will be seen as . You cannot identify which will be a pass-through and which is configured device by LSI config utils.

Exposing devices as "da" should not be a mere "esthetic" decision. The "da" driver has functionality intended for direct access to disks, not for logical volumes created by other devices such as advanced RAID cards. For example, the "da" driver can issue TRIM commands and reads device serial numbers (which GEOM can now use to identify disks). Disks are more complicated now with that "advanced format" thing, so I think it's very important for disks to be directly accessible when you want or need that. Other features may be introduced in the future: features that can be added to the "da" driver but that will probably be useless, or even outright inappropriate, for a logical device.

I would suggest that you offer a choice and, most critically, a _clear_ _choice_, as you have different kinds of customers. Some will want/need logical volumes and advanced RAID stuff, others won't.
On some of my machines I am actually doing *both* things at the same time. I may have a RAID-card-based mirror for certain tasks, maybe with a UFS filesystem on it, while relying on pass-through for the rest of the devices, on which I use ZFS.

I think you should use a specific name for the logical devices, as the mfi driver does. If I see an "mfid" device name it's clear that it's a logical device, not a "bare metal" hard disk, and that its behavior and features depend mainly on the logical device magic in the card.

And you should offer a perfectly transparent pass-through option, maybe restricted to disks not configured as "RAID" ones (to avoid accidents); I mean what you now call "syspd" mode. These disks, ideally, should not be assigned to a special logical-volume-like "mfisyspd" driver (or its equivalent), but to the "da" driver, so that all of the features I expect from a bare metal hard disk would work. SMART, access to mode pages, detecting sector sizes, serial numbers, whatever, would work without hiccups.

Doing it the current "syspd" way means that any new feature added to disks must be added to the card firmware and to the "syspd" portion of the driver, while keeping clear access to the SAS (or SATA-on-SAS) devices with no other manipulation would mean that the "da" driver would have immediate access to those features, with no need to add support to the card firmware and driver.

> It is not a complex code change if pass-through device is required for , but it is just a matter of no use and more error prone to expose devices as pass-through.

It is certainly error prone if you are using logical devices. But if you are not using them (my case, and there are many others in this situation) the lack of a well-supported pass-through device can be error prone.

From a mere engineering point of view, it's a bad idea to add unnecessary software layers.
Advanced RAID card features are a lifesaver for "classic" filesystems such as UFS/FFS, EXTwhateverFS, NTFS, etc, but can get in the way of other filesystems such as ZFS. ZFS intends to perform the functions of a RAID device itself.

> None of the LSI driver does this including and in FreeBSD + and / in Linux.

I've been using pass-through disks on Adaptec RAID cards (aac), and LSI Logic (mps and mfi) with different levels of success for years. It can be tricky, but ZFS works best with direct access to the disks.

> If you can express what functionality you think it is missing, if there is not pass-through device ?

Of course. Some of the problems I would have without a pass-through device are:

- Inability to support problematic disks with "quirks". The "da" driver offers a flexible mechanism for that. If not using the "da" driver I lose that ability, and you will agree with me that getting a manufacturer (LSI) to update a card's firmware is much harder than doing it myself if needed.

- Inability to support future/special features without a firmware update for the card. An example is the diversity of block sizes in SSDs, or, more recently, TRIM for SSDs. ZFS on FreeBSD now supports TRIM, and it's important for performance and drive health. How does "syspd" handle it currently?

- Again I will insist on how additional software layers are a bad idea.

- Also, one of the "features" of LSI cards represents a serious operational issue: the persistent assignment of target numbers to disk serial numbers, kept as a table of target-to-serial-number mappings in NVRAM. There were some recent messages on this list regarding that problem. And it seems to happen even when using pass-through devices.

In the past I have had problems with ZFS and the "old" way of creating "pseudo JBOD" devices on LSI cards by creating a RAID 0 logical volume for each disk.
For example, hot swapping a broken disk can be more error prone if, apart from just extracting a disk and adding a new one, I need to run a certain tool to have it effectively recognised by the card firmware. It adds unnecessary complexity. Moreover, in some cases (I can't recall the exact details, as it happened several years ago) it requires a reboot, which defeats the purpose of hot-swappable disks in the first place.

Please don't underestimate the operational impact of all this. An operator swapping a disk at 3 am should not need to do any complex check to determine the disk to extract. Nor should he/she require additional actions such as "mfiutil online this", activate that or, of course, a reboot, to have it recognised. ZFS (and, I presume, other advanced filesystems) has its own commands for that, which include their own sanity checks and do their best to avoid trouble.

> Are you doing ZFS (File system IO) on Pass-through device. ?

Indeed I am. And I know there are many successful setups doing the same.

> If yes, then why can't you create JBOD/SysPD for that purpose?

It's explained above but I will summarize.

- Plain, simple good engineering practice (avoiding unneeded software layers),
- Access to special/future features on disks
- Better observability (monitoring, etc)
- Simpler operational procedures, which means safer systems operations and better reliability.

Let me be brutally honest here and, please, take no offense but take it as feedback from a customer.

Right now, advanced RAID cards can be more of a liability than a desirable feature. Look at all the places where people repurpose RAID cards to be simple HBAs doing all sorts of unsupported voodoo. Ideally this shouldn't happen, but we are somewhat forced by server manufacturers. At some point at least, for example, Dell refused to sell "IT mode" LSI2008 cards for internal devices, selling them just with external SAS connectors.
So many people just repurpose the internal, "IR firmware" cards to "IT mode" so that they can be simple HBAs, even though they still pose a problem with that target-serial number feature in NVRAM.

I have an IBM server here with an onboard Invader card which, obviously, has many more features.

By defining some design guidelines for your hardware, firmware, and drivers, however, you can get to a win-win solution. If a card can fulfill both roles perfectly (advanced RAID features and plain HBA) it will no longer be a liability. The same hardware will be appropriate for many purposes, and it will be even better for the purchasing departments of us, your final customers. No need to keep track of several SKUs depending on the intended purpose. Same card usable for, say, NTFS and ZFS depending only on configuration.

And those design guidelines I am suggesting are simple:

- Fully functioning pass-through mode with a minimal surprise component, with the simplest, most transparent possible access from the CAM layer to the SAS/SATA commands, so that those true pass-through devices get assigned to the right drivers such as "ses", "da", "sa", etc. This should be a core feature, not an add-on to somewhat ease monitoring.

- Making that transparent pass-through mode clearly distinguishable from the logical volume magic, so that the device name reflects its nature and purpose. "mfid" (or "mrsasd", or whatever you like) would be the logical devices, avoiding attaching them to the standard CAM drivers.

You could just repurpose the "syspd" configuration in the newer cards/firmware versions so that drives marked as "syspd" become perfectly transparent pass-throughs.

Please consider it, I am sure you will have many happy customers.

(And I hope you endured reading this message until the end!!)

Thank you!

Borja.