Date: Tue, 1 Mar 2016 20:07:19 -0700 From: Scott Long <scott4long@yahoo.com> To: "Kenneth D. Merry" <ken@FreeBSD.ORG> Cc: scsi@freebsd.org, current@freebsd.org Subject: Re: CAM Shingled Disk support patches available Message-ID: <CF8C42C1-9024-429C-91F5-10A7175C1089@yahoo.com> In-Reply-To: <20160301224758.GA86834@mithlond.kdm.org> References: <20151118171309.GA3564@mithlond.kdm.org> <20160118223704.GA87506@mithlond.kdm.org> <20160301224758.GA86834@mithlond.kdm.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi Ken, I=E2=80=99m against changing the function signature of = scsi_ata_pass_16(). Even if you manage to get things right with symbol versioning, it still leads = to problems of code compatibility. Maybe pre-existing binaries will work, = but source code will forever have to include an #if __FreeBSD_version < xxxxxx bit of nonsense. I agree that it was incorrect for dxferlen to be declared as a uint16_t. However, the function already contains a sector count argument pair. In theory the sector count multiplied by the sector length, both of which = the application should know in order to arrive at a sensible dxferlen, can substitute for the dxferlen argument. If so, then we can just ignore = that argument and declare that sector_count has logical priority. Really though, I think that scsi_ata_pass_16() is a crummy function. If = its purpose is to implement SAT-3 12.2.2, it does an incredibly poor job at = it: - By my count, it only covers 12 of the available 13 registers. - It has no 12 byte, opcode 0xa1 variant. - It doesn=E2=80=99t make any allowance for providing the response = registers to the caller on completion. Well, maybe it kinda does through a sense = descriptor, but=E2=80=A6. it=E2=80=99s kinda open to vague interpretation. - Its use of the registers is clunky, assuming for example that you=E2=80=99= ll only want to fill the six LBA registers with a host-ordered 64-bit number. There = are plenty of commands that re-use sub-parts of the LBA, features, and/or = sector count registers for different things. =20 I know you stated that you didn=E2=80=99t want to do this, but I think = it=E2=80=99s better to start over with a better function that has a better signature and a new name. = In fact, I think it=E2=80=99s better to use the existing ata_cmd and ata_res = structures from=20 sys/cam/ata/ata_all.h, provide accessors for the multi-byte registers if = needed, provide a 12-byte compatibility, and simply the signature. Something = like this: void scsi_ata_pass(struct ccb_scsiio *csio, u_int32_t retries, void (*cbfcnp)(struct cam_periph *, union ccb *), u_int32_t flags, u_int8_t tag_action, struct ata_cmd *cmd, struct ata_res *res, u_int8_t *data_ptr, u_int32_t dxfer_len, u_int8_t *data_ptr, u_int16_t dxfer_len, u_int8_t sense_len, u_int32_t timeout); To differentiate between the 12 and 16 byte variants, you=E2=80=99d look = at the AP_EXTEND flag in the protocol field. Btw, the handling of that flag is inconsistent in the implementation of the existing scsi_ata_pass_16(). = If the caller providse an ata_res pointer then it gets filled on = completion, otherwise the caller does its best to look at 12.2.2.6 and extract what = it can from the sense descriptor. So my proposal is to create a new scsi_ata_pass and deprecate but not = remove scsi_ata_pass_16. Tell people that if they need to use it, dxfer_len is = going to have lower priority than sector_count/sector_count_exp if the latter = multiply to more than 65535. Scott > On Mar 1, 2016, at 3:47 PM, Kenneth D. Merry <ken@FreeBSD.ORG> wrote: >=20 > I have a new set of SMR patches available. (See the original message = below > for a more detailed description of what these patches do.) >=20 > The primary change is to add library versioning to libcam so that we = can > change the function prototype of scsi_ata_pass_16() in a way that = won't > break existing binaries. >=20 > If someone more familiar with library versioning wants to review this, = I'd > appreciate it. >=20 > The patches are here: >=20 > FreeBSD/head, as of SVN revision 296278 >=20 > https://people.freebsd.org/~ken/cam_smr.head.20160301.1.txt >=20 > stable/10, as of SVN revision 296248 >=20 > https://people.freebsd.org/~ken/cam_smr.stable10.20160301.1.txt >=20 > (Note that although there is a stable/10 version of the patches, I'm = not > planning to merge them to stable/10 because of the change to struct = bio. I > can't really figure out a good way to make that backward compatible. = If > there is consensus that breaking it is fine because it isn't a user = API, > then that may be another story.) >=20 > The problem is that the existing, in-tree version of = scsi_ata_pass_16() has > a dxfer_len argument that is a uint16_t. That restricts transfer = sizes to > 64KB. So, we need to update it to allow larger than 64K transfers. I > could just create a new function, but I'd rather just retire the = broken > version. >=20 > The intent here is that: >=20 > 1. Binaries built against the old version of libcam, before versioning = was > turned on, will get the old version of the scsi_ata_pass_16() function = with > a uint16_t dxfer_len. >=20 > 2. Binaries built against the new version of libcam will get the new > version of the scsi_ata_pass_16() function with a uint32_t dxfer_len. >=20 > I've tested this, and it appears to work, but I'm not 100% certain = this is > all correct. I looked at Dan Eischen's description of symbol = versioning > here: >=20 > https://people.freebsd.org/~deischen/symver/freebsd_versioning.txt >=20 > And it looks like the actual implementation is a little different than = what > is described there. I looked around the tree, and didn't see anything = that > is obviously exactly like what I'm trying to do here. >=20 > So, what I did is as follows: >=20 > 1. For the kernel, the only change is to switch the dxfer_len argument = from > a uint16_t to a uint32_t. >=20 > 2. For userland, in scsi_all.c, there are now two versions of > scsi_ata_pass_16 -- _ver1 and _ver2. _ver1 is aliased to > scsi_ata_pass_16() for FBSD_1.3 using __sym_compat(). _ver2 is = aliased to > scsi_ata_pass_16() for FBSD_1.4 using __sym_default(). >=20 > 3. In lib/libcam/Versions.def, I defined FBSD_1.3 and FBSD_1.4, which > depends on FBSD_1.3. >=20 > 4. In lib/libcam/Symbol.map, I pulled out all of the functions defined = in > libcam, sorted them, and defined them in FBSD_1.3. I moved > scsi_ata_pass_16() to FBSD_1.4. (According to the = freebsd_versioning.txt > paper linked above, I should have been able to have scsi_ata_pass_16() = in > both FBSD_1.3 and FBSD_1.4, but that isn't the case in practice.) >=20 > In testing an old binary (linked against libcam without symbol = versioning) > against a new libcam (with symbol versioning), the old version of the > function appears to be used. With a new binary, the new version of = the > function appears to be used. >=20 > So it looks like things work as intended, but I don't fully trust my > understanding here. So, if someone could take a look at the changes, = I'd > appreciate it. >=20 > In particular, I have a few questions: >=20 > 1. If this change to scsi_ata_pass_16() gets merged to stable/10 = (apart > from the larger SMR changes), what should be done with the libcam = library > version? >=20 > 2. Are 1.3 and 1.4 the proper versions to use? >=20 > 3. If we make additional CAM helper function library changes, when do = the > versions get bumped? i.e., is this an opportunity to look for other > library functions with issues and make changes if possible? >=20 > 4. When you're going from an unversioned library to a versioned = library, > which version of a function gets linked in to a binary linked to the > unversioned library when you run it against a versioned library? In = other > words, what is supposed to happen in the test scenario I tried above, = and > am I really seeing what is supposed to happen? >=20 > Thanks, >=20 > Ken >=20 > On Mon, Jan 18, 2016 at 17:37:04 -0500, Kenneth D. Merry wrote: >> I have a new set of SMR patches available. See below for the full >> explanation. >>=20 >> The primary change here is that I have added SMR support to the = ada(4) >> driver. I spent some time considering whether to try to make the = da(4) and >> ada(4) probe infrastructure somewhat common, but in the end concluded = it >> would be too involved with not enough code reduction (if any) in the = end. >>=20 >> So, although the ideas are similar, the probe logic is separate. >>=20 >> Note that NCQ support for SMR commands (Report Zones, Reset Write = Pointer, >> etc.) for SATA protocol shingled drives isn't active. For both the = da(4) >> and ada(4) driver this is for lack of a way to plumb the ATA = Auxiliary >> register down to the drive. >>=20 >> In the ada(4) case, we need to add the register to struct ccb_ataio = and >> add support in one or more of the underlying SATA drivers, e.g. = ahci(4). >>=20 >> In the da(4) case, it will require an update of the T-10 SAT spec to >> provide a way to pass the Auxiliary register down via the SCSI ATA >> PASS-THROUGH command, and then a subsquent update of the SAT layer in >> various vendors' SAS controller firmware. At that point, there may = be=20 >> an official mapping of the SCSI ZBC commands to the ATA ZAC commands, = and >> we may be able to just issue the SCSI version of the commands instead = of >> composing ATA commands in the da(4) driver. (We'll still need to = keep the >> ATA passthrough version for a while at least to support controllers = that >> don't have the updated translation code.) >>=20 >> FreeBSD/head as of SVN revision 294105: >>=20 >> https://people.freebsd.org/~ken/cam_smr.head.20160118.1.txt >>=20 >> FreeBSD stable/10 as of SVN revision 294100: >>=20 >> https://people.freebsd.org/~ken/cam_smr.stable10.20160118.1.txt >>=20 >> Testing and comments are welcome. >>=20 >> Ken >>=20 >> On Wed, Nov 18, 2015 at 12:13:09 -0500, Kenneth D. Merry wrote: >>>=20 >>> I have work in progress patches to add SMR (Shingled Magnetic = Recording) >>> support to CAM and GEOM here: >>>=20 >>> FreeBSD/head as of SVN revision 290997: >>>=20 >>> https://people.freebsd.org/~ken/cam_smr.head.20151117.1.txt >>>=20 >>> FreeBSD stable/10 as of SVN revision 290995: >>>=20 >>> https://people.freebsd.org/~ken/cam_smr.stable10.20151117.1.txt >>>=20 >>> This includes support for Host Managed, Host Aware and Drive Managed = SMR >>> drives that are either SCSI (ZBC) or ATA (ZAC) attached via a SAS >>> controller. This does not include support for SMR ATA drives = attched via >>> an ATA controller. Also, I have not yet figured out how to properly = detect >>> a Host Managed ATA drive, so this code won't do that. >>>=20 >>> The big drive vendors are moving to SMR for at least some of their = drives. >>> The primary challenge with SMR is that it requires writing a = relatively >>> large zone sequentially starting at the beginning of the zone. The = usual >>> zone size is 256MB. It is conceptually almost like having a 256MB = sector >>> size. >>>=20 >>> We (Spectra Logic) are working on ZFS changes that will use this CAM = and >>> GEOM infrastructure to make ZFS play well with SMR drives. Those = changes >>> aren't yet done. >>>=20 >>> The patches linked above include: >>> o A new 'camcontrol zone' command that allows displaying and = managing >>> drive zones via SCSI/ATA passthrough. >>> o A new zonectl(8) utility that uses the new DIOCZONECMD ioctl to = display >>> and manage zones via the da(4) (and later ada(4)) driver. >>> o Changes to diskinfo -v to display the zone mode of a drive. >>> o A new disk zone API, sys/sys/disk_zone.h. >>> o A new bio type, BIO_ZONE, and modifications to GEOM to support it. = This >>> new bio will allow filesystems to query zone support in a drive = and >>> manage zoned drives. >>> o Extensive modifications to the da(4) driver to handle probing SCSI = and >>> SATA behind SAS SMR drives. >>> o Additional CAM CDB building functions for zone commands. >>>=20 >>> The current issues that need to be addressed are: >>> o The da(4) driver now has 6 additional probe states, 5 of which are >>> needed for probing ATA drives behind SAS controllers. I have not = yet >>> added support for BIO_ZONE bios to ada(4), but it will be very = similar >>> to the da(4) driver version. The ATA probe code needs to be = pulled >>> out of the da(4) driver and changed into a form that will allow it = to >>> work for either the ada(4) or da(4) driver. Otherwise we'll have = a fair >>> amount of code duplication between the two drivers. >>>=20 >>> o There is a reasonable amount of code duplication between = 'camcontrol zone' >>> and zonectl(8). This was done for speed / expediency's sake, but = it may >>> be possible to make more of the code common there. >>>=20 >>> o In order to add the new BIO_ZONE bio command, I had to change the = bio_cmd >>> field in struct bio from a uint8_t to a uint16_t. This will cause >>> binary compatibility problems with any 3rd party loadable modules. >>> Advice on how to handle this would be welcome. >>>=20 >>> o In the process of developing these changes, I discovered that the >>> dxfer_len paramter for scsi_ata_pass_16() was too small (uint16_t, = and >>> it needed to be uint32_t). I increased it, but that will = potentially >>> cause a binary incompatibility problem with any existing = applications >>> that use the current API via libcam. Advice on how to handle that >>> would be welcome. >>>=20 >>> If you look through the code, you'll notice that the disk_zone.h API = is >>> separate from the SCSI and ATA APIs. The intent is to allow = filesystems >>> and other consumers of the API to just talk to the disk zone API = without >>> dealing with the SCSI and ATA specifics. Another reason behind all = of this >>> is that even though the SCSI ZBC and ATA ZAC specs were developed in >>> concert, and are intended to be functionally identical, they are = still SCSI >>> and ATA. As usual, SCSI is big endian and ATA is little endian. So = to >>> present a common API to the filesystem, we give all of the zone data = back >>> in native byte order, regardless of the underlying device protocol. >>>=20 >>> Another thing to note is the extensive use of ATA passthrough in the = da(4) >>> driver. This is necessary because the SCSI SAT (SCSI to ATA = Translation) >>> specification has not yet caught up with translating SCSI zone = commands >>> (ZBC) to ATA zone commands (ZAC). So, until the spec is updated and = LSI >>> and other vendors update their SCSI to ATA translation layers, we'll = have >>> to use the ATA version of the commands when talking to ATA drives = via SAS >>> controllers. >>>=20 >>> I have only tested the code so far with Seagate SATA Drive Managed = and Host >>> Aware drives. I would appreciate testing with any drives. (And = testing to >>> make sure that the patches don't cause problems with existing = hardware.) >>> Right now, all you can really do is manage the zones manually using >>> camcontrol(8) or zonectl(8). Automatic management will come with = the ZFS >>> changes. (Or changes to other filesysems if people want to do it.) >>>=20 >>> If you have a SATA Host Aware drive, in theory camcontrol(8) should = allow >>> you to manage the drive if you have it attached to a SATA = controller. >>>=20 >>> Here is an example of some of the commands. >>>=20 >>> Get general zoning information: >>>=20 >>> [root@sm4u-1 ~]# zonectl -c params -d /dev/da21 >>> Zone Mode: Host Aware >>> Command support: Report Zones, Open, Close, Finish, Reset Write = Pointer >>> Unrestricted Read in Sequential Write Required Zone (URSWRZ): No >>> Optimal Number of Open Sequential Write Preferred Zones: 128 >>> Optimal Number of Non-Sequentially Written Sequential Write = Preferred Zones: 8 >>> Maximum Number of Open Sequential Write Required Zones: Unlimited >>>=20 >>> Look at information from the da(4) driver: >>>=20 >>> [root@sm4u-1 ~]# sysctl kern.cam.da.21 >>> kern.cam.da.21.delete_method: NONE >>> kern.cam.da.21.delete_max: 1081344 >>> kern.cam.da.21.minimum_cmd_size: 6 >>> kern.cam.da.21.sort_io_queue: -1 >>> kern.cam.da.21.zone_mode: Host Aware >>> kern.cam.da.21.zone_support: Report Zones, Open, Close, Finish, = Reset Write Pointer >>> kern.cam.da.21.optimal_seq_zones: 128 >>> kern.cam.da.21.optimal_nonseq_zones: 8 >>> kern.cam.da.21.max_seq_zones: 4294967295 >>> kern.cam.da.21.error_inject: 0 >>>=20 >>> Display all of the zones with zonectl(8): >>>=20 >>> [root@sm4u-1 ~]# zonectl -d /dev/da21 -c rz >>> 29809 zones, Maximum LBA 0x3a3812aaf (15628053167) >>> Zone lengths and types may vary >>> Start LBA Length WP LBA Zone Type Condition = Sequential Reset >>> 0, 524288, 0x80000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x80000, 524288, 0x100000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x100000, 524288, 0x180000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x180000, 524288, 0x200000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x200000, 524288, 0x280000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x280000, 524288, 0x300000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x300000, 524288, 0x380000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x380000, 524288, 0x400000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x400000, 524288, 0x480000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x480000, 524288, 0x500000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x500000, 524288, 0x580000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x580000, 524288, 0x600000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x600000, 524288, 0x680000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x680000, 524288, 0x700000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x700000, 524288, 0x780000, Conventional, NWP, = Sequential, No Reset Needed >>> [ ... ] >>> 0x1f00000, 524288, 0x1f80000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x1f80000, 524288, 0x2000000, Conventional, NWP, = Sequential, No Reset Needed >>> 0x2000000, 524288, 0x2080000, Seq Preferred, Full, = Sequential, No Reset Needed >>> 0x2080000, 524288, 0x2100000, Seq Preferred, Full, = Sequential, No Reset Needed >>> 0x2100000, 524288, 0x2180000, Seq Preferred, Full, = Sequential, No Reset Needed >>> 0x2180000, 524288, 0x2200000, Seq Preferred, Full, = Sequential, No Reset Needed >>> 0x2200000, 524288, 0x2280000, Seq Preferred, Full, = Sequential, No Reset Needed >>> 0x2280000, 524288, 0x2300000, Seq Preferred, Full, = Sequential, No Reset Needed >>> [ ... ] >>>=20 >>> Use camcontrol zone to reset the write pointer for the first = shingled zone >>> listed above: >>>=20 >>> [root@sm4u-1 ~]# camcontrol zone da21 -v -c rwp -l 0x2000000 >>>=20 >>> Use camcontrol zone to ask the drive to report empty zones: >>>=20 >>> [root@sm4u-1 ~]# camcontrol zone da21 -v -c rz -o empty >>> 1 zones, Maximum LBA 0x3a3812aaf (15628053167) >>> Zone lengths and types may vary >>> Start LBA Length WP LBA Zone Type Condition = Sequential Reset >>> 0x2000000, 524288, 0x2000000, Seq Preferred, Empty, = Sequential, No Reset Needed >>>=20 >>> Get information on a Host Aware drive: >>>=20 >>> root@sm4u-1 ~]# diskinfo -v da21 >>> da21 >>> 512 # sectorsize >>> 8001563222016 # mediasize in bytes (7.3T) >>> 15628053168 # mediasize in sectors >>> 4096 # stripesize >>> 0 # stripeoffset >>> 972801 # Cylinders according to firmware. >>> 255 # Heads according to firmware. >>> 63 # Sectors according to firmware. >>> Z84008NY # Disk ident. >>> enc@5003048001f311fd/elmtype@array_device/slot@22 # = Physical path >>> Host Aware # Zone Mode >>>=20 >>> Get information on a drive managed drive: >>>=20 >>> [root@sm4u-1 ~]# diskinfo -v da20 >>> da20 >>> 512 # sectorsize >>> 8001563222016 # mediasize in bytes (7.3T) >>> 15628053168 # mediasize in sectors >>> 4096 # stripesize >>> 0 # stripeoffset >>> 972801 # Cylinders according to firmware. >>> 255 # Heads according to firmware. >>> 63 # Sectors according to firmware. >>> Z8405938 # Disk ident. >>> enc@5003048001f311fd/elmtype@array_device/slot@21 # = Physical path >>> Drive Managed # Zone Mode >>>=20 >>> Get information on a non-zoned drive: >>>=20 >>> [root@sm4u-1 ~]# diskinfo -v da4 >>> da4 >>> 512 # sectorsize >>> 100030242816 # mediasize in bytes (93G) >>> 195371568 # mediasize in sectors >>> 0 # stripesize >>> 0 # stripeoffset >>> 12161 # Cylinders according to firmware. >>> 255 # Heads according to firmware. >>> 63 # Sectors according to firmware. >>> 124903574F36 # Disk ident. >>> enc@5003048001f311fd/elmtype@array_device/slot@5 # = Physical path >>> Not Zoned # Zone Mode >>>=20 >>> Testing and comments are welcome. >>>=20 >>> Thanks, >>>=20 >>> Ken >>> --=20 >>> Kenneth Merry >>> ken@FreeBSD.ORG >>=20 >> --=20 >> Kenneth Merry >> ken@FreeBSD.ORG >=20 > --=20 > Kenneth Merry > ken@FreeBSD.ORG > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to = "freebsd-scsi-unsubscribe@freebsd.org"
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CF8C42C1-9024-429C-91F5-10A7175C1089>