Date: Tue, 1 Mar 2016 20:07:19 -0700 From: Scott Long <scott4long@yahoo.com> To: "Kenneth D. Merry" <ken@FreeBSD.ORG> Cc: scsi@freebsd.org, current@freebsd.org Subject: Re: CAM Shingled Disk support patches available Message-ID: <CF8C42C1-9024-429C-91F5-10A7175C1089@yahoo.com> In-Reply-To: <20160301224758.GA86834@mithlond.kdm.org> References: <20151118171309.GA3564@mithlond.kdm.org> <20160118223704.GA87506@mithlond.kdm.org> <20160301224758.GA86834@mithlond.kdm.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi Ken,
I=E2=80=99m against changing the function signature of =
scsi_ata_pass_16(). Even
if you manage to get things right with symbol versioning, it still leads =
to
problems of code compatibility. Maybe pre-existing binaries will work, =
but
source code will forever have to include an #if __FreeBSD_version <
xxxxxx bit of nonsense.
I agree that it was incorrect for dxferlen to be declared as a uint16_t.
However, the function already contains a sector count argument pair. In
theory the sector count multiplied by the sector length, both of which =
the
application should know in order to arrive at a sensible dxferlen, can
substitute for the dxferlen argument. If so, then we can just ignore =
that
argument and declare that sector_count has logical priority.
Really though, I think that scsi_ata_pass_16() is a crummy function. If =
its
purpose is to implement SAT-3 12.2.2, it does an incredibly poor job at =
it:
- By my count, it only covers 12 of the available 13 registers.
- It has no 12 byte, opcode 0xa1 variant.
- It doesn=E2=80=99t make any allowance for providing the response =
registers to the
caller on completion. Well, maybe it kinda does through a sense =
descriptor,
but=E2=80=A6. it=E2=80=99s kinda open to vague interpretation.
- Its use of the registers is clunky, assuming for example that you=E2=80=99=
ll only want
to fill the six LBA registers with a host-ordered 64-bit number. There =
are
plenty of commands that re-use sub-parts of the LBA, features, and/or =
sector
count registers for different things. =20
I know you stated that you didn=E2=80=99t want to do this, but I think =
it=E2=80=99s better to start
over with a better function that has a better signature and a new name. =
In fact,
I think it=E2=80=99s better to use the existing ata_cmd and ata_res =
structures from=20
sys/cam/ata/ata_all.h, provide accessors for the multi-byte registers if =
needed,
provide a 12-byte compatibility, and simply the signature. Something =
like this:
void scsi_ata_pass(struct ccb_scsiio *csio, u_int32_t retries,
void (*cbfcnp)(struct cam_periph *, union ccb *),
u_int32_t flags, u_int8_t tag_action,
struct ata_cmd *cmd, struct ata_res *res,
u_int8_t *data_ptr, u_int32_t dxfer_len,
u_int8_t *data_ptr, u_int16_t dxfer_len,
u_int8_t sense_len, u_int32_t timeout);
To differentiate between the 12 and 16 byte variants, you=E2=80=99d look =
at the
AP_EXTEND flag in the protocol field. Btw, the handling of that flag is
inconsistent in the implementation of the existing scsi_ata_pass_16(). =
If
the caller providse an ata_res pointer then it gets filled on =
completion,
otherwise the caller does its best to look at 12.2.2.6 and extract what =
it
can from the sense descriptor.
So my proposal is to create a new scsi_ata_pass and deprecate but not =
remove
scsi_ata_pass_16. Tell people that if they need to use it, dxfer_len is =
going to
have lower priority than sector_count/sector_count_exp if the latter =
multiply to
more than 65535.
Scott
> On Mar 1, 2016, at 3:47 PM, Kenneth D. Merry <ken@FreeBSD.ORG> wrote:
>=20
> I have a new set of SMR patches available. (See the original message =
below
> for a more detailed description of what these patches do.)
>=20
> The primary change is to add library versioning to libcam so that we =
can
> change the function prototype of scsi_ata_pass_16() in a way that =
won't
> break existing binaries.
>=20
> If someone more familiar with library versioning wants to review this, =
I'd
> appreciate it.
>=20
> The patches are here:
>=20
> FreeBSD/head, as of SVN revision 296278
>=20
> https://people.freebsd.org/~ken/cam_smr.head.20160301.1.txt
>=20
> stable/10, as of SVN revision 296248
>=20
> https://people.freebsd.org/~ken/cam_smr.stable10.20160301.1.txt
>=20
> (Note that although there is a stable/10 version of the patches, I'm =
not
> planning to merge them to stable/10 because of the change to struct =
bio. I
> can't really figure out a good way to make that backward compatible. =
If
> there is consensus that breaking it is fine because it isn't a user =
API,
> then that may be another story.)
>=20
> The problem is that the existing, in-tree version of =
scsi_ata_pass_16() has
> a dxfer_len argument that is a uint16_t. That restricts transfer =
sizes to
> 64KB. So, we need to update it to allow larger than 64K transfers. I
> could just create a new function, but I'd rather just retire the =
broken
> version.
>=20
> The intent here is that:
>=20
> 1. Binaries built against the old version of libcam, before versioning =
was
> turned on, will get the old version of the scsi_ata_pass_16() function =
with
> a uint16_t dxfer_len.
>=20
> 2. Binaries built against the new version of libcam will get the new
> version of the scsi_ata_pass_16() function with a uint32_t dxfer_len.
>=20
> I've tested this, and it appears to work, but I'm not 100% certain =
this is
> all correct. I looked at Dan Eischen's description of symbol =
versioning
> here:
>=20
> https://people.freebsd.org/~deischen/symver/freebsd_versioning.txt
>=20
> And it looks like the actual implementation is a little different than =
what
> is described there. I looked around the tree, and didn't see anything =
that
> is obviously exactly like what I'm trying to do here.
>=20
> So, what I did is as follows:
>=20
> 1. For the kernel, the only change is to switch the dxfer_len argument =
from
> a uint16_t to a uint32_t.
>=20
> 2. For userland, in scsi_all.c, there are now two versions of
> scsi_ata_pass_16 -- _ver1 and _ver2. _ver1 is aliased to
> scsi_ata_pass_16() for FBSD_1.3 using __sym_compat(). _ver2 is =
aliased to
> scsi_ata_pass_16() for FBSD_1.4 using __sym_default().
>=20
> 3. In lib/libcam/Versions.def, I defined FBSD_1.3 and FBSD_1.4, which
> depends on FBSD_1.3.
>=20
> 4. In lib/libcam/Symbol.map, I pulled out all of the functions defined =
in
> libcam, sorted them, and defined them in FBSD_1.3. I moved
> scsi_ata_pass_16() to FBSD_1.4. (According to the =
freebsd_versioning.txt
> paper linked above, I should have been able to have scsi_ata_pass_16() =
in
> both FBSD_1.3 and FBSD_1.4, but that isn't the case in practice.)
>=20
> In testing an old binary (linked against libcam without symbol =
versioning)
> against a new libcam (with symbol versioning), the old version of the
> function appears to be used. With a new binary, the new version of =
the
> function appears to be used.
>=20
> So it looks like things work as intended, but I don't fully trust my
> understanding here. So, if someone could take a look at the changes, =
I'd
> appreciate it.
>=20
> In particular, I have a few questions:
>=20
> 1. If this change to scsi_ata_pass_16() gets merged to stable/10 =
(apart
> from the larger SMR changes), what should be done with the libcam =
library
> version?
>=20
> 2. Are 1.3 and 1.4 the proper versions to use?
>=20
> 3. If we make additional CAM helper function library changes, when do =
the
> versions get bumped? i.e., is this an opportunity to look for other
> library functions with issues and make changes if possible?
>=20
> 4. When you're going from an unversioned library to a versioned =
library,
> which version of a function gets linked in to a binary linked to the
> unversioned library when you run it against a versioned library? In =
other
> words, what is supposed to happen in the test scenario I tried above, =
and
> am I really seeing what is supposed to happen?
>=20
> Thanks,
>=20
> Ken
>=20
> On Mon, Jan 18, 2016 at 17:37:04 -0500, Kenneth D. Merry wrote:
>> I have a new set of SMR patches available. See below for the full
>> explanation.
>>=20
>> The primary change here is that I have added SMR support to the =
ada(4)
>> driver. I spent some time considering whether to try to make the =
da(4) and
>> ada(4) probe infrastructure somewhat common, but in the end concluded =
it
>> would be too involved with not enough code reduction (if any) in the =
end.
>>=20
>> So, although the ideas are similar, the probe logic is separate.
>>=20
>> Note that NCQ support for SMR commands (Report Zones, Reset Write =
Pointer,
>> etc.) for SATA protocol shingled drives isn't active. For both the =
da(4)
>> and ada(4) driver this is for lack of a way to plumb the ATA =
Auxiliary
>> register down to the drive.
>>=20
>> In the ada(4) case, we need to add the register to struct ccb_ataio =
and
>> add support in one or more of the underlying SATA drivers, e.g. =
ahci(4).
>>=20
>> In the da(4) case, it will require an update of the T-10 SAT spec to
>> provide a way to pass the Auxiliary register down via the SCSI ATA
>> PASS-THROUGH command, and then a subsquent update of the SAT layer in
>> various vendors' SAS controller firmware. At that point, there may =
be=20
>> an official mapping of the SCSI ZBC commands to the ATA ZAC commands, =
and
>> we may be able to just issue the SCSI version of the commands instead =
of
>> composing ATA commands in the da(4) driver. (We'll still need to =
keep the
>> ATA passthrough version for a while at least to support controllers =
that
>> don't have the updated translation code.)
>>=20
>> FreeBSD/head as of SVN revision 294105:
>>=20
>> https://people.freebsd.org/~ken/cam_smr.head.20160118.1.txt
>>=20
>> FreeBSD stable/10 as of SVN revision 294100:
>>=20
>> https://people.freebsd.org/~ken/cam_smr.stable10.20160118.1.txt
>>=20
>> Testing and comments are welcome.
>>=20
>> Ken
>>=20
>> On Wed, Nov 18, 2015 at 12:13:09 -0500, Kenneth D. Merry wrote:
>>>=20
>>> I have work in progress patches to add SMR (Shingled Magnetic =
Recording)
>>> support to CAM and GEOM here:
>>>=20
>>> FreeBSD/head as of SVN revision 290997:
>>>=20
>>> https://people.freebsd.org/~ken/cam_smr.head.20151117.1.txt
>>>=20
>>> FreeBSD stable/10 as of SVN revision 290995:
>>>=20
>>> https://people.freebsd.org/~ken/cam_smr.stable10.20151117.1.txt
>>>=20
>>> This includes support for Host Managed, Host Aware and Drive Managed =
SMR
>>> drives that are either SCSI (ZBC) or ATA (ZAC) attached via a SAS
>>> controller. This does not include support for SMR ATA drives =
attched via
>>> an ATA controller. Also, I have not yet figured out how to properly =
detect
>>> a Host Managed ATA drive, so this code won't do that.
>>>=20
>>> The big drive vendors are moving to SMR for at least some of their =
drives.
>>> The primary challenge with SMR is that it requires writing a =
relatively
>>> large zone sequentially starting at the beginning of the zone. The =
usual
>>> zone size is 256MB. It is conceptually almost like having a 256MB =
sector
>>> size.
>>>=20
>>> We (Spectra Logic) are working on ZFS changes that will use this CAM =
and
>>> GEOM infrastructure to make ZFS play well with SMR drives. Those =
changes
>>> aren't yet done.
>>>=20
>>> The patches linked above include:
>>> o A new 'camcontrol zone' command that allows displaying and =
managing
>>> drive zones via SCSI/ATA passthrough.
>>> o A new zonectl(8) utility that uses the new DIOCZONECMD ioctl to =
display
>>> and manage zones via the da(4) (and later ada(4)) driver.
>>> o Changes to diskinfo -v to display the zone mode of a drive.
>>> o A new disk zone API, sys/sys/disk_zone.h.
>>> o A new bio type, BIO_ZONE, and modifications to GEOM to support it. =
This
>>> new bio will allow filesystems to query zone support in a drive =
and
>>> manage zoned drives.
>>> o Extensive modifications to the da(4) driver to handle probing SCSI =
and
>>> SATA behind SAS SMR drives.
>>> o Additional CAM CDB building functions for zone commands.
>>>=20
>>> The current issues that need to be addressed are:
>>> o The da(4) driver now has 6 additional probe states, 5 of which are
>>> needed for probing ATA drives behind SAS controllers. I have not =
yet
>>> added support for BIO_ZONE bios to ada(4), but it will be very =
similar
>>> to the da(4) driver version. The ATA probe code needs to be =
pulled
>>> out of the da(4) driver and changed into a form that will allow it =
to
>>> work for either the ada(4) or da(4) driver. Otherwise we'll have =
a fair
>>> amount of code duplication between the two drivers.
>>>=20
>>> o There is a reasonable amount of code duplication between =
'camcontrol zone'
>>> and zonectl(8). This was done for speed / expediency's sake, but =
it may
>>> be possible to make more of the code common there.
>>>=20
>>> o In order to add the new BIO_ZONE bio command, I had to change the =
bio_cmd
>>> field in struct bio from a uint8_t to a uint16_t. This will cause
>>> binary compatibility problems with any 3rd party loadable modules.
>>> Advice on how to handle this would be welcome.
>>>=20
>>> o In the process of developing these changes, I discovered that the
>>> dxfer_len paramter for scsi_ata_pass_16() was too small (uint16_t, =
and
>>> it needed to be uint32_t). I increased it, but that will =
potentially
>>> cause a binary incompatibility problem with any existing =
applications
>>> that use the current API via libcam. Advice on how to handle that
>>> would be welcome.
>>>=20
>>> If you look through the code, you'll notice that the disk_zone.h API =
is
>>> separate from the SCSI and ATA APIs. The intent is to allow =
filesystems
>>> and other consumers of the API to just talk to the disk zone API =
without
>>> dealing with the SCSI and ATA specifics. Another reason behind all =
of this
>>> is that even though the SCSI ZBC and ATA ZAC specs were developed in
>>> concert, and are intended to be functionally identical, they are =
still SCSI
>>> and ATA. As usual, SCSI is big endian and ATA is little endian. So =
to
>>> present a common API to the filesystem, we give all of the zone data =
back
>>> in native byte order, regardless of the underlying device protocol.
>>>=20
>>> Another thing to note is the extensive use of ATA passthrough in the =
da(4)
>>> driver. This is necessary because the SCSI SAT (SCSI to ATA =
Translation)
>>> specification has not yet caught up with translating SCSI zone =
commands
>>> (ZBC) to ATA zone commands (ZAC). So, until the spec is updated and =
LSI
>>> and other vendors update their SCSI to ATA translation layers, we'll =
have
>>> to use the ATA version of the commands when talking to ATA drives =
via SAS
>>> controllers.
>>>=20
>>> I have only tested the code so far with Seagate SATA Drive Managed =
and Host
>>> Aware drives. I would appreciate testing with any drives. (And =
testing to
>>> make sure that the patches don't cause problems with existing =
hardware.)
>>> Right now, all you can really do is manage the zones manually using
>>> camcontrol(8) or zonectl(8). Automatic management will come with =
the ZFS
>>> changes. (Or changes to other filesysems if people want to do it.)
>>>=20
>>> If you have a SATA Host Aware drive, in theory camcontrol(8) should =
allow
>>> you to manage the drive if you have it attached to a SATA =
controller.
>>>=20
>>> Here is an example of some of the commands.
>>>=20
>>> Get general zoning information:
>>>=20
>>> [root@sm4u-1 ~]# zonectl -c params -d /dev/da21
>>> Zone Mode: Host Aware
>>> Command support: Report Zones, Open, Close, Finish, Reset Write =
Pointer
>>> Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
>>> Optimal Number of Open Sequential Write Preferred Zones: 128
>>> Optimal Number of Non-Sequentially Written Sequential Write =
Preferred Zones: 8
>>> Maximum Number of Open Sequential Write Required Zones: Unlimited
>>>=20
>>> Look at information from the da(4) driver:
>>>=20
>>> [root@sm4u-1 ~]# sysctl kern.cam.da.21
>>> kern.cam.da.21.delete_method: NONE
>>> kern.cam.da.21.delete_max: 1081344
>>> kern.cam.da.21.minimum_cmd_size: 6
>>> kern.cam.da.21.sort_io_queue: -1
>>> kern.cam.da.21.zone_mode: Host Aware
>>> kern.cam.da.21.zone_support: Report Zones, Open, Close, Finish, =
Reset Write Pointer
>>> kern.cam.da.21.optimal_seq_zones: 128
>>> kern.cam.da.21.optimal_nonseq_zones: 8
>>> kern.cam.da.21.max_seq_zones: 4294967295
>>> kern.cam.da.21.error_inject: 0
>>>=20
>>> Display all of the zones with zonectl(8):
>>>=20
>>> [root@sm4u-1 ~]# zonectl -d /dev/da21 -c rz
>>> 29809 zones, Maximum LBA 0x3a3812aaf (15628053167)
>>> Zone lengths and types may vary
>>> Start LBA Length WP LBA Zone Type Condition =
Sequential Reset
>>> 0, 524288, 0x80000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x80000, 524288, 0x100000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x100000, 524288, 0x180000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x180000, 524288, 0x200000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x200000, 524288, 0x280000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x280000, 524288, 0x300000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x300000, 524288, 0x380000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x380000, 524288, 0x400000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x400000, 524288, 0x480000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x480000, 524288, 0x500000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x500000, 524288, 0x580000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x580000, 524288, 0x600000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x600000, 524288, 0x680000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x680000, 524288, 0x700000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x700000, 524288, 0x780000, Conventional, NWP, =
Sequential, No Reset Needed
>>> [ ... ]
>>> 0x1f00000, 524288, 0x1f80000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x1f80000, 524288, 0x2000000, Conventional, NWP, =
Sequential, No Reset Needed
>>> 0x2000000, 524288, 0x2080000, Seq Preferred, Full, =
Sequential, No Reset Needed
>>> 0x2080000, 524288, 0x2100000, Seq Preferred, Full, =
Sequential, No Reset Needed
>>> 0x2100000, 524288, 0x2180000, Seq Preferred, Full, =
Sequential, No Reset Needed
>>> 0x2180000, 524288, 0x2200000, Seq Preferred, Full, =
Sequential, No Reset Needed
>>> 0x2200000, 524288, 0x2280000, Seq Preferred, Full, =
Sequential, No Reset Needed
>>> 0x2280000, 524288, 0x2300000, Seq Preferred, Full, =
Sequential, No Reset Needed
>>> [ ... ]
>>>=20
>>> Use camcontrol zone to reset the write pointer for the first =
shingled zone
>>> listed above:
>>>=20
>>> [root@sm4u-1 ~]# camcontrol zone da21 -v -c rwp -l 0x2000000
>>>=20
>>> Use camcontrol zone to ask the drive to report empty zones:
>>>=20
>>> [root@sm4u-1 ~]# camcontrol zone da21 -v -c rz -o empty
>>> 1 zones, Maximum LBA 0x3a3812aaf (15628053167)
>>> Zone lengths and types may vary
>>> Start LBA Length WP LBA Zone Type Condition =
Sequential Reset
>>> 0x2000000, 524288, 0x2000000, Seq Preferred, Empty, =
Sequential, No Reset Needed
>>>=20
>>> Get information on a Host Aware drive:
>>>=20
>>> root@sm4u-1 ~]# diskinfo -v da21
>>> da21
>>> 512 # sectorsize
>>> 8001563222016 # mediasize in bytes (7.3T)
>>> 15628053168 # mediasize in sectors
>>> 4096 # stripesize
>>> 0 # stripeoffset
>>> 972801 # Cylinders according to firmware.
>>> 255 # Heads according to firmware.
>>> 63 # Sectors according to firmware.
>>> Z84008NY # Disk ident.
>>> enc@5003048001f311fd/elmtype@array_device/slot@22 # =
Physical path
>>> Host Aware # Zone Mode
>>>=20
>>> Get information on a drive managed drive:
>>>=20
>>> [root@sm4u-1 ~]# diskinfo -v da20
>>> da20
>>> 512 # sectorsize
>>> 8001563222016 # mediasize in bytes (7.3T)
>>> 15628053168 # mediasize in sectors
>>> 4096 # stripesize
>>> 0 # stripeoffset
>>> 972801 # Cylinders according to firmware.
>>> 255 # Heads according to firmware.
>>> 63 # Sectors according to firmware.
>>> Z8405938 # Disk ident.
>>> enc@5003048001f311fd/elmtype@array_device/slot@21 # =
Physical path
>>> Drive Managed # Zone Mode
>>>=20
>>> Get information on a non-zoned drive:
>>>=20
>>> [root@sm4u-1 ~]# diskinfo -v da4
>>> da4
>>> 512 # sectorsize
>>> 100030242816 # mediasize in bytes (93G)
>>> 195371568 # mediasize in sectors
>>> 0 # stripesize
>>> 0 # stripeoffset
>>> 12161 # Cylinders according to firmware.
>>> 255 # Heads according to firmware.
>>> 63 # Sectors according to firmware.
>>> 124903574F36 # Disk ident.
>>> enc@5003048001f311fd/elmtype@array_device/slot@5 # =
Physical path
>>> Not Zoned # Zone Mode
>>>=20
>>> Testing and comments are welcome.
>>>=20
>>> Thanks,
>>>=20
>>> Ken
>>> --=20
>>> Kenneth Merry
>>> ken@FreeBSD.ORG
>>=20
>> --=20
>> Kenneth Merry
>> ken@FreeBSD.ORG
>=20
> --=20
> Kenneth Merry
> ken@FreeBSD.ORG
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to =
"freebsd-scsi-unsubscribe@freebsd.org"
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CF8C42C1-9024-429C-91F5-10A7175C1089>
