Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 1 Mar 2016 20:07:19 -0700
From:      Scott Long <scott4long@yahoo.com>
To:        "Kenneth D. Merry" <ken@FreeBSD.ORG>
Cc:        scsi@freebsd.org, current@freebsd.org
Subject:   Re: CAM Shingled Disk support patches available
Message-ID:  <CF8C42C1-9024-429C-91F5-10A7175C1089@yahoo.com>
In-Reply-To: <20160301224758.GA86834@mithlond.kdm.org>
References:  <20151118171309.GA3564@mithlond.kdm.org> <20160118223704.GA87506@mithlond.kdm.org> <20160301224758.GA86834@mithlond.kdm.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi Ken,

I=E2=80=99m against changing the function signature of =
scsi_ata_pass_16().  Even
if you manage to get things right with symbol versioning, it still leads =
to
problems of code compatibility.  Maybe pre-existing binaries will work, =
but
source code will forever have to include an #if __FreeBSD_version <
xxxxxx bit of nonsense.

I agree that it was incorrect for dxferlen to be declared as a uint16_t.
However, the function already contains a sector count argument pair.  In
theory the sector count multiplied by the sector length, both of which =
the
application should know in order to arrive at a sensible dxferlen, can
substitute for the dxferlen argument.  If so, then we can just ignore =
that
argument and declare that sector_count has logical priority.

Really though, I think that scsi_ata_pass_16() is a crummy function.  If =
its
purpose is to implement SAT-3 12.2.2, it does an incredibly poor job at =
it:

- By my count, it only covers 12 of the available 13 registers.

- It has no 12 byte, opcode 0xa1 variant.

- It doesn=E2=80=99t make any allowance for providing the response =
registers to the
caller on completion.  Well, maybe it kinda does through a sense =
descriptor,
but=E2=80=A6. it=E2=80=99s kinda open to vague interpretation.

- Its use of the registers is clunky, assuming for example that you=E2=80=99=
ll only want
to fill the six LBA registers with a host-ordered 64-bit number.  There =
are
plenty of commands that re-use sub-parts of the LBA, features, and/or =
sector
count registers for different things. =20

I know you stated that you didn=E2=80=99t want to do this, but I think =
it=E2=80=99s better to start
over with a better function that has a better signature and a new name.  =
In fact,
I think it=E2=80=99s better to use the existing ata_cmd and ata_res =
structures from=20
sys/cam/ata/ata_all.h, provide accessors for the multi-byte registers if =
needed,
provide a 12-byte compatibility, and simply the signature.  Something =
like this:

void scsi_ata_pass(struct ccb_scsiio *csio, u_int32_t retries,
                      void (*cbfcnp)(struct cam_periph *, union ccb *),
                      u_int32_t flags, u_int8_t tag_action,
                      struct ata_cmd *cmd, struct ata_res *res,
                      u_int8_t *data_ptr, u_int32_t dxfer_len,
                      u_int8_t *data_ptr, u_int16_t dxfer_len,
                      u_int8_t sense_len, u_int32_t timeout);

To differentiate between the 12 and 16 byte variants, you=E2=80=99d look =
at the
AP_EXTEND flag in the protocol field.  Btw, the handling of that flag is
inconsistent in the implementation of the existing scsi_ata_pass_16().  =
If
the caller providse an ata_res pointer then it gets filled on =
completion,
otherwise the caller does its best to look at 12.2.2.6 and extract what =
it
can from the sense descriptor.

So my proposal is to create a new scsi_ata_pass and deprecate but not =
remove
scsi_ata_pass_16.  Tell people that if they need to use it, dxfer_len is =
going to
have lower priority than sector_count/sector_count_exp if the latter =
multiply to
more than 65535.

Scott



> On Mar 1, 2016, at 3:47 PM, Kenneth D. Merry <ken@FreeBSD.ORG> wrote:
>=20
> I have a new set of SMR patches available.  (See the original message =
below
> for a more detailed description of what these patches do.)
>=20
> The primary change is to add library versioning to libcam so that we =
can
> change the function prototype of scsi_ata_pass_16() in a way that =
won't
> break existing binaries.
>=20
> If someone more familiar with library versioning wants to review this, =
I'd
> appreciate it.
>=20
> The patches are here:
>=20
> FreeBSD/head, as of SVN revision 296278
>=20
> https://people.freebsd.org/~ken/cam_smr.head.20160301.1.txt
>=20
> stable/10, as of SVN revision 296248
>=20
> https://people.freebsd.org/~ken/cam_smr.stable10.20160301.1.txt
>=20
> (Note that although there is a stable/10 version of the patches, I'm =
not
> planning to merge them to stable/10 because of the change to struct =
bio.  I
> can't really figure out a good way to make that backward compatible.  =
If
> there is consensus that breaking it is fine because it isn't a user =
API,
> then that may be another story.)
>=20
> The problem is that the existing, in-tree version of =
scsi_ata_pass_16() has
> a dxfer_len argument that is a uint16_t.  That restricts transfer =
sizes to
> 64KB.  So, we need to update it to allow larger than 64K transfers.  I
> could just create a new function, but I'd rather just retire the =
broken
> version.
>=20
> The intent here is that:
>=20
> 1. Binaries built against the old version of libcam, before versioning =
was
> turned on, will get the old version of the scsi_ata_pass_16() function =
with
> a uint16_t dxfer_len.
>=20
> 2. Binaries built against the new version of libcam will get the new
> version of the scsi_ata_pass_16() function with a uint32_t dxfer_len.
>=20
> I've tested this, and it appears to work, but I'm not 100% certain =
this is
> all correct.  I looked at Dan Eischen's description of symbol =
versioning
> here:
>=20
> https://people.freebsd.org/~deischen/symver/freebsd_versioning.txt
>=20
> And it looks like the actual implementation is a little different than =
what
> is described there.  I looked around the tree, and didn't see anything =
that
> is obviously exactly like what I'm trying to do here.
>=20
> So, what I did is as follows:
>=20
> 1. For the kernel, the only change is to switch the dxfer_len argument =
from
> a uint16_t to a uint32_t.
>=20
> 2. For userland, in scsi_all.c, there are now two versions of
> scsi_ata_pass_16 -- _ver1 and _ver2.  _ver1 is aliased to
> scsi_ata_pass_16() for FBSD_1.3 using __sym_compat().  _ver2 is =
aliased to
> scsi_ata_pass_16() for FBSD_1.4 using __sym_default().
>=20
> 3. In lib/libcam/Versions.def, I defined FBSD_1.3 and FBSD_1.4, which
> depends on FBSD_1.3.
>=20
> 4. In lib/libcam/Symbol.map, I pulled out all of the functions defined =
in
> libcam, sorted them, and defined them in FBSD_1.3.  I moved
> scsi_ata_pass_16() to FBSD_1.4.  (According to the =
freebsd_versioning.txt
> paper linked above, I should have been able to have scsi_ata_pass_16() =
in
> both FBSD_1.3 and FBSD_1.4, but that isn't the case in practice.)
>=20
> In testing an old binary (linked against libcam without symbol =
versioning)
> against a new libcam (with symbol versioning), the old version of the
> function appears to be used.  With a new binary, the new version of =
the
> function appears to be used.
>=20
> So it looks like things work as intended, but I don't fully trust my
> understanding here.  So, if someone could take a look at the changes, =
I'd
> appreciate it.
>=20
> In particular, I have a few questions:
>=20
> 1. If this change to scsi_ata_pass_16() gets merged to stable/10 =
(apart
> from the larger SMR changes), what should be done with the libcam =
library
> version?
>=20
> 2. Are 1.3 and 1.4 the proper versions to use?
>=20
> 3. If we make additional CAM helper function library changes, when do =
the
> versions get bumped?  i.e., is this an opportunity to look for other
> library functions with issues and make changes if possible?
>=20
> 4. When you're going from an unversioned library to a versioned =
library,
> which version of a function gets linked in to a binary linked to the
> unversioned library when you run it against a versioned library?  In =
other
> words, what is supposed to happen in the test scenario I tried above, =
and
> am I really seeing what is supposed to happen?
>=20
> Thanks,
>=20
> Ken
>=20
> On Mon, Jan 18, 2016 at 17:37:04 -0500, Kenneth D. Merry wrote:
>> I have a new set of SMR patches available.  See below for the full
>> explanation.
>>=20
>> The primary change here is that I have added SMR support to the =
ada(4)
>> driver.  I spent some time considering whether to try to make the =
da(4) and
>> ada(4) probe infrastructure somewhat common, but in the end concluded =
it
>> would be too involved with not enough code reduction (if any) in the =
end.
>>=20
>> So, although the ideas are similar, the probe logic is separate.
>>=20
>> Note that NCQ support for SMR commands (Report Zones, Reset Write =
Pointer,
>> etc.) for SATA protocol shingled drives isn't active.  For both the =
da(4)
>> and ada(4) driver this is for lack of a way to plumb the ATA =
Auxiliary
>> register down to the drive.
>>=20
>> In the ada(4) case, we need to add the register to struct ccb_ataio =
and
>> add support in one or more of the underlying SATA drivers, e.g. =
ahci(4).
>>=20
>> In the da(4) case, it will require an update of the T-10 SAT spec to
>> provide a way to pass the Auxiliary register down via the SCSI ATA
>> PASS-THROUGH command, and then a subsquent update of the SAT layer in
>> various vendors' SAS controller firmware.  At that point, there may =
be=20
>> an official mapping of the SCSI ZBC commands to the ATA ZAC commands, =
and
>> we may be able to just issue the SCSI version of the commands instead =
of
>> composing ATA commands in the da(4) driver.  (We'll still need to =
keep the
>> ATA passthrough version for a while at least to support controllers =
that
>> don't have the updated translation code.)
>>=20
>> FreeBSD/head as of SVN revision 294105:
>>=20
>> https://people.freebsd.org/~ken/cam_smr.head.20160118.1.txt
>>=20
>> FreeBSD stable/10 as of SVN revision 294100:
>>=20
>> https://people.freebsd.org/~ken/cam_smr.stable10.20160118.1.txt
>>=20
>> Testing and comments are welcome.
>>=20
>> Ken
>>=20
>> On Wed, Nov 18, 2015 at 12:13:09 -0500, Kenneth D. Merry wrote:
>>>=20
>>> I have work in progress patches to add SMR (Shingled Magnetic =
Recording)
>>> support to CAM and GEOM here:
>>>=20
>>> FreeBSD/head as of SVN revision 290997:
>>>=20
>>> https://people.freebsd.org/~ken/cam_smr.head.20151117.1.txt
>>>=20
>>> FreeBSD stable/10 as of SVN revision 290995:
>>>=20
>>> https://people.freebsd.org/~ken/cam_smr.stable10.20151117.1.txt
>>>=20
>>> This includes support for Host Managed, Host Aware and Drive Managed =
SMR
>>> drives that are either SCSI (ZBC) or ATA (ZAC) attached via a SAS
>>> controller.  This does not include support for SMR ATA drives =
attched via
>>> an ATA controller.  Also, I have not yet figured out how to properly =
detect
>>> a Host Managed ATA drive, so this code won't do that.
>>>=20
>>> The big drive vendors are moving to SMR for at least some of their =
drives.
>>> The primary challenge with SMR is that it requires writing a =
relatively
>>> large zone sequentially starting at the beginning of the zone.  The =
usual
>>> zone size is 256MB.  It is conceptually almost like having a 256MB =
sector
>>> size.
>>>=20
>>> We (Spectra Logic) are working on ZFS changes that will use this CAM =
and
>>> GEOM infrastructure to make ZFS play well with SMR drives.  Those =
changes
>>> aren't yet done.
>>>=20
>>> The patches linked above include:
>>> o A new 'camcontrol zone' command that allows displaying and =
managing
>>>   drive zones via SCSI/ATA passthrough.
>>> o A new zonectl(8) utility that uses the new DIOCZONECMD ioctl to =
display
>>>   and manage zones via the da(4) (and later ada(4)) driver.
>>> o Changes to diskinfo -v to display the zone mode of a drive.
>>> o A new disk zone API, sys/sys/disk_zone.h.
>>> o A new bio type, BIO_ZONE, and modifications to GEOM to support it. =
 This
>>>   new bio will allow filesystems to query zone support in a drive =
and
>>>   manage zoned drives.
>>> o Extensive modifications to the da(4) driver to handle probing SCSI =
and
>>>   SATA behind SAS SMR drives.
>>> o Additional CAM CDB building functions for zone commands.
>>>=20
>>> The current issues that need to be addressed are:
>>> o The da(4) driver now has 6 additional probe states, 5 of which are
>>>   needed for probing ATA drives behind SAS controllers.  I have not =
yet
>>>   added support for BIO_ZONE bios to ada(4), but it will be very =
similar
>>>   to the da(4) driver version.  The ATA probe code needs to be =
pulled
>>>   out of the da(4) driver and changed into a form that will allow it =
to
>>>   work for either the ada(4) or da(4) driver.  Otherwise we'll have =
a fair
>>>   amount of code duplication between the two drivers.
>>>=20
>>> o There is a reasonable amount of code duplication between =
'camcontrol zone'
>>>   and zonectl(8).  This was done for speed / expediency's sake, but =
it may
>>>   be possible to make more of the code common there.
>>>=20
>>> o In order to add the new BIO_ZONE bio command, I had to change the =
bio_cmd
>>>   field in struct bio from a uint8_t to a uint16_t.  This will cause
>>>   binary compatibility problems with any 3rd party loadable modules.
>>>   Advice on how to handle this would be welcome.
>>>=20
>>> o In the process of developing these changes, I discovered that the
>>>   dxfer_len paramter for scsi_ata_pass_16() was too small (uint16_t, =
and
>>>   it needed to be uint32_t).  I increased it, but that will =
potentially
>>>   cause a binary incompatibility problem with any existing =
applications
>>>   that use the current API via libcam.  Advice on how to handle that
>>>   would be welcome.
>>>=20
>>> If you look through the code, you'll notice that the disk_zone.h API =
is
>>> separate from the SCSI and ATA APIs.  The intent is to allow =
filesystems
>>> and other consumers of the API to just talk to the disk zone API =
without
>>> dealing with the SCSI and ATA specifics.  Another reason behind all =
of this
>>> is that even though the SCSI ZBC and ATA ZAC specs were developed in
>>> concert, and are intended to be functionally identical, they are =
still SCSI
>>> and ATA.  As usual, SCSI is big endian and ATA is little endian.  So =
to
>>> present a common API to the filesystem, we give all of the zone data =
back
>>> in native byte order, regardless of the underlying device protocol.
>>>=20
>>> Another thing to note is the extensive use of ATA passthrough in the =
da(4)
>>> driver.  This is necessary because the SCSI SAT (SCSI to ATA =
Translation)
>>> specification has not yet caught up with translating SCSI zone =
commands
>>> (ZBC) to ATA zone commands (ZAC).  So, until the spec is updated and =
LSI
>>> and other vendors update their SCSI to ATA translation layers, we'll =
have
>>> to use the ATA version of the commands when talking to ATA drives =
via SAS
>>> controllers.
>>>=20
>>> I have only tested the code so far with Seagate SATA Drive Managed =
and Host
>>> Aware drives.  I would appreciate testing with any drives.  (And =
testing to
>>> make sure that the patches don't cause problems with existing =
hardware.)
>>> Right now, all you can really do is manage the zones manually using
>>> camcontrol(8) or zonectl(8).  Automatic management will come with =
the ZFS
>>> changes.  (Or changes to other filesysems if people want to do it.)
>>>=20
>>> If you have a SATA Host Aware drive, in theory camcontrol(8) should =
allow
>>> you to manage the drive if you have it attached to a SATA =
controller.
>>>=20
>>> Here is an example of some of the commands.
>>>=20
>>> Get general zoning information:
>>>=20
>>> [root@sm4u-1 ~]# zonectl -c params -d /dev/da21
>>> Zone Mode: Host Aware
>>> Command support: Report Zones, Open, Close, Finish, Reset Write =
Pointer
>>> Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
>>> Optimal Number of Open Sequential Write Preferred Zones: 128
>>> Optimal Number of Non-Sequentially Written Sequential Write =
Preferred Zones: 8
>>> Maximum Number of Open Sequential Write Required Zones: Unlimited
>>>=20
>>> Look at information from the da(4) driver:
>>>=20
>>> [root@sm4u-1 ~]# sysctl kern.cam.da.21
>>> kern.cam.da.21.delete_method: NONE
>>> kern.cam.da.21.delete_max: 1081344
>>> kern.cam.da.21.minimum_cmd_size: 6
>>> kern.cam.da.21.sort_io_queue: -1
>>> kern.cam.da.21.zone_mode: Host Aware
>>> kern.cam.da.21.zone_support: Report Zones, Open, Close, Finish, =
Reset Write Pointer
>>> kern.cam.da.21.optimal_seq_zones: 128
>>> kern.cam.da.21.optimal_nonseq_zones: 8
>>> kern.cam.da.21.max_seq_zones: 4294967295
>>> kern.cam.da.21.error_inject: 0
>>>=20
>>> Display all of the zones with zonectl(8):
>>>=20
>>> [root@sm4u-1 ~]# zonectl -d /dev/da21 -c rz
>>> 29809 zones, Maximum LBA 0x3a3812aaf (15628053167)
>>> Zone lengths and types may vary
>>>  Start LBA  Length       WP LBA      Zone Type      Condition      =
Sequential             Reset
>>>          0, 524288,     0x80000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>    0x80000, 524288,    0x100000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x100000, 524288,    0x180000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x180000, 524288,    0x200000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x200000, 524288,    0x280000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x280000, 524288,    0x300000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x300000, 524288,    0x380000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x380000, 524288,    0x400000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x400000, 524288,    0x480000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x480000, 524288,    0x500000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x500000, 524288,    0x580000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x580000, 524288,    0x600000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x600000, 524288,    0x680000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x680000, 524288,    0x700000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>   0x700000, 524288,    0x780000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>> [ ... ]
>>> 0x1f00000, 524288,   0x1f80000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>  0x1f80000, 524288,   0x2000000,  Conventional,           NWP,     =
Sequential,  No Reset Needed
>>>  0x2000000, 524288,   0x2080000, Seq Preferred,          Full,     =
Sequential,  No Reset Needed
>>>  0x2080000, 524288,   0x2100000, Seq Preferred,          Full,     =
Sequential,  No Reset Needed
>>>  0x2100000, 524288,   0x2180000, Seq Preferred,          Full,     =
Sequential,  No Reset Needed
>>>  0x2180000, 524288,   0x2200000, Seq Preferred,          Full,     =
Sequential,  No Reset Needed
>>>  0x2200000, 524288,   0x2280000, Seq Preferred,          Full,     =
Sequential,  No Reset Needed
>>>  0x2280000, 524288,   0x2300000, Seq Preferred,          Full,     =
Sequential,  No Reset Needed
>>> [ ... ]
>>>=20
>>> Use camcontrol zone to reset the write pointer for the first =
shingled zone
>>> listed above:
>>>=20
>>> [root@sm4u-1 ~]# camcontrol zone da21 -v -c rwp -l 0x2000000
>>>=20
>>> Use camcontrol zone to ask the drive to report empty zones:
>>>=20
>>> [root@sm4u-1 ~]# camcontrol zone da21 -v -c rz -o empty
>>> 1 zones, Maximum LBA 0x3a3812aaf (15628053167)
>>> Zone lengths and types may vary
>>>  Start LBA  Length       WP LBA      Zone Type      Condition      =
Sequential             Reset
>>>  0x2000000, 524288,   0x2000000, Seq Preferred,         Empty,     =
Sequential,  No Reset Needed
>>>=20
>>> Get information on a Host Aware drive:
>>>=20
>>> root@sm4u-1 ~]# diskinfo -v da21
>>> da21
>>>        512             # sectorsize
>>>        8001563222016   # mediasize in bytes (7.3T)
>>>        15628053168     # mediasize in sectors
>>>        4096            # stripesize
>>>        0               # stripeoffset
>>>        972801          # Cylinders according to firmware.
>>>        255             # Heads according to firmware.
>>>        63              # Sectors according to firmware.
>>>                    Z84008NY    # Disk ident.
>>>        enc@5003048001f311fd/elmtype@array_device/slot@22       # =
Physical path
>>>        Host Aware      # Zone Mode
>>>=20
>>> Get information on a drive managed drive:
>>>=20
>>> [root@sm4u-1 ~]# diskinfo -v da20
>>> da20
>>>        512             # sectorsize
>>>        8001563222016   # mediasize in bytes (7.3T)
>>>        15628053168     # mediasize in sectors
>>>        4096            # stripesize
>>>        0               # stripeoffset
>>>        972801          # Cylinders according to firmware.
>>>        255             # Heads according to firmware.
>>>        63              # Sectors according to firmware.
>>>                    Z8405938    # Disk ident.
>>>        enc@5003048001f311fd/elmtype@array_device/slot@21       # =
Physical path
>>>        Drive Managed   # Zone Mode
>>>=20
>>> Get information on a non-zoned drive:
>>>=20
>>> [root@sm4u-1 ~]# diskinfo -v da4
>>> da4
>>>        512             # sectorsize
>>>        100030242816    # mediasize in bytes (93G)
>>>        195371568       # mediasize in sectors
>>>        0               # stripesize
>>>        0               # stripeoffset
>>>        12161           # Cylinders according to firmware.
>>>        255             # Heads according to firmware.
>>>        63              # Sectors according to firmware.
>>>        124903574F36            # Disk ident.
>>>        enc@5003048001f311fd/elmtype@array_device/slot@5        # =
Physical path
>>>        Not Zoned       # Zone Mode
>>>=20
>>> Testing and comments are welcome.
>>>=20
>>> Thanks,
>>>=20
>>> Ken
>>> --=20
>>> Kenneth Merry
>>> ken@FreeBSD.ORG
>>=20
>> --=20
>> Kenneth Merry
>> ken@FreeBSD.ORG
>=20
> --=20
> Kenneth Merry
> ken@FreeBSD.ORG
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to =
"freebsd-scsi-unsubscribe@freebsd.org"




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CF8C42C1-9024-429C-91F5-10A7175C1089>