Date: Thu, 06 Nov 2014 20:24:54 +0000
From: Steven Hartland <killing@multiplay.co.uk>
To: Borja Marcos <borjam@sarenet.es>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS bug: was creating ZIL ignores vfs.zfs.min_auto_ashift, should be ZIL sets improper ashift with AHCI controllers
Message-ID: <545BD916.5020609@multiplay.co.uk>
In-Reply-To: <21D2A3A9-B6C1-458F-B17F-480251E999AE@sarenet.es>
References: <B731A922-3F83-4D8E-A4EA-22C5CA8A3850@sarenet.es>
 <9C91F97841BC4347910F206618BAA3BB9AF327D1@PAIMAIL.pai.local>
 <545B76EF.6050709@multiplay.co.uk>
 <21D2A3A9-B6C1-458F-B17F-480251E999AE@sarenet.es>
This is a multi-part message in MIME format.
--------------040906050104080702050904
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

Something very strange going on.

I have a boot pool (tank) and if I add ada1p3 (512b disk with min_auto_ashift = 12) to it as a log device zdb reports its ashift as 9.

If I add the same device to another test pool (tpool) on the same machine it gets ashift 12.

The attached dtrace script traces the calls and shows that vdev_ashift_optimize is correctly called and that the ashift of the vdev in both cases should be 12 according to the final vdev_config_generate call.

More debugging required

On 06/11/2014 14:58, Borja Marcos wrote:
> On Nov 6, 2014, at 2:26 PM, Steven Hartland wrote:
>
>> That's not relevant as min when set should override the drives params
> There is more to this than it seems, I just found more funny stuff.
>
> MY CONCLUSION IS: when creating a ZIL device, it behaves differently depending on the disk controller. It works with SAS,
> and it doesn't work with AHCI.
>
> When using an AHCI controller, ZIL ignores *both* the 4K block quirk and the min_auto_ashift variables. Ashift is fixed to 9. It only
> uses a different ashift when using a "nop" device. For example, I have tried with a 4 KB gnop device and this time it used the correct ashift, 12.
>
> When using a SAS controller, ZIL works perfectly with both.
>
> Seems quite odd to me. I am not running exactly the same version on both machines (the one with AHCI controllers is running -STABLE
> from three days ago) and the one with the SAS controller is running 10.1-RC4. But the results should be the same.
>
>
> I've added the relevant quirk to ata_da.c and the SSD is now
> properly "quirked":
>
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada1: <INTEL SSDSA2CT040G3 4PC10362> ATA-8 SATA 2.x device
> ada1: Serial Number PEPR408501DV040AGN
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C)
> ada1: quirks=0x1<4K>
>
>
> But still something is wrong:
>
> EXAMPLE ONE: AHCI controller, min_auto_ashift with the default value of 9.
>
> The log child, has the wrong ashift, 9, regardless of the 4K quirk.
>
>         children[1]:
>             type: 'disk'
>             id: 1
>             guid: 2447450905312007897
>             path: '/dev/ada1'
>             phys_path: '/dev/ada1'
>             whole_disk: 1
>             metaslab_array: 0
>             metaslab_shift: 0
>             ashift: 9
>             asize: 40015757312
>             is_log: 1
>             create_txg: 11741519
>
>
> EXAMPLE 2: AHCI controller, raise min_auto_ashift to 12
>
> # sysctl vfs.zfs.min_auto_ashift=12
> vfs.zfs.min_auto_ashift: 9 -> 12
>
> # zpool add rpool log ada1
>
> And our log child still has the wrong ashift.
>
>         children[1]:
>             type: 'disk'
>             id: 1
>             guid: 17598938711972588792
>             path: '/dev/ada1'
>             phys_path: '/dev/ada1'
>             whole_disk: 1
>             metaslab_array: 0
>             metaslab_shift: 0
>             ashift: 9
>             asize: 40015757312
>             is_log: 1
>             create_txg: 11741560
>
>
>
> EXAMPLE 3: Doing the same as example one, but using a SAS controller (mps).
> I haven't changed the min_auto_ashift.
>
> da3: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da3: Serial Number S1D9NEADA08568E
> da3: 600.000MB/s transfers
> da3: Command Queueing enabled
> da3: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da3: quirks=0x8<4K>
> da1: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da1: Serial Number S1D9NEADA08549F
> da1: 600.000MB/s transfers
> da1: Command Queueing enabled
> da1: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da1: quirks=0x8<4K>
> da2: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da2: Serial Number S1D9NEADA08548T
> da2: 600.000MB/s transfers
> da2: Command Queueing enabled
> da2: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da2: quirks=0x8<4K>
>
>
> Now, we create a pool. I did this in two steps in order to reproduce my AHCI more accurately.
>
> # zpool create sample mirror da2 da3
>
> and add a log device
>
> # zpool add sample log da1
>
> And our log device uses the ashift...
>
>         children[1]:
>             type: 'disk'
>             id: 1
>             guid: 1327562712929751294
>             path: '/dev/da1'
>             phys_path: '/dev/da1'
>             whole_disk: 1
>             metaslab_array: 38
>             metaslab_shift: 33
>             ashift: 12 <=============== BINGO! 12!!
>             asize: 1000199946240
>             is_log: 1
>             create_txg: 7
>
>
> EXAMPLE 4: Same hardware as before, but I have compiled a "dequirked" kernel. The Samsung 840 SSD is now
> detected with 512 byte sectors.
>
> # sysctl vfs.zfs.min_auto_ashift=12
>
> # zpool create sample da2 da3
>
> # zpool add sample log da1
>
> # zdb
>
> sample:
>     version: 5000
>     name: 'sample'
>     state: 0
>     txg: 10
>     pool_guid: 10244789911221894670
>     hostid: 1065071139
>     hostname: 'elibm'
>     vdev_children: 3
>     vdev_tree:
>         type: 'root'
>         id: 0
>         guid: 10244789911221894670
>         create_txg: 4
>         children[0]:
>             type: 'disk'
>             id: 0
>             guid: 147759032286414284
>             path: '/dev/da2'
>             phys_path: '/dev/da2'
>             whole_disk: 1
>             metaslab_array: 37
>             metaslab_shift: 33
>             ashift: 12
>             asize: 1000199946240
>             is_log: 0
>             create_txg: 4
>         children[1]:
>             type: 'disk'
>             id: 1
>             guid: 2632519121370708463
>             path: '/dev/da3'
>             phys_path: '/dev/da3'
>             whole_disk: 1
>             metaslab_array: 34
>             metaslab_shift: 33
>             ashift: 12
>             asize: 1000199946240
>             is_log: 0
>             create_txg: 4
>         children[2]:
>             type: 'disk'
>             id: 2
>             guid: 10136980984141171426
>             path: '/dev/da1'
>             phys_path: '/dev/da1'
>             whole_disk: 1
>             metaslab_array: 39
>             metaslab_shift: 33
>             ashift: 12 <========= 12, ashift for the log device
>             asize: 1000199946240
>             is_log: 1
>             create_txg: 8
>     features_for_read:
>         com.delphix:hole_birth
>         com.delphix:embedded_data
> root@elibm:~ #
>

--------------040906050104080702050904
Content-Type: text/plain; charset=windows-1252; name="ashift.d"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="ashift.d"

#!/usr/sbin/dtrace -s

fbt::vdev_ashift_optimize:entry
{
	vd = (vdev_t *)arg0;
	printf("vdev: %s, ashift: %d, physical_ashift: %d, top: %d, min: %d",
	    vd->vdev_path ? stringof(vd->vdev_path) : "n/a",
	    vd->vdev_ashift,
	    vd->vdev_physical_ashift,
	    vd == vd->vdev_top,
	    `zfs_min_auto_ashift
	);
}

fbt::vdev_config_generate:entry
{
	vd = (vdev_t *)arg1;
	printf("vdev: %s, ashift: %d, physical_ashift: %d, top: %d, min: %d",
	    vd->vdev_path ? stringof(vd->vdev_path) : "n/a",
	    vd->vdev_ashift,
	    vd->vdev_physical_ashift,
	    vd == vd->vdev_top,
	    `zfs_min_auto_ashift
	);
}

fbt::vdev_ashift_optimize:return
{
	printf("%x", arg0);
}

--------------040906050104080702050904--
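
For anyone following the numbers in this thread: ashift is the base-2 logarithm of the block size a vdev uses, so 9 means 512-byte blocks and 12 means 4096-byte blocks. The short Python sketch below only illustrates that arithmetic and the behaviour stated earlier in the thread ("min when set should override the drives params"). It is not the kernel's vdev_ashift_optimize logic; the function names sector_size and expected_ashift are made up for illustration, and taking the larger of the two values is an assumption based on that statement.

```python
def sector_size(ashift: int) -> int:
    """Block size in bytes implied by an ashift value (2 ** ashift)."""
    return 1 << ashift


def expected_ashift(device_ashift: int, min_auto_ashift: int) -> int:
    """Assumed rule from the thread: a configured minimum overrides a
    smaller value reported for the device. Hypothetical helper, not ZFS code."""
    return max(device_ashift, min_auto_ashift)


if __name__ == "__main__":
    for ashift in (9, 12):
        print(f"ashift {ashift} -> {sector_size(ashift)} byte blocks")

    # A 512-byte device (ashift 9) with vfs.zfs.min_auto_ashift=12:
    # the thread expects 12, while the AHCI log device reported 9.
    print("expected ashift:", expected_ashift(9, 12))
```

Under that assumption, the AHCI log device in examples one and 2 should have reported 12 once min_auto_ashift was raised, which is the mismatch being debugged here.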