Date:      Thu, 06 Nov 2014 20:24:54 +0000
From:      Steven Hartland <killing@multiplay.co.uk>
To:        Borja Marcos <borjam@sarenet.es>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS bug: was creating ZIL ignores vfs.zfs.min_auto_ashift, should be ZIL sets improper ashift with AHCI controllers
Message-ID:  <545BD916.5020609@multiplay.co.uk>
In-Reply-To: <21D2A3A9-B6C1-458F-B17F-480251E999AE@sarenet.es>
References:  <B731A922-3F83-4D8E-A4EA-22C5CA8A3850@sarenet.es> <9C91F97841BC4347910F206618BAA3BB9AF327D1@PAIMAIL.pai.local> <545B76EF.6050709@multiplay.co.uk> <21D2A3A9-B6C1-458F-B17F-480251E999AE@sarenet.es>

This is a multi-part message in MIME format.
--------------040906050104080702050904
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

Something very strange is going on.

I have a boot pool (tank), and if I add ada1p3 (a 512-byte-sector disk,
with vfs.zfs.min_auto_ashift set to 12) to it as a log device, zdb
reports its ashift as 9.

If I add the same device to another test pool (tpool) on the same 
machine it gets ashift 12.
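
For anyone wanting to compare the two pools without reading the whole
zdb dump by eye, here is a minimal sketch that pulls path, ashift and
is_log out of a zdb config listing. The function name vdev_ashifts and
the regular expression are my own illustration, not part of any ZFS
tool, and the parser assumes the "key: value" layout that zdb prints
for each children[N] entry.

```python
import re

# Matches lines such as "    ashift: 9" or "    path: '/dev/ada1'".
# Hypothetical helper pattern, written for the zdb layout quoted in this thread.
FIELD = re.compile(r"^\s*(\w+):\s*'?([^'\s]*)'?")
CHILD = re.compile(r"^\s*children\[\d+\]:")


def vdev_ashifts(zdb_text):
    """Return a list of (path, ashift, is_log) for each vdev that has an ashift."""
    vdevs = []
    current = None
    for line in zdb_text.splitlines():
        if CHILD.match(line):
            # A new children[N] entry starts; collect its fields separately.
            current = {}
            vdevs.append(current)
            continue
        m = FIELD.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
    return [
        (v.get("path"), int(v["ashift"]), v.get("is_log") == "1")
        for v in vdevs
        if "ashift" in v
    ]
```

Feeding it the output of zdb for tank and for tpool and filtering on
is_log should make the 9 versus 12 difference obvious at a glance.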

The attached dtrace script traces the calls and shows that 
vdev_ashift_optimize is correctly called and that the ashift of the vdev 
in both cases should be 12 according to the final vdev_config_generate call.
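
To spell out the expectation, here is a deliberately simplified model
of how I would expect the ashift to be chosen. It is not the kernel
code: the names sector_ashift and expected_ashift are hypothetical, and
the model ignores vfs.zfs.max_auto_ashift and any other clamping the
real vdev_ashift_optimize may apply. It only encodes "take the largest
of the logical sector size, the physical (quirked) sector size and
min_auto_ashift".

```python
def sector_ashift(sector_bytes):
    """Return log2 of a power-of-two sector size (512 -> 9, 4096 -> 12)."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def expected_ashift(logical_bytes, physical_bytes, min_auto_ashift=9):
    """Simplified expectation for a new top-level vdev's ashift.

    Assumption: the result is the maximum of the logical sector ashift,
    the physical sector ashift (e.g. reported through a 4K quirk) and
    the min_auto_ashift tunable. Upper limits are not modelled.
    """
    return max(
        sector_ashift(logical_bytes),
        sector_ashift(physical_bytes),
        min_auto_ashift,
    )
```

Under this model a plain 512-byte disk with min_auto_ashift at 12 gives
expected_ashift(512, 512, 12) == 12, and a 4K-quirked disk with the
default minimum gives expected_ashift(512, 4096) == 12, so an ashift of
9 on the log device matches neither case.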

More debugging is required.

On 06/11/2014 14:58, Borja Marcos wrote:
> On Nov 6, 2014, at 2:26 PM, Steven Hartland wrote:
>
>> That's not relevant, as the min, when set, should override the drive's params
> There is more to this than it seems, I just found more funny stuff.
>
> MY CONCLUSION IS: when creating a ZIL device, it behaves differently depending on the disk controller. It works with SAS,
> and it doesn't work with AHCI.
>
> When using an AHCI controller, ZIL ignores *both* the 4K block quirk and the min_auto_ashift variables. Ashift is fixed to 9. It only
> uses a different ashift when using a "nop" device. For example, I have tried with a 4 KB gnop device and this time it used the correct ashift, 12.
>
> When using a SAS controller, ZIL works perfectly with both.
>
> Seems quite odd to me. I am not running exactly the same version on both machines (the one with AHCI controllers is running -STABLE
> from three days ago, and the one with the SAS controller is running 10.1-RC4), but the results should be the same.
>
>
>
>
>
> I've added the relevant quirk to ata_da.c and the SSD is now
> properly "quirked":
>
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada1: <INTEL SSDSA2CT040G3 4PC10362> ATA-8 SATA 2.x device
> ada1: Serial Number PEPR408501DV040AGN
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C)
> ada1: quirks=0x1<4K>
>
>
> But still something is wrong:
>
> EXAMPLE 1: AHCI controller, min_auto_ashift with the default value of 9.
>
> The log child has the wrong ashift, 9, regardless of the 4K quirk.
>
>         children[1]:
>              type: 'disk'
>              id: 1
>              guid: 2447450905312007897
>              path: '/dev/ada1'
>              phys_path: '/dev/ada1'
>              whole_disk: 1
>              metaslab_array: 0
>              metaslab_shift: 0
>              ashift: 9
>              asize: 40015757312
>              is_log: 1
>              create_txg: 11741519
>
>
> EXAMPLE 2: AHCI controller, raise min_auto_ashift to 12
>
> # sysctl vfs.zfs.min_auto_ashift=12
> vfs.zfs.min_auto_ashift: 9 -> 12
>
> # zpool add rpool log ada1
>
> And our log child still has the wrong ashift.
>
>          children[1]:
>              type: 'disk'
>              id: 1
>              guid: 17598938711972588792
>              path: '/dev/ada1'
>              phys_path: '/dev/ada1'
>              whole_disk: 1
>              metaslab_array: 0
>              metaslab_shift: 0
>              ashift: 9
>              asize: 40015757312
>              is_log: 1
>              create_txg: 11741560
>
>
>
> EXAMPLE 3: Doing the same as example 1, but using a SAS controller (mps).
> I haven't changed min_auto_ashift.
>
> da3: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da3: Serial Number S1D9NEADA08568E
> da3: 600.000MB/s transfers
> da3: Command Queueing enabled
> da3: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da3: quirks=0x8<4K>
> da1: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da1: Serial Number S1D9NEADA08549F
> da1: 600.000MB/s transfers
> da1: Command Queueing enabled
> da1: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da1: quirks=0x8<4K>
> da2: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da2: Serial Number S1D9NEADA08548T
> da2: 600.000MB/s transfers
> da2: Command Queueing enabled
> da2: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da2: quirks=0x8<4K>
>
>
> Now, we create a pool. I did this in two steps in order to reproduce my AHCI setup more accurately.
>
> # zpool create sample mirror da2 da3
>
> and add a log device
>
> # zpool add sample log da1
>
> And our log device uses the ashift...
>
>          children[1]:
>              type: 'disk'
>              id: 1
>              guid: 1327562712929751294
>              path: '/dev/da1'
>              phys_path: '/dev/da1'
>              whole_disk: 1
>              metaslab_array: 38
>              metaslab_shift: 33
>              ashift: 12                            <=============== BINGO! 12!!
>              asize: 1000199946240
>              is_log: 1
>              create_txg: 7
>
>
> EXAMPLE 4: Same hardware as before, but I have compiled a "dequirked" kernel. The Samsung 840 SSD is now
> detected with 512 byte sectors.
>
> # sysctl vfs.zfs.min_auto_ashift=12
>
> # zpool create sample da2 da3
>
> # zpool add sample log da1
>
> # zdb
>
> sample:
>      version: 5000
>      name: 'sample'
>      state: 0
>      txg: 10
>      pool_guid: 10244789911221894670
>      hostid: 1065071139
>      hostname: 'elibm'
>      vdev_children: 3
>      vdev_tree:
>          type: 'root'
>          id: 0
>          guid: 10244789911221894670
>          create_txg: 4
>          children[0]:
>              type: 'disk'
>              id: 0
>              guid: 147759032286414284
>              path: '/dev/da2'
>              phys_path: '/dev/da2'
>              whole_disk: 1
>              metaslab_array: 37
>              metaslab_shift: 33
>              ashift: 12
>              asize: 1000199946240
>              is_log: 0
>              create_txg: 4
>          children[1]:
>              type: 'disk'
>              id: 1
>              guid: 2632519121370708463
>              path: '/dev/da3'
>              phys_path: '/dev/da3'
>              whole_disk: 1
>              metaslab_array: 34
>              metaslab_shift: 33
>              ashift: 12
>              asize: 1000199946240
>              is_log: 0
>              create_txg: 4
>          children[2]:
>              type: 'disk'
>              id: 2
>              guid: 10136980984141171426
>              path: '/dev/da1'
>              phys_path: '/dev/da1'
>              whole_disk: 1
>              metaslab_array: 39
>              metaslab_shift: 33
>              ashift: 12                            <========= 12, ashift for the log device
>              asize: 1000199946240
>              is_log: 1
>              create_txg: 8
>      features_for_read:
>          com.delphix:hole_birth
>          com.delphix:embedded_data
> root@elibm:~ #
>


--------------040906050104080702050904
Content-Type: text/plain; charset=windows-1252;
 name="ashift.d"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="ashift.d"

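#!/usr/sbin/dtrace -s

/*
 * Print the ashift-related fields of each vdev as it passes through
 * vdev_ashift_optimize() and vdev_config_generate().
 */

/* arg0 is the vdev_t * handed to vdev_ashift_optimize(). */
fbt::vdev_ashift_optimize:entry {
	this->vd = (vdev_t *)arg0;
	printf("vdev: %s, ashift: %d, physical_ashift: %d, top: %d, min: %d",
		this->vd->vdev_path ? stringof(this->vd->vdev_path) : "n/a",
		this->vd->vdev_ashift,
		this->vd->vdev_physical_ashift,
		this->vd == this->vd->vdev_top,
		`zfs_min_auto_ashift
	);
}

/* The vdev_t * is the second argument (arg1) of vdev_config_generate(). */
fbt::vdev_config_generate:entry {
	this->vd = (vdev_t *)arg1;
	printf("vdev: %s, ashift: %d, physical_ashift: %d, top: %d, min: %d",
		this->vd->vdev_path ? stringof(this->vd->vdev_path) : "n/a",
		this->vd->vdev_ashift,
		this->vd->vdev_physical_ashift,
		this->vd == this->vd->vdev_top,
		`zfs_min_auto_ashift
	);
}

/*
 * Note: for fbt return probes arg0 is the offset of the return
 * instruction within the function, not a return value.
 */
fbt::vdev_ashift_optimize:return {
	printf("%x", arg0);
}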


--------------040906050104080702050904--


