Subject: Re: ZFS bug: was creating ZIL ignores vfs.zfs.min_auto_ashift, should be ZIL sets improper ashift with AHCI controllers
From: Borja Marcos
Date: Thu, 6 Nov 2014 15:58:46 +0100
To: Steven Hartland
Cc: freebsd-fs@freebsd.org
Message-Id: <21D2A3A9-B6C1-458F-B17F-480251E999AE@sarenet.es>
In-Reply-To: <545B76EF.6050709@multiplay.co.uk>
References: <9C91F97841BC4347910F206618BAA3BB9AF327D1@PAIMAIL.pai.local> <545B76EF.6050709@multiplay.co.uk>

On Nov 6, 2014, at 2:26 PM, Steven Hartland wrote:

> That's not relevant as min when set should override the drives params

There is more to this than it seems. I just found more funny stuff.

MY CONCLUSION IS: when creating a ZIL device, it behaves differently
depending on the disk controller.
It works with SAS, and it doesn't work with AHCI.

When using an AHCI controller, ZIL ignores *both* the 4K block quirk and
the min_auto_ashift variable. Ashift is fixed to 9.

It only uses a different ashift when using a "nop" device. For example,
I have tried with a 4 KB gnop device and this time it used the correct
ashift, 12.

When using a SAS controller, ZIL works perfectly with both.

Seems quite odd to me. I am not running exactly the same version on both
machines (the one with AHCI controllers is running -STABLE from three
days ago, and the one with the SAS controller is running 10.1-RC4), but
the results should be the same.

I've added the relevant quirk to ata_da.c and the SSD is now properly
"quirked":

ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: ATA-8 SATA 2.x device
ada1: Serial Number PEPR408501DV040AGN
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C)
ada1: quirks=0x1<4K>

But still something is wrong:

EXAMPLE 1: AHCI controller, min_auto_ashift with the default value of 9.

The log child has the wrong ashift, 9, regardless of the 4K quirk.

        children[1]:
            type: 'disk'
            id: 1
            guid: 2447450905312007897
            path: '/dev/ada1'
            phys_path: '/dev/ada1'
            whole_disk: 1
            metaslab_array: 0
            metaslab_shift: 0
            ashift: 9
            asize: 40015757312
            is_log: 1
            create_txg: 11741519

EXAMPLE 2: AHCI controller, raise min_auto_ashift to 12

# sysctl vfs.zfs.min_auto_ashift=12
vfs.zfs.min_auto_ashift: 9 -> 12
# zpool add rpool log ada1

And our log child still has the wrong ashift.

        children[1]:
            type: 'disk'
            id: 1
            guid: 17598938711972588792
            path: '/dev/ada1'
            phys_path: '/dev/ada1'
            whole_disk: 1
            metaslab_array: 0
            metaslab_shift: 0
            ashift: 9
            asize: 40015757312
            is_log: 1
            create_txg: 11741560

EXAMPLE 3: Doing the same as example 1, but using a SAS controller
(mps). I haven't changed the min_auto_ashift.

da3: Fixed Direct Access SCSI-6 device
da3: Serial Number S1D9NEADA08568E
da3: 600.000MB/s transfers
da3: Command Queueing enabled
da3: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da3: quirks=0x8<4K>

da1: Fixed Direct Access SCSI-6 device
da1: Serial Number S1D9NEADA08549F
da1: 600.000MB/s transfers
da1: Command Queueing enabled
da1: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da1: quirks=0x8<4K>

da2: Fixed Direct Access SCSI-6 device
da2: Serial Number S1D9NEADA08548T
da2: 600.000MB/s transfers
da2: Command Queueing enabled
da2: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da2: quirks=0x8<4K>

Now, we create a pool. I did this in two steps in order to reproduce my
AHCI setup more accurately.

# zpool create sample mirror da2 da3

and add a log device

# zpool add sample log da1

And our log device uses the ashift...

        children[1]:
            type: 'disk'
            id: 1
            guid: 1327562712929751294
            path: '/dev/da1'
            phys_path: '/dev/da1'
            whole_disk: 1
            metaslab_array: 38
            metaslab_shift: 33
            ashift: 12     <=============== BINGO! 12!!
            asize: 1000199946240
            is_log: 1
            create_txg: 7

EXAMPLE 4: Same hardware as before, but I have compiled a "dequirked"
kernel. The Samsung 840 SSD is now detected with 512 byte sectors.

# sysctl vfs.zfs.min_auto_ashift=12
# zpool create sample da2 da3
# zpool add sample log da1
# zdb
sample:
    version: 5000
    name: 'sample'
    state: 0
    txg: 10
    pool_guid: 10244789911221894670
    hostid: 1065071139
    hostname: 'elibm'
    vdev_children: 3
    vdev_tree:
        type: 'root'
        id: 0
        guid: 10244789911221894670
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 147759032286414284
            path: '/dev/da2'
            phys_path: '/dev/da2'
            whole_disk: 1
            metaslab_array: 37
            metaslab_shift: 33
            ashift: 12
            asize: 1000199946240
            is_log: 0
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 2632519121370708463
            path: '/dev/da3'
            phys_path: '/dev/da3'
            whole_disk: 1
            metaslab_array: 34
            metaslab_shift: 33
            ashift: 12
            asize: 1000199946240
            is_log: 0
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 10136980984141171426
            path: '/dev/da1'
            phys_path: '/dev/da1'
            whole_disk: 1
            metaslab_array: 39
            metaslab_shift: 33
            ashift: 12     <========= 12, ashift for the log device
            asize: 1000199946240
            is_log: 1
            create_txg: 8
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
root@elibm:~ #
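
The check done by eye in the examples above (find the child with
is_log: 1 and read its ashift) can be scripted when comparing several
machines. What follows is only a sketch under stated assumptions: the
log_ashift name is made up for illustration, and it assumes that zdb
prints path and ashift before is_log within each child, as in the dumps
above. It also prints the sector size implied by the ashift, which is
2^ashift bytes (9 gives 512, 12 gives 4096).

```shell
#!/bin/sh
# Sketch only: log_ashift is a hypothetical helper, not part of ZFS.
# It reads zdb-style "key: value" lines on stdin and prints one line
# per vdev that has is_log: 1.
log_ashift() {
    awk -v q="'" '
        # Remember the most recent path (quotes stripped) and ashift.
        $1 == "path:"   { gsub(q, "", $2); path = $2 }
        $1 == "ashift:" { ashift = $2 }
        # is_log comes after both fields, so report when it is set.
        $1 == "is_log:" && $2 == 1 {
            printf "%s ashift=%d sector=%d\n", path, ashift, 2 ^ ashift
        }
    '
}

# Demo with the values from the AHCI example; prints:
# /dev/ada1 ashift=9 sector=512
log_ashift <<'EOF'
children[1]:
    type: 'disk'
    path: '/dev/ada1'
    phys_path: '/dev/ada1'
    ashift: 9
    asize: 40015757312
    is_log: 1
EOF
```

On a live system the input would presumably come from zdb itself, for
example "zdb | log_ashift"; I have not run it that way.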