From owner-freebsd-fs@FreeBSD.ORG Thu Nov 6 20:26:25 2014
Message-ID: <545BD916.5020609@multiplay.co.uk>
Date: Thu, 06 Nov 2014 20:24:54 +0000
From: Steven Hartland
To: Borja Marcos
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS bug: was creating ZIL ignores vfs.zfs.min_auto_ashift, should be ZIL sets improper ashift with AHCI controllers
References: <9C91F97841BC4347910F206618BAA3BB9AF327D1@PAIMAIL.pai.local> <545B76EF.6050709@multiplay.co.uk> <21D2A3A9-B6C1-458F-B17F-480251E999AE@sarenet.es>
In-Reply-To: <21D2A3A9-B6C1-458F-B17F-480251E999AE@sarenet.es>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------040906050104080702050904"

This is a multi-part message in MIME format.
--------------040906050104080702050904
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

Something very strange is going on.

I have a boot pool (tank), and if I add ada1p3 (a disk with 512-byte
sectors, with min_auto_ashift = 12) to it as a log device, zdb reports
its ashift as 9. If I add the same device to another test pool (tpool)
on the same machine, it gets ashift 12.

The attached dtrace script traces the calls and shows that
vdev_ashift_optimize is correctly called, and that according to the
final vdev_config_generate call the ashift of the vdev should be 12 in
both cases.

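To spell out the expectation being tested: ashift is the base-2 logarithm of the sector size, and the tunable is assumed to act as a floor on whatever the device reports. The following is only a rough model of that expectation; the function names are hypothetical and this is not the kernel's vdev_ashift_optimize logic.

```python
def sector_ashift(sector_bytes: int) -> int:
    """Return log2 of a power-of-two sector size (512 -> 9, 4096 -> 12)."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def expected_ashift(sector_bytes: int, min_auto_ashift: int) -> int:
    """Assumed model: vfs.zfs.min_auto_ashift is a lower bound on the ashift.

    Illustrative only; the real decision is made inside the kernel.
    """
    return max(sector_ashift(sector_bytes), min_auto_ashift)


# A 512-byte-sector device with min_auto_ashift = 12, as in the case above.
print(expected_ashift(512, 12))
```

Under this model both pools should end up with ashift 12 for the log device, which is what the final vdev_config_generate call reports.
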
More debugging required

On 06/11/2014 14:58, Borja Marcos wrote:
> On Nov 6, 2014, at 2:26 PM, Steven Hartland wrote:
>
>> That's not relevant as min when set should override the drives params
> There is more to this than it seems, I just found more funny stuff.
>
> MY CONCLUSION IS: when creating a ZIL device, it behaves differently depending on the disk controller. It works with SAS,
> and it doesn't work with AHCI.
>
> When using an AHCI controller, ZIL ignores *both* the 4K block quirk and the min_auto_ashift variables. Ashift is fixed to 9. It only
> uses a different ashift when using a "nop" device. For example, I have tried with a 4 KB gnop device and this time it used the correct ashift, 12.
>
> When using a SAS controller, ZIL works perfectly with both.
>
> Seems quite odd to me. I am not running exactly the same version on both machines (the one with AHCI controllers is running -STABLE
> from three days ago) and the one with the SAS controller is running 10.1-RC4. But the results should be the same.
>
>
> I've added the relevant quirk to ata_da.c and the SSD is now
> properly "quirked":
>
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada1: ATA-8 SATA 2.x device
> ada1: Serial Number PEPR408501DV040AGN
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C)
> ada1: quirks=0x1<4K>
>
>
> But still something is wrong:
>
> EXAMPLE ONE: AHCI controller, min_auto_ashift with the default value of 9.
>
> The log child, has the wrong ashift, 9, regardless of the 4K quirk.
>
> children[1]:
>     type: 'disk'
>     id: 1
>     guid: 2447450905312007897
>     path: '/dev/ada1'
>     phys_path: '/dev/ada1'
>     whole_disk: 1
>     metaslab_array: 0
>     metaslab_shift: 0
>     ashift: 9
>     asize: 40015757312
>     is_log: 1
>     create_txg: 11741519
>
>
> EXAMPLE 2: AHCI controller, raise min_auto_ashift to 12
>
> # sysctl vfs.zfs.min_auto_ashift=12
> vfs.zfs.min_auto_ashift: 9 -> 12
>
> # zpool add rpool log ada1
>
> And our log child still has the wrong ashift.
>
> children[1]:
>     type: 'disk'
>     id: 1
>     guid: 17598938711972588792
>     path: '/dev/ada1'
>     phys_path: '/dev/ada1'
>     whole_disk: 1
>     metaslab_array: 0
>     metaslab_shift: 0
>     ashift: 9
>     asize: 40015757312
>     is_log: 1
>     create_txg: 11741560
>
>
>
> EXAMPLE 3: Doing the same as example one, but using a SAS controller (mps).
> I haven't changed the min_auto_ashift.
>
> da3: Fixed Direct Access SCSI-6 device
> da3: Serial Number S1D9NEADA08568E
> da3: 600.000MB/s transfers
> da3: Command Queueing enabled
> da3: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da3: quirks=0x8<4K>
> da1: Fixed Direct Access SCSI-6 device
> da1: Serial Number S1D9NEADA08549F
> da1: 600.000MB/s transfers
> da1: Command Queueing enabled
> da1: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da1: quirks=0x8<4K>
> da2: Fixed Direct Access SCSI-6 device
> da2: Serial Number S1D9NEADA08548T
> da2: 600.000MB/s transfers
> da2: Command Queueing enabled
> da2: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da2: quirks=0x8<4K>
>
>
> Now, we create a pool. I did this in two steps in order to reproduce my AHCI more accurately.
>
> # zpool create sample mirror da2 da3
>
> and add a log device
>
> # zpool add sample log da1
>
> And our log device uses the ashift...
>
> children[1]:
>     type: 'disk'
>     id: 1
>     guid: 1327562712929751294
>     path: '/dev/da1'
>     phys_path: '/dev/da1'
>     whole_disk: 1
>     metaslab_array: 38
>     metaslab_shift: 33
>     ashift: 12 <=============== BINGO! 12!!
>     asize: 1000199946240
>     is_log: 1
>     create_txg: 7
>
>
> EXAMPLE 4: Same hardware as before, but I have compiled a "dequirked" kernel. The Samsung 840 SSD is now
> detected with 512 byte sectors.
>
> # sysctl vfs.zfs.min_auto_ashift=12
>
> # zpool create sample da2 da3
>
> # zpool add sample log da1
>
> # zdb
>
> sample:
>     version: 5000
>     name: 'sample'
>     state: 0
>     txg: 10
>     pool_guid: 10244789911221894670
>     hostid: 1065071139
>     hostname: 'elibm'
>     vdev_children: 3
>     vdev_tree:
>         type: 'root'
>         id: 0
>         guid: 10244789911221894670
>         create_txg: 4
>         children[0]:
>             type: 'disk'
>             id: 0
>             guid: 147759032286414284
>             path: '/dev/da2'
>             phys_path: '/dev/da2'
>             whole_disk: 1
>             metaslab_array: 37
>             metaslab_shift: 33
>             ashift: 12
>             asize: 1000199946240
>             is_log: 0
>             create_txg: 4
>         children[1]:
>             type: 'disk'
>             id: 1
>             guid: 2632519121370708463
>             path: '/dev/da3'
>             phys_path: '/dev/da3'
>             whole_disk: 1
>             metaslab_array: 34
>             metaslab_shift: 33
>             ashift: 12
>             asize: 1000199946240
>             is_log: 0
>             create_txg: 4
>         children[2]:
>             type: 'disk'
>             id: 2
>             guid: 10136980984141171426
>             path: '/dev/da1'
>             phys_path: '/dev/da1'
>             whole_disk: 1
>             metaslab_array: 39
>             metaslab_shift: 33
>             ashift: 12 <========= 12, ashift for the log device
>             asize: 1000199946240
>             is_log: 1
>             create_txg: 8
>     features_for_read:
>         com.delphix:hole_birth
>         com.delphix:embedded_data
> root@elibm:~ #
>

--------------040906050104080702050904
Content-Type: text/plain; charset=windows-1252; name="ashift.d"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="ashift.d"

#!/usr/sbin/dtrace -s

/* Log the vdev's ashift state each time the optimiser is entered. */
fbt::vdev_ashift_optimize:entry
{
    vd = (vdev_t *)arg0;
    printf("vdev: %s, ashift: %d, physical_ashift: %d, top: %d, min: %d",
        vd->vdev_path ? stringof(vd->vdev_path) : "n/a",
        vd->vdev_ashift, vd->vdev_physical_ashift,
        vd == vd->vdev_top, `zfs_min_auto_ashift
    );
}

/* Log the same fields when the vdev config is generated (vdev is arg1 here). */
fbt::vdev_config_generate:entry
{
    vd = (vdev_t *)arg1;
    printf("vdev: %s, ashift: %d, physical_ashift: %d, top: %d, min: %d",
        vd->vdev_path ? stringof(vd->vdev_path) : "n/a",
        vd->vdev_ashift, vd->vdev_physical_ashift,
        vd == vd->vdev_top, `zfs_min_auto_ashift
    );
}

/* On an fbt return probe, arg0 is the offset of the returning instruction. */
fbt::vdev_ashift_optimize:return
{
    printf("%x", arg0);
}

--------------040906050104080702050904--
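
The zdb listings quoted in this thread can also be checked mechanically rather than by eye. A minimal sketch, assuming the zdb output (optionally still carrying mail quoting) has been saved as text; vdev_ashifts and undersized_logs are hypothetical helper names, and only the path, ashift and is_log fields seen in the listings are parsed.

```python
import re

# Only the three fields needed for the comparison; "phys_path" does not match
# because the pattern is anchored at the start of the stripped line.
FIELD = re.compile(r"^(path|ashift|is_log):\s*(\S+)")


def vdev_ashifts(zdb_text):
    """Return (path, ashift, is_log) for each 'children[N]:' block found."""
    vdevs, current = [], None
    for raw in zdb_text.splitlines():
        line = raw.lstrip("> \t")  # drop mail quoting and indentation
        if re.match(r"^children\[\d+\]:", line):
            current = {}
            vdevs.append(current)
            continue
        m = FIELD.match(line)
        if m and current is not None:
            key, value = m.groups()
            current[key] = value.strip("'")
    return [(v.get("path"), int(v["ashift"]), v.get("is_log") == "1")
            for v in vdevs if "ashift" in v]


def undersized_logs(zdb_text, min_auto_ashift):
    """List log vdev paths whose ashift is below the given minimum."""
    return [path for path, ashift, is_log in vdev_ashifts(zdb_text)
            if is_log and ashift < min_auto_ashift]


SAMPLE = """\
> children[1]:
>     type: 'disk'
>     path: '/dev/ada1'
>     phys_path: '/dev/ada1'
>     ashift: 9
>     is_log: 1
"""
print(undersized_logs(SAMPLE, 12))
```

Run against the AHCI examples with a minimum of 12, this flags /dev/ada1; the SAS examples, where the log device reports ashift 12, produce an empty list.
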