Date: Mon, 7 Dec 2020 17:23:18 -0500
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: freebsd-questions@freebsd.org
Cc: tech-lists@zyxst.net
Subject: Re: effect of differing spindle speeds on prospective zfs vdevs
Message-ID: <BC1C2E79-82C1-43C1-A9E6-9762F170161B@gromit.dlib.vt.edu>
In-Reply-To: <mailman.101.1607256003.84014.freebsd-questions@freebsd.org>
References: <mailman.101.1607256003.84014.freebsd-questions@freebsd.org>
On Sat, 5 Dec 2020 19:16:33 +0000, tech-lists <tech-lists@zyxst.net> wrote:

> Hi,
>
> On Sat, Dec 05, 2020 at 08:51:08AM -0500, Paul Mather wrote:
>> IIRC, ZFS pools have a single ashift for the entire pool, so you should
>> set it to accommodate the 4096/4096 devices to avoid performance
>> degradation.  I believe it defaults to that now, and should auto-detect
>> anyway.  But, in a mixed setup of vdevs like you have, you should be
>> using ashift=12.
>>
>> I believe having ashift=9 on your mixed-drive setup would have the
>> biggest impact in terms of reducing performance.
>
> Part of my confusion about the ashift thing is I thought ashift=9 was for
> 512/512 logical/physical.  Is this still the case?
>
> On a different machine, which has been running since FreeBSD 12 was
> -CURRENT, one of the disks in the array went bang.  zdb shows ashift=9
> (as was the default when it was created).  The only available replacement
> was an otherwise identical disk, but 512 logical/4096 physical.  zpool
> status mildly warns about performance degradation like this:
>
> ada2  ONLINE  0  0  0  block size: 512B configured, 4096B native
>
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly configured
>         pool.
>
> The other part of my confusion is that I understood zfs to set its own
> blocksize on the fly.

You're correct in that ZFS has its own concept of a block size (the
"recordsize" property), but this is not the same as the block size that
ashift concerns.  When "zpool" complains about a "non-native block size",
it is talking about the physical block size of the underlying vdev: the
smallest unit of data that is read from or written to the device.  (It
also has an impact on where partitions can be aligned.)
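For concreteness, ashift is the base-2 exponent of the sector size ZFS
assumes for a vdev, so ashift=9 and ashift=12 map onto the two sector
sizes discussed above.  A minimal shell sketch (the zdb command in the
comment is the usual way to check a live pool; "tank" is a placeholder
pool name, not one from this thread):

```shell
# ashift is the base-2 exponent of the vdev sector size ZFS assumes.
# On a live system you could confirm a pool's value with, e.g.:
#   zdb -C tank | grep ashift        # "tank" is a placeholder name
echo "ashift=9  -> $((1 << 9)) bytes"    # 512-byte sectors
echo "ashift=12 -> $((1 << 12)) bytes"   # 4096-byte "advanced format" sectors
```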
When hard drives became larger, the number of bits used to address
logical blocks (LBAs) became insufficient to reference every block on
the device.  One way around this, which also let devices store more data
in total, was to make the addressed blocks larger.  (Larger block sizes
are also good in that they require relatively less space for ECC data.)
Hence the 4K "advanced format" drives arrived.  Before that, block
(a.k.a. sector) sizes for hard drives had typically been 512 bytes;
afterwards, 4096 bytes became the norm.

Some drives actually use 4096-byte sectors internally but advertise a
512-byte sector size to the outside world.  From a read standpoint this
doesn't create a problem.  It is when writing that you can incur
performance issues, because updating a 512-byte sector within a
4096-byte physical sector involves a read-modify-write operation: the
original 4096-byte contents must be read, the 512-byte subset updated,
and the new 4096-byte whole rewritten to disk.  That is more work than
simply writing a 512-byte block as-is to a native 512-byte sector.  (In
similar fashion, partitions not aligned on a 4K boundary can incur
performance degradation on drives with 4096-byte physical sectors that
advertise as 512-byte.)

> (I guess there must be some performance degradation but it's not
> yet enough for me to notice.  Or it might only be noticeable if low
> on space.)

ZFS has a lot of caching, and the use of the ZIL "batches" writes; all
of this can ameliorate the effects of misaligned block sizes and
partition boundaries.  (Large sequential writes are best for
performance, especially on spinning disks, which incur penalties for
head movement and rotational delays.)  But if you have a write-intensive
pool, you are unnecessarily causing yourself a performance hit by not
using the correct ashift and/or partition boundaries.

BTW, low space mainly affects performance due to fragmentation.
It is a different issue from mismatched block size (ashift).

When I replaced my ashift=9 512-byte drives I eventually recreated the
pool with ashift=12.  Using ashift=12 on pools of drives with 512-byte
sectors will not incur any performance penalty, which is why ashift
defaults to 12 nowadays.  (I wouldn't be surprised if the default
changes to ashift=13 due to the prevalence of SSDs these days.)

Cheers,

Paul.
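To put a rough number on the read-modify-write penalty discussed
earlier: updating one 512-byte logical sector that lives inside a
4096-byte physical sector makes the drive move a whole physical sector
twice instead of writing 512 bytes once.  A back-of-the-envelope sketch,
not a benchmark:

```shell
phys=4096    # physical sector size of a 512e "advanced format" drive
logical=512  # logical sector size the drive advertises

# Native path: an aligned write of one logical sector moves 512 bytes.
echo "native 512B write:        ${logical} bytes"

# Emulated path: read the whole physical sector, patch 512 bytes in
# memory, then write the whole physical sector back.
rmw=$(( phys + phys ))
echo "read-modify-write update: ${rmw} bytes"
```

If you are recreating a pool anyway, the usual knobs (an aside of mine,
not something prescribed in this thread) are setting the property at
creation time with "zpool create -o ashift=12 ..." on OpenZFS, and the
FreeBSD sysctl vfs.zfs.min_auto_ashift, which sets a floor on the ashift
auto-detected for newly added vdevs.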