Date:      Mon, 7 Dec 2020 17:23:18 -0500
From:      Paul Mather <paul@gromit.dlib.vt.edu>
To:        freebsd-questions@freebsd.org
Cc:        tech-lists@zyxst.net
Subject:   Re: effect of differing spindle speeds on prospective zfs vdevs
Message-ID:  <BC1C2E79-82C1-43C1-A9E6-9762F170161B@gromit.dlib.vt.edu>
In-Reply-To: <mailman.101.1607256003.84014.freebsd-questions@freebsd.org>
References:  <mailman.101.1607256003.84014.freebsd-questions@freebsd.org>

On Sat, 5 Dec 2020 19:16:33 +0000, tech-lists <tech-lists@zyxst.net> wrote:


> Hi,
> 
> On Sat, Dec 05, 2020 at 08:51:08AM -0500, Paul Mather wrote:
>> IIRC, ZFS pools have a single ashift for the entire pool, so you should
>> set it to accommodate the 4096/4096 devices to avoid performance
>> degradation.  I believe it defaults to that now, and should auto-detect
>> anyway.  But, in a mixed setup of vdevs like you have, you should be
>> using ashift=12.
>> 
>> I believe having an ashift=9 on your mixed-drive setup would have the
>> biggest performance impact in terms of reducing performance.
> 
> Part of my confusion about the ashift thing is I thought ashift=9 was for
> 512/512 logical/physical. Is this still the case?
> 
> On a different machine which has been running since FreeBSD 12 was -current,
> one of the disks in the array went bang. zdb shows ashift=9 (as was default
> when it was created). The only available replacement was an otherwise
> identical disk but 512 logical/4096 physical. zpool status mildly warns
> about performance degradation like this:
> 
> ada2    ONLINE       0     0     0  block size: 512B configured, 4096B native
> 
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>      Expect reduced performance.
> action: Replace affected devices with devices that support the
>      configured block size, or migrate data to a properly configured
>      pool.
> 
> The other part of my confusion is that I understood zfs to set its own
> blocksize on the fly.


You're correct in that ZFS has its own concept of a block size (the
"recordsize" property), but this is not the same as the block size that
ashift describes.  When "zpool" complains about a "non-native block
size" it is talking about the physical block size of the underlying
vdev: the smallest unit of data that is read from or written to the
device.  (It also has an impact on where partitions can be addressed.)
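
If you want to see the two side by side, something like this should
work (assuming a pool named "tank"; substitute your own pool name):

  # ZFS's own (logical) block size, per dataset:
  zfs get recordsize tank

  # The ashift ZFS chose for each vdev when the pool was created:
  zdb -C tank | grep ashift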

When hard drives became larger, the number of bits used to address
logical blocks (LBAs) became insufficient to reference all blocks on
the device.  One way around this, and to let devices store more total
data, was to make the addressed blocks larger.  (Larger block sizes are
also good in that they require relatively less space for ECC data.)
Hence, the 4K "Advanced Format" drives arrived.  Before that, block
(a.k.a. sector) sizes for hard drives had typically been 512 bytes;
afterwards, 4096 bytes became the norm.
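
On FreeBSD you can ask a drive what it reports for its logical and
physical sector sizes, e.g. (with "ada0" purely as an example device
name):

  diskinfo -v ada0

The "sectorsize" line is the logical size, and on 512e drives the
"stripesize" line generally reflects the 4096-byte physical size.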

For some drives, the device actually utilises 4096-byte sectors but
advertises a 512-byte sector size to the outside world.  From a read
standpoint this doesn't create a problem.  It is when writing that you
can incur performance issues.  This is because writing/updating a
512-byte sector within a 4096-byte physical sector involves a
read-modify-write operation: the original 4096-byte contents must be
read, then the 512-byte subset updated, and finally the new 4096-byte
whole re-written back to disk.  That involves more than simply writing
a 512-byte block as-is to a 512-byte sector.  (In similar fashion,
partitions not aligned on a 4K boundary can incur performance
degradation for 4096-byte physical sectors that advertise as 512-byte.)
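
If you partition drives before handing them to ZFS, gpart can take
care of the alignment for you.  A minimal sketch (the device name and
label below are just placeholders):

  gpart create -s gpt ada2
  gpart add -t freebsd-zfs -a 1m -l zdisk2 ada2

The "-a 1m" aligns the partition on a 1 MiB boundary, which is a
multiple of 4096 bytes, so it is safe for both 512-byte and 4K-sector
drives.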


> (I guess there must be some performance degradation but it's not
> yet enough for me to notice. Or it might only be noticeable if low on space).


ZFS does a lot of caching, and its use of the ZIL batches writes; all
of this can ameliorate the effects of misaligned block sizes and
partition boundaries.  (Large sequential writes are best for
performance, especially on spinning disks, which incur penalties for
head movement and can incur rotational delays.)  But if you have a
write-intensive pool, you are unnecessarily taking a performance hit
by not using the correct ashift and/or partition boundaries.
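
If you want to be explicit rather than relying on auto-detection, you
can pin the ashift when the pool is created.  Roughly (pool and device
names below are placeholders):

  # legacy ZFS on FreeBSD 12.x: raise the minimum auto-detected ashift
  sysctl vfs.zfs.min_auto_ashift=12
  zpool create tank mirror ada1 ada2

  # OpenZFS 2.x, where ashift is a pool property:
  zpool create -o ashift=12 tank mirror ada1 ada2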

BTW, low space mainly affects performance due to fragmentation.  That
is a separate issue from a mismatched block size (ashift).
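
You can keep an eye on that with something like (again, "tank" is just
an example pool name):

  zpool list -o name,size,capacity,fragmentation tank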

When I replaced my ashift=9 512-byte drives I eventually recreated the
pool with ashift=12.  Using ashift=12 on pools with 512-byte sector
size drives will not incur any performance penalty, which is why
ashift defaults to 12 nowadays.  (I wouldn't be surprised if the
default changes to ashift=13 due to the prevalence of SSDs these days.)
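
If you go the recreate-and-migrate route, zfs send/receive does the
heavy lifting.  A rough sketch, with "oldpool" and "newpool" as
placeholder names (and assuming both pools can be imported at once):

  zfs snapshot -r oldpool@migrate
  zfs send -R oldpool@migrate | zfs recv -F newpool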

Cheers,

Paul.



