Date: Mon, 30 Nov 2009 18:31:49 -0600
From: "James R. Van Artsdalen" <james-freebsd-fs2@jrv.org>
To: Andrew Snow <andrew@modulus.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS guidelines - preparing for future storage expansion
Message-ID: <4B1463F5.5020403@jrv.org>
In-Reply-To: <4B14495E.7050306@modulus.org>
References: <2ae8edf30911300120x627e42a9ha2cf003e847d4fbd@mail.gmail.com> <4B139AEB.8060900@jrv.org> <2ae8edf30911300425g4026909bm9262f6abcf82ddcd@mail.gmail.com> <5f67a8c40911301233s46a2818at9051c4ebbacf7e25@mail.gmail.com> <4B14495E.7050306@modulus.org>
Andrew Snow wrote:
> Currently there is no "read-ahead" for scrubbing and resilvering, so
> it only talks to one disk at a time and proceeds using only about
> half the I/O capacity of your disks (or less). Read-ahead is one of
> the planned features for ZFS next year.
gstat sometimes shows multiple outstanding I/O requests to a drive
during a scrub. All of the disks are busy at the same time: there is no
one-disk-at-a-time behavior.
I see roughly 500 MB/sec during a scrub, which is around 50% of the
theoretical bandwidth of both the disk-to-HBA links and the
HBA-to-system slot in my case. I hope to be able to fix both this
spring and see if I can reach gigabyte-per-second levels, especially for
userland reads (I've seen 420 MB/s so far).
(Each vdev in my case is a 2-way mirror, so 500 MB/s of disk reads is
250 MB/s of user data.)
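The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the bandwidth ceiling is derived from the "around 50%" estimate above rather than measured.

```python
def user_data_rate(disk_rate_mb_s, mirror_ways):
    """User-data rate when every copy in an N-way mirror is read (as in a scrub)."""
    return disk_rate_mb_s / mirror_ways


def implied_ceiling(observed_mb_s, fraction_of_theoretical):
    """Theoretical bandwidth implied by an observed rate and its estimated share."""
    return observed_mb_s / fraction_of_theoretical


scrub_disk_rate = 500.0  # MB/s of raw disk reads, the rough figure quoted above

print(user_data_rate(scrub_disk_rate, 2))     # user data behind 2-way mirrors
print(implied_ceiling(scrub_disk_rate, 0.5))  # ceiling if 500 MB/s is about 50%
```

With these inputs the implied ceiling works out to about 1000 MB/s, which is why removing the link bottlenecks is what it would take to reach gigabyte-per-second levels.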
> Also, when your disks are 98% or more full and you are doing any
> writes at all ZFS spends a long time looking for free blocks with an
> inefficient algorithm. An improved "disk full" algorithm is also
> planned for next year.
As the disk approaches 100% capacity the free space list(s) become
shorter, not longer. It's fragmentation, or the need to search a long
time for a large block in the right area, that is likely the problem.
If you can accept the block at the head of the list there is no search
at all.
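To illustrate the difference between taking the head of a free list and searching it, here is a toy Python sketch. It is not ZFS's actual allocator; the flat list of (offset, length) extents, the function names, and the sizes are all assumptions made for the example.

```python
def take_head(free_list):
    """Accept the first free extent: one entry examined, no search."""
    return free_list[0], 1


def first_fit(free_list, min_blocks):
    """Scan for the first extent of at least min_blocks; count entries examined."""
    for examined, extent in enumerate(free_list, start=1):
        _offset, length = extent
        if length >= min_blocks:
            return extent, examined
    return None, len(free_list)


# Hypothetical fragmented pool: 1000 single-block holes, then one 8-block extent.
fragmented = [(i * 16, 1) for i in range(1000)] + [(16000, 8)]

print(take_head(fragmented))     # any block will do: found immediately
print(first_fit(fragmented, 8))  # large block wanted: walks the whole list
```

In this toy model the cost comes from how many small fragments precede a usable extent, not from how full the pool is.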
A quick snapshot during a scrub:
dT: 1.006s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
    0    238    238  30525    4.4      0      0    0.0    34.8| ada2
    0    235    235  30016    3.1      0      0    0.0    26.3| ada3
    0    274    274  35104    3.7      0      0    0.0    35.4| ada4
    0    277    277  35485    4.0      0      0    0.0    40.5| ada5
    0    273    273  34976    2.9      0      0    0.0    29.4| ada6
    4    270    270  34474    7.2      0      0    0.0    53.4| ada7
    0    271    271  34722    3.2      0      0    0.0    32.4| ada8
    5    270    270  34410    3.4      0      0    0.0    34.0| ada9
    7    268    268  34277    5.8      0      0    0.0    43.6| ada10
    0    267    267  34213    4.3      0      0    0.0    32.1| ada11
    4    269    269  34468    5.7      0      0    0.0    41.6| ada12
    7    268    268  34277    4.7      0      0    0.0    33.4| ada13
    0    277    277  35421    5.1      0      0    0.0    36.5| ada14
    4    270    270  34595    5.4      0      0    0.0    37.5| ada15
    0    269    269  34468    6.3      0      0    0.0    43.9| ada16
    0    275    275  35167    6.2      0      0    0.0    44.9| ada17
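As a cross-check of the "roughly 500 MB/sec" figure, the read kBps column of this snapshot can be totalled. A minimal Python sketch, with the values copied by hand from the output above and 1 MB taken as 1000 kB (an assumption about units):

```python
# Read throughput in kBps for ada2 through ada17, copied from the gstat snapshot.
read_kbps = [
    30525, 30016, 35104, 35485, 34976, 34474, 34722, 34410,
    34277, 34213, 34468, 34277, 35421, 34595, 34468, 35167,
]

total_kbps = sum(read_kbps)
average_kbps = total_kbps / len(read_kbps)

print(f"drives:    {len(read_kbps)}")
print(f"aggregate: {total_kbps} kBps (about {total_kbps / 1000:.0f} MB/s)")
print(f"per drive: {average_kbps:.0f} kBps")
```

The aggregate comes to 546598 kBps across the 16 drives, in line with the rough figure quoted earlier.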
