Date:      Fri, 22 May 2020 15:47:21 +0000
From:      bugzilla-noreply@freebsd.org
To:        usb@FreeBSD.org
Subject:   [Bug 244356] Writing to a USB 3.0 stick is very slow
Message-ID:  <bug-244356-19105-tgB1f2NesP@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-244356-19105@https.bugs.freebsd.org/bugzilla/>
References:  <bug-244356-19105@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244356

--- Comment #75 from Olivier Certner <olivier.freebsd@free.fr> ---
I've tested SD_128G again, more thoroughly this time.

When copying '/usr' to some UFS partition, I see lots of stalls, and long
ones (~30s). Copying to an exFAT partition gives a similar iostat profile,
but with the difference that stalls account for ~74% of the total test
duration, whereas it is ~83% for the UFS test. Also, practically all exFAT
accesses go through 128KiB transactions, whereas for UFS the median and
average access sizes are around 70KiB. Not taking stalls into account,
average bandwidth is approximately 48MiB/s for exFAT and 25MiB/s for UFS
(12.3MiB/s vs. 4.3MiB/s including stalls). For these two metrics, the
ratios between the two FSes are close (1.82 and 1.89). A priori, I would
have expected the second ratio to be lower than the first, because smaller
transactions should be compensated in part, I thought, by more transactions
per second. But TPS for the two tests is not that different (median: 440
vs. 466; average: 381 vs. 566; stalls not counted). More precisely, for an
average transaction size close to 128KiB, UFS reaches a TPS of around 220,
whereas exFAT reaches around 224 (so no significant difference), but
bandwidth degrades a lot for smaller transactions, because there TPS does
not compensate at all (the computed equivalent bandwidth-preserving TPS for
128KiB transactions becomes ~34 on average). The worse performance of UFS
cannot be attributed to possible stick "degradation", since the UFS test
was actually performed *before* the exFAT one.
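As a sanity check on that last figure: the "equivalent bandwidth-preserving
TPS" is just the measured bandwidth divided by the 128KiB reference
transaction size. A minimal sketch, using the stall-inclusive UFS average
from above:

```shell
# Equivalent TPS that would sustain the same bandwidth if every
# transaction were 128KiB: bandwidth (KiB/s) / 128 (KiB/transaction).
awk 'BEGIN { printf "%.1f\n", (4.3 * 1024) / 128 }'
```

which prints 34.4, consistent with the ~34 average reported above.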

Following up on Hans's and Gary's responses, I was wondering how the
controller could know which blocks of the stick are actually used. For the
record, I tried `camcontrol sanitize` and `camcontrol security`, but
neither worked (I don't know whether that's because umass SCSI's XPT
doesn't allow/implement them, because they are explicitly not supported in
the UAS standard, or because the sticks don't implement them). So what else
is possible? Some ideas: the controller could notice when sectors full of
zeros are written, keep track of that, and free the underlying storage. Or,
which I find dubious, the controller could interpret (ex)FAT metadata and
thus know which blocks are actually used. Without going that far, it could
interpret writes to the MBR (or GPT) area to know which partitions are
currently allocated (but semantically this is more dangerous, so even more
dubious, although simpler to implement).

From additional experiments, I observe behavior compatible with the first
possibility (or some variant of it), and can also rule out the third. For
the first, I simply `dd`'d the whole stick with zeros. After that, a
similar UFS `cp` experiment shows an average bandwidth of 7.8MiB/s, almost
twice the 4.3MiB/s obtained previously. Transfers are stalled "only" for
71% of the whole test duration. This also rules out the third possibility,
since I had done other UFS tests before the one reported above, and no
performance jump was observed then.
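For completeness, the zero-fill pass is nothing more than a plain `dd` over
the raw device. A sketch (the `TARGET` default below points at a scratch
file for safety; substitute the real device node, e.g. `/dev/da0`, to wipe
an actual stick, keeping in mind that this destroys everything on it):

```shell
#!/bin/sh
# Zero-fill a target so the controller can (in theory) reclaim the
# underlying flash blocks. Demonstrated on a small scratch file; on a
# real device, drop 'count=4' so dd runs until end-of-device.
TARGET=${TARGET:-/tmp/zerofill-demo.img}
dd if=/dev/zero of="$TARGET" bs=1048576 count=4 2>/dev/null
ls -l "$TARGET" | awk '{ print $5 }'   # prints 4194304 (4 x 1MiB written)
```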

What I'm working on now is to somehow make FreeBSD access the stick with
bigger transactions, because the results reported at the start of this
comment are compatible with the stick handling small transactions badly,
and because macOS uses 1MiB transactions (so manufacturers may want to
optimize for that; Linux does as well). Also, flash memory is structured in
blocks of several pages, and erasure can only happen at block granularity,
with current blocks usually being larger than 128KiB (I'm not a specialist,
I just read Wikipedia), so partial writes may lead to full erasures and
rewrites, degrading performance.
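To put a number on that: assuming, purely for illustration, a 1MiB erase
block (I don't know this stick's real geometry), the worst-case write
amplification is just the ratio of erase-block size to write size:

```shell
# Worst-case write amplification for a partial write: the whole erase
# block may have to be read, erased and rewritten.
# erase_kib is an assumed figure for illustration, not a measured one.
awk 'BEGIN {
    erase_kib = 1024
    for (w = 4; w <= 128; w *= 2)
        printf "%3dKiB write -> up to %dx rewrite\n", w, erase_kib / w
}'
```

That 256x-vs-8x spread between 4KiB and 128KiB writes is one plausible
reason the all-128KiB exFAT transactions fare so much better here.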

I did a first experiment with the UFS block size increased to 64KiB (which
happens to be the maximum value UFS currently accepts), and thus the
fragment size to 8KiB, which I expected would increase the minimum (and
average) transaction size. And indeed, results are a bit better: I get
9.8MiB/s on average (vs. 7.8MiB/s with the default block size as reported
above), and transfers are stalled "only" 56% of the time. The minimum
transaction size observed is indeed now 8KiB instead of 4KiB, but to my
surprise, the average and median transaction sizes actually decrease a bit
(51.5 and 25.6KiB before vs. 49 and 22.8KiB now; stalls omitted). So it
seems that it's the minimum transaction size that really matters, and
somehow increasing it reduces stalls, which could indeed hint at
limited/poor caching when writing to non-full blocks. Still, I observe
transaction sizes that can be any multiple of 8KiB up to 64KiB. I ran the
same test a second time, after a full `dd if=/dev/zero` over the whole
stick, and got worse results (8.3MiB/s on average, stalls ~63% of the time,
49.4KiB and 25.7KiB for average and median transaction sizes), which is not
much of a clear improvement over UFS with the default block size. So the
64KiB block size tests don't look decisive to me at this point.
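For reproducibility, the 64KiB test's `newfs` invocation amounts to the
following (a sketch: `DEV` is a placeholder, and the leading `echo` makes
it a dry run, so remove it to actually build the filesystem):

```shell
#!/bin/sh
# Build UFS with 64KiB blocks and 8KiB fragments (the usual 8:1
# block:fragment ratio). -U enables soft updates (UFS+SU).
DEV=${DEV:-/dev/da0p1}   # placeholder: substitute the real partition
echo newfs -U -b 65536 -f 8192 "$DEV"
```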

MAXPHYS limits us to 128KiB, but without touching it I thought I could at
least try to have all transactions be 128KiB, as happens with exFAT. So I
turned to GEOM_NOP and then GEOM_CACHE, thinking they could do the job out
of the box, but in fact they can't. GEOM_NOP allows setting the block size,
but then every I/O request's size must be a multiple of this block size, so
raising the block size simply makes 'newfs' fail. GEOM_CACHE is closer to
what I would like, but it only caches reads, not writes; plus, while
testing it, it deadlocked the whole GEOM subsystem. So I'm now
understanding, fixing and extending it, in the hope that I can finally
force 128KiB transactions (at the price of read-modify-write) and run more
tests... All this because I want to see whether the speed of UFS on these
sticks can be significantly improved with software tuning to work around
potentially limited hardware.

For the RMA, the SanDisk guys want proof that copying is indeed slow when
using exFAT. Copying '/usr' to an exFAT partition happened at 12.3MiB/s
(see the first paragraph), not much better than the latest UFS tests, but
this was before `dd`ing the whole stick with zeros. So I guess I now have
to redo this experiment, to see how much I can get out of exFAT. I'm not
sure the SanDisk guys will consider ~12MiB/s slow enough to warrant a
replacement. I could couple the argument with the "sticks don't respect
standards" angle (prior `usbtest`), but I doubt it would add much weight.
Anyway, I find it amazing that they can market write speeds of up to
100MiB/s when we can reach only ~1/10th of that with (not so) small files.

Maurizio, Sébastien, if you're interested, you can try to `dd if=/dev/zero`
over your sticks and then retry your experiments (with UFS+SU), to see if
you get any significant speed improvement. Also, it may be important to
transfer a large quantity of data, to see whether degradation/more frequent
stalls indeed happen. Ideally, try to fill your sticks almost completely
(which means very long tests). In fact, I usually stopped UFS tests after
around ~18GiB had been transferred (around 30min duration). I realize this
may have biased most results so far, so I'll try some bigger transfers
later on. Anyway, arguably I haven't done enough tests for the results to
be statistically significant, so more testing may change the current
interpretations. Assuming the behavior of SD_128G can indeed be understood,
of course.

I've kept all the raw iostat files used to produce the statistics in this
post, but have not attached them to the PR because, again, there are
several of them and I'm not sure that would be useful at this point. Feel
free to ask if you want to see them.
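For anyone who wants to recompute the stall percentages from such logs: a
stall here is simply a sampling interval with zero write throughput. A
rough sketch of the counting (it assumes FreeBSD `iostat -x` output with
kw/s in the fifth column; the inline sample is made up for the demo, and in
practice you would pipe in a captured log instead):

```shell
# Count 1-second iostat samples with no write throughput ("stalls") and
# report them as a fraction of all samples for device da0.
awk '$1 == "da0" {
    total++
    if ($5 == 0) stalls++
} END {
    printf "%d/%d samples stalled (%.0f%%)\n", stalls, total, 100 * stalls / total
}' <<'EOF'
da0  0  120  0  15360  0 5 0 5 1 80
da0  0    0  0      0  0 0 0 0 1  0
da0  0    0  0      0  0 0 0 0 1  0
da0  0  110  0  14080  0 6 0 6 1 75
EOF
```

On this made-up sample it reports 2/4 samples stalled (50%).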

-- 
You are receiving this mail because:
You are the assignee for the bug.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-244356-19105-tgB1f2NesP>