Date:      Wed, 07 Jan 2009 19:53:56 +0100
From:      Ivan Voras <ivoras@freebsd.org>
To:        freebsd-geom@freebsd.org
Subject:   Re: performance problem with gstripe
Message-ID:  <gk2to6$6b5$1@ger.gmane.org>
In-Reply-To: <F89B4A24-E877-47E1-89FD-94F1D91761CE@panasas.com>
References:  <4AD370A6-2226-442F-BD80-8CFD4045B094@panasas.com>	<gk1664$g6d$2@ger.gmane.org>	<CA9F4E99-5819-4FDB-B8F4-8AA21759942B@panasas.com>	<gk19r9$p9m$1@ger.gmane.org> <F89B4A24-E877-47E1-89FD-94F1D91761CE@panasas.com>


Joel Jacobson wrote:
> still works badly at 64k, but works well if i use 32k (and have
> kern.geom.stripe.fast=1).  that being said, i was only seeing 64k I/O
> through ufs when i was doing the 256k stripe, so im still not sure why
> this matters.

You have probably stumbled on the group of problems collectively known as
"the MAXPHYS problem". This is what's happening: many disk drivers in
FreeBSD were first written when controllers and motherboards didn't
support DMA transfers larger than 64 kB. In addition, there's a hard
limit of 128 kB on IO request sizes (the MAXPHYS kernel option), but it
is not often reached. Thus, the maximum IO size that can reach a single
drive is 64 kB, and this limit is propagated in unclear ways back to
UFS. If your stripe size is larger than or equal to 64 kB, there is no
way the IO request can be split between two drives, so you get the
performance of a single drive. If the stripe size is smaller, the IO
request can be split between the drives and you get better performance.
All of this maps 1:1 to the "dd" utility accessing the raw device
(/dev/something). In FreeBSD, raw device access is not buffered, so what
dd requests, the drive delivers, in exactly the way it was requested,
chopped into 64 kB pieces if needed.
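
To make the arithmetic concrete, here is a minimal sketch (not GEOM code)
of how a large sequential request is chopped into 64 kB pieces and how
many drives each piece can touch. It assumes a simple round-robin stripe
layout and stripe-aligned requests; the names chop, disks_touched and
fanout are made up for this illustration.

```python
MAX_IO = 64 * 1024  # assumed per-request cap, as described above


def chop(offset, length, max_io=MAX_IO):
    """Split one request into consecutive pieces of at most max_io bytes."""
    pieces = []
    end = offset + length
    while offset < end:
        size = min(max_io, end - offset)
        pieces.append((offset, size))
        offset += size
    return pieces


def disks_touched(offset, length, stripe_size, ndisks):
    """Disk indices covered by one request in a round-robin stripe."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return sorted({stripe % ndisks for stripe in range(first, last + 1)})


def fanout(total, stripe_size, ndisks):
    """For each 64 kB piece of a `total`-byte read at offset 0,
    count how many disks that piece can be spread over."""
    return [len(disks_touched(off, size, stripe_size, ndisks))
            for off, size in chop(0, total)]


if __name__ == "__main__":
    for stripe_kb in (256, 64, 32):
        print(stripe_kb, fanout(256 * 1024, stripe_kb * 1024, 2))
```

With two drives, every 64 kB piece lands on a single drive for the 256k
and 64k stripes, and on both drives for the 32k stripe.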

The reason UFS does better is that it asynchronously fills a queue
(bioq) with requests, which are sent to the device in the same way,
asynchronously, so even if a single write cannot span multiple stripes,
there will be many writes queued which can be done in parallel. This
works up to a point, and still breaks down for high loads, large numbers
of devices, really large stripe sizes etc.
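
A rough model of the queueing effect, assuming back-to-back sequential
requests of a fixed size and a round-robin stripe layout (busy_disks is a
hypothetical helper, not a kernel interface): it lists which drives have
at least one request outstanding for a given queue depth.

```python
def busy_disks(queue_depth, io_size, stripe_size, ndisks, start=0):
    """Drives with at least one of `queue_depth` queued sequential
    requests outstanding, for a round-robin stripe."""
    busy = set()
    for i in range(queue_depth):
        offset = start + i * io_size
        first = offset // stripe_size
        last = (offset + io_size - 1) // stripe_size
        busy.update(stripe % ndisks for stripe in range(first, last + 1))
    return sorted(busy)


if __name__ == "__main__":
    kb = 1024
    # One 64k request at a time on a 256k stripe: a single drive works.
    print(busy_disks(1, 64 * kb, 256 * kb, 2))
    # Eight queued requests reach into the second stripe.
    print(busy_disks(8, 64 * kb, 256 * kb, 2))
    # Same queue on four drives: only two of them are kept busy.
    print(busy_disks(8, 64 * kb, 256 * kb, 4))
```

In this model the queue depth needed to keep every drive busy grows with
both the stripe size and the number of drives.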

The problem is annoying but not serious if you know about it. It limits
sequential performance, but if you tried a random IO benchmark that can
issue parallel IO itself and uses small-ish block sizes (try
http://arctic.org/~dean/randomio/) on the device, you'd probably find
that you still get better performance.
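
If you just want a quick look at what such a test does, the following is
a small sketch of a parallel random-read loop. It is not randomio and
makes no attempt to match its options or output; random_reads and all of
its parameters are invented for this example. It assumes a Unix-like
system where os.pread is available, and that seeking to the end of the
target returns its size (for a raw device you may have to pass size
explicitly).

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor


def random_reads(path, block_size=4096, threads=8, reads_per_thread=256,
                 size=None, seed=0):
    """Issue block-aligned random reads from several threads.

    Returns (bytes_read, elapsed_seconds).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if size is None:
            size = os.lseek(fd, 0, os.SEEK_END)
        nblocks = size // block_size
        if nblocks == 0:
            raise ValueError("target is smaller than one block")

        def worker(n):
            rng = random.Random(seed + n)  # independent stream per thread
            done = 0
            for _ in range(reads_per_thread):
                offset = rng.randrange(nblocks) * block_size
                # pread takes an explicit offset, so threads can share fd
                done += len(os.pread(fd, block_size, offset))
            return done

        started = time.perf_counter()
        with ThreadPoolExecutor(max_workers=threads) as pool:
            total = sum(pool.map(worker, range(threads)))
        return total, time.perf_counter() - started
    finally:
        os.close(fd)
```

Calling random_reads("/dev/something", block_size=4096, threads=8) and
dividing the byte count by the elapsed time gives a rough throughput
figure. On a regular file the numbers mostly reflect the buffer cache,
so point it at the device itself.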

> i have a somewhat hidden agenda here, too, in that i have my own
> filesystem that suffers the same problem im seeing with dd.  i figured

I'm interested in file systems so I'd be happy to test it for you. :)

> there was something ufs does which i do not, and was trying to figure
> out what that might be.  it works fine on 4.6.2 using ccd and a 256k
> stripe size [and i send 128k I/O requests, which is what i would prefer
> to see sent to the driver, rather than 64k].

I don't know how CCD works - maybe it can queue IO in parallel? Maybe
4.x still had cached block devices (they were thrown out at some point
in time but I don't know when - see
http://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html)?
I think there were so many changes in between 4.x and 8-CURRENT that
you'll need to find someone who has worked specifically on VFS to
explain exactly what is going on. Contact me if you need pointers.






