Date: Wed, 07 Jan 2009 19:53:56 +0100
From: Ivan Voras <ivoras@freebsd.org>
To: freebsd-geom@freebsd.org
Subject: Re: performance problem with gstripe
Message-ID: <gk2to6$6b5$1@ger.gmane.org>
In-Reply-To: <F89B4A24-E877-47E1-89FD-94F1D91761CE@panasas.com>
References: <4AD370A6-2226-442F-BD80-8CFD4045B094@panasas.com> <gk1664$g6d$2@ger.gmane.org> <CA9F4E99-5819-4FDB-B8F4-8AA21759942B@panasas.com> <gk19r9$p9m$1@ger.gmane.org> <F89B4A24-E877-47E1-89FD-94F1D91761CE@panasas.com>
Joel Jacobson wrote:
> still works badly at 64k, but works well if i use 32k (and have
> kern.geom.stripe.fast=1). that being said, i was only seeing 64k I/O
> through ufs when i was doing the 256k stripe, so im still not sure why
> this matters.

You have probably stumbled on the group of problems collectively known
as "the MAXPHYS problem". This is what's happening:

Many disk drivers in FreeBSD were first created when the controllers
and motherboards didn't support DMA larger than 64 kB. In addition,
there is a hard limit on I/O request sizes set to 128 kB (the MAXPHYS
kernel option), but it is not often reached. Thus, the maximum I/O size
that can reach a single drive is 64 kB, and this limit is propagated in
unclear ways back to UFS.

If the stripe size is larger than or equal to 64 kB, there is no way an
I/O request can be split between two drives - you get the performance
of a single drive. If the stripe size is smaller, the I/O request can
be split between the drives and you get better performance.

All of this maps 1:1 to the "dd" utility accessing the raw device
(/dev/something). In FreeBSD, raw device access is not buffered, so
what dd requests, the drive delivers, in exactly the same way it was
requested, chopped into 64 kB pieces if needed. The reason UFS does
better is that it asynchronously fills a queue (bioq) with requests,
which are then sent to the device in the same asynchronous way, so
even if a single write cannot span multiple stripes, there will be
many writes queued which can be done in parallel. This works up to a
point, and still breaks down for high loads, large numbers of devices,
really large stripe sizes, etc.

The problem is annoying but not serious if you know about it.
It limits sequential performance, but if you tried a random I/O
benchmark that can do parallel I/O itself (try
http://arctic.org/~dean/randomio/) on the device with small-ish block
sizes, you'd probably find that you still get better performance.

> i have a somewhat hidden agenda here, too, in that i have my own
> filesystem that suffers the same problem im seeing with dd. i figured

I'm interested in file systems so I'd be happy to test it for you. :)

> there was something ufs does which i do not, and was trying to figure
> out what that might be. it works fine on 4.6.2 using ccd and a 256k
> stripe size [and i send 128k I/O requests, which is what i would prefer
> to see sent to the driver, rather than 64k].

I don't know how CCD works - maybe it can queue I/O in parallel? Maybe
4.x still had cached block devices (they were thrown out at some point
in time, but I don't know when - see
http://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html)?
I think there were so many changes between 4.x and 8-CURRENT that
you'll need to find someone who has worked specifically on VFS to
explain exactly what is going on. Contact me if you need pointers.