Date:      Mon, 09 Oct 2006 14:47:25 -0600
From:      Scott Long <scottl@samsco.org>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        freebsd-fs@freebsd.org, "Fluffles.net" <info@fluffles.net>, Kris Kennaway <kris@obsecurity.org>
Subject:   Re: 2 bonnies can stop disk activity permanently
Message-ID:  <452AB55D.9090607@samsco.org>
In-Reply-To: <20061010051216.G814@epsplex.bde.org>
References:  <45297DA2.4000509@fluffles.net> <20061010051216.G814@epsplex.bde.org>

Bruce Evans wrote:
> On Mon, 9 Oct 2006, Fluffles.net wrote:
> 
>> I'm the "veronica" Arne mentioned on the freebsd-fs mailing list.
>> Regarding the effectiveness of a higher blocksize, these are my findings:
>>
>> areca RAID5 (8x da, 128KB stripe, default newfs, NCQ enabled)
>>              -------Sequential Output-------- ---Sequential Input-- --Random--
>>              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
>> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>> ARC8xR5  8480 119973 91.3 247178 58.6  67862 17.5  90426 86.9 172490 24.0 120.7  0.5
>>
>> areca RAID5 (8x da, 128KB stripe, 64KB blocksize newfs, NCQ enabled)
>>              -------Sequential Output-------- ---Sequential Input-- --Random--
>>              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
>> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>> ARC8xR5  8480 128920 97.8 265920 58.9 116787 31.0 103261 97.8 392970 53.8 119.8  0.6
>>
>> As you can see, the block read speed increased from ~172MB/s to
>> ~392MB/s, quite a significant increase. Also, the rewrite speed
>> increased from ~67MB/s to ~116MB/s.
>>
>> Of course, these tests are on a brand-new, clean filesystem, which
>> might not tally with real-life crowded filesystems. But at least there is much
>> ...
> 
> 
> This is a bit surprising.  FreeBSD is supposed to cluster the i/o so
> that (especially for large files on new file systems) almost all i/o
> is done in blocks of size 64K or 128K.
> 
> I suspect the problems are that the 64K-block i/o is usually perfectly
> misaligned unless the fs itself has 64K-blocks and the fs's partition
> starts on a 64K-block boundary, and that some hardware or firmware
> (mainly RAIDs) want the blocks to be aligned.  I'm not very familiar
> with RAIDs but think it would take a fairly advanced/expensive one to
> reblock all the i/o so that the alignment doesn't matter.  It would
> take more advanced/complicated clustering code or better buffering code
> than FreeBSD has to do the reblocking at the clustering or buffering
> level.  Perhaps even 64K-blocks are too small with your RAID's stripe
> size of 128K.
> 
> Bruce

Yes, it's a well-known problem that the combination of 
fdisk+disklabel+ufs means that all FS blocks are mis-aligned in the 
worst way possible (blocks start on odd sector numbers).  This
_horribly_ pessimizes RAID-5 on most controllers.  Solving it reliably
and automatically is hard, though.  The filesystem ultimately needs to
know the physical sector that it starts on, and compensate accordingly.
You could cheat by having the disklabel tools always align partitions,
but the tool would still need to know the physical address of where it
starts in the slice.  Either way, something high up needs to get the
logical to physical translation of the sectors.
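
To put numbers on the alignment problem, here is a quick standalone C
sketch (illustrative only, nothing from the tree; it just reuses the
64K block / 128K stripe / 63-sector MBR offset figures from this
thread):

#include <stdio.h>

#define SECTOR_SIZE  512
#define FS_BLOCK     (64 * 1024)   /* 64K fs block from the thread */
#define STRIPE_UNIT  (128 * 1024)  /* 128K stripe from the thread  */
#define PART_START   63            /* classic MBR slice offset, in sectors */

int
main(void)
{
        for (int blk = 0; blk < 4; blk++) {
                /* Absolute byte range of this fs block on the array. */
                long long first = (long long)PART_START * SECTOR_SIZE +
                    (long long)blk * FS_BLOCK;
                long long last = first + FS_BLOCK - 1;
                long long su_first = first / STRIPE_UNIT;
                long long su_last = last / STRIPE_UNIT;

                printf("fs block %d: stripe unit %lld..%lld%s\n", blk,
                    su_first, su_last,
                    su_first != su_last ? "  <- straddles a boundary" : "");
        }
        return (0);
}

Every other 64K block straddles a stripe-unit boundary, and the blocks
that don't still sit at an odd offset inside their unit, so the
controller can never coalesce the writes into full-stripe writes and
falls back to read-modify-write.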

Suggestions have been made to just put blind offsets into the disklabel
tool that assume the common case (an MBR is present and has a known
length, and the disklabel is in the first slice of the MBR); the
rounding involved is sketched below.  Obviously, this is only a crude
hack.  I get around the problem right now by not using a disklabel or
fdisk table on arrays where I value speed.  For those, I just put a
filesystem directly on the array, and boot off of a small system disk.
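
As a rough sketch of what the blind-offset rounding would look like
(an assumption about the approach, not actual disklabel code):

#include <stdio.h>

#define SECTOR_SIZE  512

/* Round a partition start up to the next stripe boundary. */
static long long
align_start(long long start_sec, long long stripe_bytes)
{
        long long stripe_secs = stripe_bytes / SECTOR_SIZE;

        return ((start_sec + stripe_secs - 1) / stripe_secs * stripe_secs);
}

int
main(void)
{
        /* Classic 63-sector MBR offset, 128K stripe: prints 256. */
        printf("aligned start: %lld sectors\n",
            align_start(63, 128 * 1024));
        return (0);
}

With a 128K (256-sector) stripe this moves the start from sector 63 to
sector 256, after which 64K fs blocks never cross a stripe-unit
boundary.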

Scott


