Date:      Wed, 17 Mar 2010 14:50:35 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Thiago Damas <tdamas@gmail.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: ATA 4K sector issues
Message-ID:  <201003172150.o2HLoZxW070346@apollo.backplane.com>
References:  <alpine.BSF.2.00.1003171114280.74067@mignon.ki.iif.hu>  <86tysf58a2.fsf@ds4.des.no> <alpine.BSF.2.00.1003171652260.74067@mignon.ki.iif.hu>  <f8e3d83f1003171034m5e75eae4r5e8b31d88d361d3b@mail.gmail.com>  <367b2c981003171112n785ea9d4q21d00b533819ca67@mail.gmail.com>  <f8e3d83f1003171117k20d553b7y7ce4c3c8ed2f5c96@mail.gmail.com>  <201003172023.o2HKNNbj069321@apollo.backplane.com> <201003172111.o2HLBIgJ069873@apollo.backplane.com> <f8e3d83f1003171417s60196803ra4884dae487edb9a@mail.gmail.com>


:  There is a sysctl, md_compress, that I turned out in my tests, but not
:working as expected.
:  Why does using gnop -S 4096 work well?
:
:Thiago

    You are setting the sector size to 4K with gnop -S 4096, so presumably
    ZFS will not issue any fragmented writes smaller than that.  I'm not
    sure why that would matter except possibly for ZIL writes.  In the
    case of the ZIL, if ZFS is using sector-sized writes (I don't know what
    it actually uses), then setting the sector size to 4K would be more
    efficient, as the drive would not have to issue a read-before-write
    when the disk cache is flushed after the ZIL write.
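    To make the read-before-write point concrete, here is a small Python
    model.  It assumes a drive with 4096-byte physical sectors that still
    accepts writes at 512-byte logical granularity; the function name is
    hypothetical and nothing here is a ZFS or drive-firmware interface.  It
    only checks whether a write leaves some physical sector partially
    covered, which is the case where the drive must read that sector back
    before it can write it.

```python
# Illustrative model only: assumes a drive with 4096-byte physical sectors
# that still accepts writes at 512-byte logical granularity.
PHYS_SECTOR = 4096


def needs_read_before_write(offset: int, length: int, phys: int = PHYS_SECTOR) -> bool:
    """Return True if a write of `length` bytes at byte `offset` covers
    only part of at least one physical sector.

    The drive cannot write a partial physical sector directly, so it must
    read the old sector, merge in the new bytes, and write the whole
    sector back (read-before-write).
    """
    if length <= 0:
        return False
    end = offset + length
    # A partial sector exists at the head if the start is unaligned,
    # or at the tail if the end is unaligned.
    return offset % phys != 0 or end % phys != 0
```

    For example, a 512-byte write at offset 0 returns True, while an
    8192-byte write at offset 8192 returns False: writes sized and aligned
    in whole 4K units never hit the read-before-write path in this model.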

    One important aspect of having the filesystem use a larger logical
    block size, such as 4K, 16K, or 32K, is that the filesystem itself
    knows whether any trailing data is garbage and will avoid doing a
    read-before-write when writing small amounts of data.
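    A rough sketch of that decision, assuming a hypothetical 16K logical
    block and a made-up prepare_block() helper (real filesystems track this
    state very differently): when the block was just allocated, or the write
    replaces the whole block, the old bytes are known to be irrelevant and
    the buffer is simply zero-filled; only a partial write into a block
    holding live data needs the old contents read first.

```python
from typing import Callable, Tuple

# Hypothetical filesystem logical block size (16K, one of the sizes
# mentioned above).
BLOCK = 16384


def prepare_block(read_existing: Callable[[], bytes], newly_allocated: bool,
                  off: int, data: bytes, block: int = BLOCK) -> Tuple[bytes, bool]:
    """Build the full logical block to write for a small write of `data`
    at byte offset `off` within the block.

    Returns (block_contents, did_read).  `read_existing` stands in for a
    disk read and is only called when the old contents must be preserved.
    """
    if off < 0 or off + len(data) > block:
        raise ValueError("write must fit inside one logical block")
    full_overwrite = off == 0 and len(data) == block
    if newly_allocated or full_overwrite:
        # The old bytes are garbage (fresh allocation) or will all be
        # replaced: zero-fill the buffer, no read needed.
        buf = bytearray(block)
        did_read = False
    else:
        # The block holds live data: read it before merging the new bytes.
        buf = bytearray(read_existing())
        did_read = True
    buf[off:off + len(data)] = data
    return bytes(buf), did_read
```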

    Most of the time, if the filesystem is allocating space from its
    blockmap, it knows the trailing data in the block is garbage and will
    zero it instead of performing a read-before-write.  Also, the buffer
    cache covers hundreds of megabytes, versus the hard drive cache, which
    is typically only 8-64MB (though the OCZ Colossus has 128MB).  This
    means the kernel will do a much better job of write-combining than the
    drive.
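    Why a large cache helps with write-combining can be sketched in a few
    lines: if many small dirty writes are held long enough, adjacent ones
    can be merged before anything is issued to the drive.  This Python
    model assumes 4096-byte physical sectors and represents writes as
    hypothetical (offset, length) pairs; real buffer-cache clustering is
    far more involved.

```python
from typing import List, Tuple

# Illustrative model: assumes 4096-byte physical sectors.
PHYS_SECTOR = 4096

Extent = Tuple[int, int]  # (byte offset, length)


def coalesce(writes: List[Extent]) -> List[Extent]:
    """Merge overlapping or adjacent dirty extents, the way a large cache
    can combine many small writes before anything is sent to the drive.
    Returns the merged extents sorted by offset."""
    merged: List[Extent] = []
    for off, length in sorted(writes):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            # Overlaps or touches the previous extent: extend it.
            start, mlen = merged[-1]
            end = max(start + mlen, off + length)
            merged[-1] = (start, end - start)
        else:
            merged.append((off, length))
    return merged


def partial_sector_write(offset: int, length: int, phys: int = PHYS_SECTOR) -> bool:
    """True if the extent leaves a physical sector partially covered."""
    return length > 0 and (offset % phys != 0 or (offset + length) % phys != 0)
```

    Eight 512-byte writes covering bytes 0-4095, arriving in any order,
    would each leave a 4K sector partially covered if sent to the drive
    individually, but they merge into a single aligned 4096-byte extent.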

    At the drive level, the drive has no knowledge of what is garbage and
    what is not, so the moment this work moves out of the drive and into
    the kernel, you reap the rewards on these drives with larger physical
    sectors.

						-Matt



