Date: Wed, 17 Mar 2010 14:50:35 -0700 (PDT)
From: Matthew Dillon
Message-Id: <201003172150.o2HLoZxW070346@apollo.backplane.com>
To: Thiago Damas
References: <86tysf58a2.fsf@ds4.des.no> <367b2c981003171112n785ea9d4q21d00b533819ca67@mail.gmail.com> <201003172023.o2HKNNbj069321@apollo.backplane.com> <201003172111.o2HLBIgJ069873@apollo.backplane.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: ATA 4K sector issues

:  There is a sysctl, md_compress, that I turned out in my tests, but not
:working as expected.
:  Why using gnop -S 4096 works well?
:
:Thiago

You are setting the sector size to 4K with gnop -S 4096 so presumably
ZFS will not do any fragmented writes smaller than that. I'm not sure
why that would matter except possibly for ZIL writes.
In the case of the ZIL, if ZFS is using sector-sized writes (I don't know
what it actually uses) then setting the sector size to 4K would be more
efficient, as the drive would not have to issue a read-before-write when
the disk cache is flushed after the ZIL write.

One important aspect of having the filesystem use a larger logical block
size, such as 4K or 16K or 32K etc, is that the filesystem itself knows
whether any trailing data is garbage or not and will avoid doing a
read-before-write when writing small amounts of data. Most of the time,
if the filesystem is allocating space from its blockmap, it knows the
trailing data in the block is garbage and will zero it instead of
performing a read-before-write.

Also, the buffer cache covers hundreds of megabytes versus the hard drive
cache, which is typically only 8-64MB (though the OCZ Colossus has 128M).
Still, this means the kernel will do a much better job write-combining
than the drive. The drive has no knowledge of what is garbage and what is
not at the drive level, so the moment this stuff moves out of the drive
and into the kernel you reap rewards on these larger physical
sector-sized drives.

-Matt
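P.S. (illustration only) A rough Python sketch of the read-before-write
point above, assuming a 4096-byte physical sector. The names PHYS,
partial_sectors and pad_to_block are made up for this example and are not
part of FreeBSD, ZFS or any drive firmware. It lists which physical
sectors a write only partly covers (each one would need a
read-before-write), and shows how zero-filling the tail of a freshly
allocated block avoids that.

```python
PHYS = 4096  # assumed physical sector size in bytes


def partial_sectors(offset, length, phys=PHYS):
    """Return the physical sector numbers that a write only partly covers.

    Each sector listed would need a read-before-write at a layer that
    does not know what the rest of the sector holds.
    """
    if length <= 0:
        return []
    end = offset + length
    first = offset // phys
    last = (end - 1) // phys
    partial = []
    if offset % phys != 0:          # write starts mid-sector
        partial.append(first)
    if end % phys != 0 and last not in partial:  # write ends mid-sector
        partial.append(last)
    return partial


def pad_to_block(data, block=PHYS):
    """Zero-fill the tail of a newly allocated block.

    Models a filesystem that knows the trailing bytes are garbage and
    so writes whole blocks instead of reading the old contents first.
    """
    return data + bytes(-len(data) % block)


# A 512-byte write at offset 512 lands inside physical sector 0.
small_write = partial_sectors(512, 512)

# The same small payload, zero-padded to a full block by the filesystem,
# covers sector 0 completely, so nothing has to be read back first.
padded = pad_to_block(b"x" * 512)
padded_write = partial_sectors(0, len(padded))
```

With 512-byte logical sectors the first case is what the drive sees and
has to fix up itself; with 4K logical sectors (e.g. behind gnop -S 4096)
only the second case can reach it.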