From owner-freebsd-performance@FreeBSD.ORG Sun Oct 18 04:44:44 2009
Date: Sat, 17 Oct 2009 21:40:38 PDT
From: Dieter
To: freebsd-performance@freebsd.org
In-reply-to: Your message of "Tue, 06 Oct 2009 18:03:16 +1100." <20091006174121.V25604@delplex.bde.org>
Message-Id: <200910180440.EAA21373@sopwith.solgatos.com>
Subject: Re: tuning FFS for large files Re: A specific example of a disk i/o problem

> > I found a clue!  The problem occurs with my big data partitions,
> > which are newfs-ed with options intended to improve things.
> >
> > Reading a large file from the normal ad4s5b partition only delays other
> > commands slightly, as expected.  Reading a large file from the tuned
> > ad4s11 partition yields the delay of minutes for other i/o.
> > ...
> > Here is the newfs command used for creating large data partitions:
> > newfs -e 57984 -b 65536 -f 8192 -g 67108864 -h 16 -i 67108864 -U -o time $partition
>
> Any block size above the default (16K) tends to thrash and fragment buffer
> cache virtual memory.  This is obviously a good pessimization with lots of
> small files, and apparently, as you have found, it is a good pessimization
> with a few large files too.  I think severe fragmentation can easily take
> several seconds to recover from.  The worst case for causing fragmentation
> is probably a mixture of small and large files.

Is there any way to avoid the "thrash and fragment buffer cache virtual
memory" problem other than keeping the block size at 16K or smaller?

> Some users fear fs consistency bugs with block sizes >= 16K, but I've never
> seen them cause any bugs except performance ones.

Yep, many TB of files on filesystems created with the above newfs command
and no corruption/consistency problems.

> > And they have way more inodes than needed.  (IIRC it doesn't actually
> > use -i 67108864)
>
> It has to have at least 1 inode per cg, and may as well have a full block
> of them, which gives a fairly large number of inodes especially if the
> block size is too large (64K), so the -i ratio is limited.

I converted a few filesystems to the default.  In addition to losing space,
the fsck time went through the roof.  So it's back to playing with newfs
options.  Larger block/frag sizes allow fewer cylinder groups (apparently
because each group's bookkeeping has to fit in a single filesystem block, so
bigger blocks mean bigger and therefore fewer groups), which reduces the
number of inodes more than the larger block size increases it.  From my
reading of the newfs man page, -c only allows making cylinder groups
smaller, not larger, and that appears to be the case in practice.
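To make the cylinder group / inode relationship concrete, here is a rough
sanity check using the numbers from the newfs runs below (it ignores the
couple of reserved inodes, so the totals are off by a hair):

    # total inodes ~= cylinder groups * inodes per group
    echo '2348 * 23552' | bc    # default layout: 55300096, i.e. ~55 million inodes
    echo '27 * 512' | bc        # -b 65536 -f 65536 layout: 13824 inodes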
default:

newfs -U /dev/ad14s4
/dev/ad14s4: 431252.6MB (883205320 sectors) block size 16384, fragment size 2048
        using 2348 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.

Filesystem  1M-blocks Used  Avail Capacity iused    ifree %iused  Mounted on
/dev/ad14s4    417678    0 384263     0%       2 55300092    0%

fsck -fp: real 0m37.165s

Attempt to reduce number of inodes:

newfs -U -i 134217728 -g 134217728 -h 16 -e 261129 /dev/ad14s4
density reduced from 134217728 to 3676160
/dev/ad14s4: 431252.6MB (883205320 sectors) block size 16384, fragment size 2048
        using 1923 cylinder groups of 224.38MB, 14360 blks, 64 inodes.

Filesystem  1M-blocks Used  Avail Capacity iused  ifree %iused  Mounted on
/dev/ad14s4    431162    0 396669     0%       2 123068    0%

fsck -fp: real 0m32.687s

Bigger block size:

newfs -U -i 134217728 -g 134217728 -h 16 -e 261129 -b 65536 /dev/ad14s4
increasing fragment size from 2048 to block size / 8 (8192)
density reduced from 134217728 to 14860288
/dev/ad14s4: 431252.6MB (883205312 sectors) block size 65536, fragment size 8192
        using 119 cylinder groups of 3628.00MB, 58048 blks, 256 inodes.

Filesystem  1M-blocks Used  Avail Capacity iused ifree %iused  Mounted on
/dev/ad14s4    431230    0 396731     0%       2 30460    0%

fsck -fp: real 0m3.144s

Bigger block size and bigger frag size:

newfs -U -i 134217728 -g 134217728 -h 16 -e 261129 -b 65536 -f 65536 /dev/ad14s4
density reduced from 134217728 to 66846720
/dev/ad14s4: 431252.6MB (883205248 sectors) block size 65536, fragment size 65536
        using 27 cylinder groups of 16320.56MB, 261129 blks, 512 inodes.

Filesystem  1M-blocks Used  Avail Capacity iused ifree %iused  Mounted on
/dev/ad14s4    431245    0 396745     0%       2 13820    0%

fsck -fp: real 0m0.369s

With -b 65536 -f 65536 I'm finally approaching a reasonable number of inodes
(even fewer would be better).  The fsck time varies by a factor of more than
100, and the results are roughly similar on filesystems with files in them.
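In case anyone wants to repeat the experiment, a minimal sketch of how the
comparison above could be scripted (device name and option sets are just the
examples from above; the usual warning applies, newfs destroys whatever is on
the partition):

    #!/bin/sh
    # re-create the filesystem with a few different layouts and time an
    # empty-filesystem fsck on each
    DEV=/dev/ad14s4                  # scratch partition, example only
    for opts in "" "-b 65536" "-b 65536 -f 65536"; do
        newfs -U -i 134217728 -g 134217728 -h 16 -e 261129 $opts $DEV
        time fsck -fp $DEV
    done

Running dumpfs(8) on the result shows what newfs actually settled on, which
is handy given how it silently adjusts -i and -f as seen above.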