Date: Wed, 15 Jul 2015 15:06:21 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Dieter BSD <dieterbsd@gmail.com>
Cc: freebsd-hackers@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: format/newfs larger external consumer drives
Message-ID: <20150715220621.GP8523@funkthat.com>
In-Reply-To: <CAA3ZYrB7i-Cjfv0UX1mb_RPmJdnj2LQw0apDd6+0fhKkrhH+PQ@mail.gmail.com>
References: <CAA3ZYrB7i-Cjfv0UX1mb_RPmJdnj2LQw0apDd6+0fhKkrhH+PQ@mail.gmail.com>
Dieter BSD wrote this message on Wed, Jul 15, 2015 at 10:37 -0700:
> [ freebsd-fs@ added ]
>
> >> If the average filesize will be large, use large block/frag sizes.
> >> I use 64 KiB / 8 KiB. And reduce the number of inodes. I reduce
> >> inodes as much as newfs allows and there are still way too many.
> >
> > Can you think of an algorithmic way to express this? I.e., you don't
> > want blocks to get *too* large as you risk greater losses in "partial
> > fragments", etc. Likewise, you don't want to run out of inodes.
>
> I look at df -i for existing filesystems with similar filesizes.
> My data filesystems usually get an entire disk (..., 2TB, 3TB, recently 5TB),
> and with 64/8 block/frag and as few inodes as newfs will allow,
> df still reports numbers like 97% full but only using 0% or 1%
> of inodes.
>
> density reduced from 67108864 to 14860288
> /dev/ada1: 4769307.0MB (9767541168 sectors) block size 65536, fragment size 8192
>         using 1315 cylinder groups of 3628.00MB, 58048 blks, 256 inodes.
>         with soft updates
>
> I should take another look at increasing the size of cylinder groups.

Right now the cg by default is made to fill a block...  I don't believe
it can be made larger without a major overhaul of the code...  The
default used to be even smaller than a full block, causing even more
cg's to be created, and you had to do trial and error to figure out how
to make a cg fill a full block...

> Newfs likes very small cylinder groups, which made sense 30 years ago
> when disks were like 40 MB and file sizes were a lot smaller. IIRC,
> each cylinder group gets at least one block of inodes, and with file
> sizes of 1-20 GB I get way too many inodes.

This is partly because the default number of inodes is too large...
The currently documented default is an inode for every 4 * frag_size
bytes of data space, which isn't correct!!!  It was changed to
2 * frag_size in r228794 to keep the number of inodes the same across
the transition from 16k/2k to 32k/4k block/frag sizes, but the
documentation was not updated...
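The numbers in the newfs output quoted above can be sanity-checked with a
little shell arithmetic.  This is a sketch, not authoritative: it assumes
the 256-byte UFS2 on-disk inode size, and the newfs invocation shown in
the comment is only illustrative (the device name is a placeholder).

```shell
# The 64 KiB / 8 KiB scheme quoted above corresponds to something like
# (do NOT run this on a disk you care about; /dev/ada1 is a placeholder):
#   newfs -U -b 65536 -f 8192 -i 67108864 /dev/ada1
# newfs clamps -i, so a huge value just yields the per-cylinder-group
# minimum of one block of inodes.

block_size=65536
frag_size=8192
inode_size=256                      # assumed UFS2 on-disk inode size

# "each cylinder group gets at least one block of inodes":
inodes_per_cg=$((block_size / inode_size))
echo "minimum inodes per cg: $inodes_per_cg"                      # 256

# The quoted 3628.00MB cylinder groups at that minimum give exactly the
# density newfs reported ("density reduced ... to 14860288"):
cg_bytes=$((3628 * 1024 * 1024))
echo "effective bytes per inode: $((cg_bytes / inodes_per_cg))"   # 14860288

# For comparison, the post-r228794 default density:
echo "default bytes per inode: $((2 * frag_size))"                # 16384
```

So even asking newfs for one inode per 64 MiB of data space, the one-block
minimum per cylinder group pins the effective density at about 14 MiB per
inode, far below what was requested.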
It has now been updated in r285615 and will be MFC'd...

On my dev server, where I have a few source trees checked out:
Filesystem    Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/ada0s2d  185G    122G     48G    72%    2.8M  9.5M   23%   /a

This fs has a non-standard config in that my frag size is 8k...  If it
were standard, I'd have twice as many inodes...  Increasing the frag
size both cuts the number of inodes in half and increases the cg size...

Standard:
/dev/ada0s2d: 192068.0MB (393355264 sectors) block size 32768, fragment size 4096
        using 307 cylinder groups of 626.09MB, 20035 blks, 80256 inodes.

Non-standard:
/dev/ada0s2d: 192068.0MB (393355264 sectors) block size 32768, fragment size 8192
        using 166 cylinder groups of 1162.97MB, 37215 blks, 74496 inodes.

The other thing I didn't realize (and it would be useful for someone to
benchmark) is that many SSDs now use an 8k page size instead of the
previous 4k...

Maybe this needs to be more of a sliding scale based upon disk size?
Maybe go from 2 * frag to 4 * frag for filesystems larger than 1TB?
This is still something that a system admin needs to address, though;
it's impossible to make the defaults sane for all use cases...  There
are some people who will only keep multi-GB files on their 5 TB fs, and
so only need a few thousand inodes, but others may keep more, smaller
files...

It'd be nice to put together a fs survey to see what sizes of
filesystems people have, and the distribution of file sizes...  I'll
try to do that...

> Yes, a larger frag size will waste some space in the last frag of a file,
> but having smaller block&frag sizes uses a lot of space to keep track of
> all those blocks and frags. And makes more work for fsck.

Yep...

> > "risk" of loss/cost of recovery (when the medium
> > *is* unceremoniously dismounted
>
> Some panics don't sync the disks. Sometimes disks just go into a coma.
> Soft updates is supposed to limit problems to those that fsck -p will
> automagicly fix.
> (assuming the disk's write cache is turned off)  There is at least one
> case where it does not.  See PR 166499 (from 2012, still not fixed).
>
> As long as I'm whining about unfixed filesystem PRs, see also
> bin/170676: Newfs creates a filesystem that does not pass fsck.
> (also from 2012)
>
> > I am concerned with the fact that users can so easily/carelessly "unplug"
> > a USB device without the proper incantations beforehand. Of course, *their*
> > mistake is seen as a "product design flaw"! :-/
>
> Superglue the cable in place? :-)
>
> Perhaps print up something like "Unmount filesystem(s) before unplugging
> or powering off external disk, or you might lose your data.",
> laminate it and attach it to the cables?

The same problem exists on Windows...  They have a policy of turning off
write buffering on pluggable thumb drives to help eliminate this...  For
UFS, the sync flag should be provided to mount...

[...]

> Alternately, instead of panicing, could the filesystem just
> umount -f the offending filesystem? (And whine to log(9).)
>
> I am very tired of having an entire machine panic just because
> one disk decided to take a nap. This is not how you get 5 9s. :-(

There has been lots of work to try to make file systems not panic when
the underlying drives disappear, though clearly more work is needed...
Patches welcome! :)

--
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."