Date: Thu, 24 Jan 2013 09:45:52 -0500
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: Adam Nowacki <nowakpl@platinum.linux.pl>
Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: ZFS regimen: scrub, scrub, scrub and scrub again.
Message-ID: <CACpH0McdJOrCgNWCsRwqnO_AvzzzDCx5gQxJL1nvF+8=ysqwRg@mail.gmail.com>
In-Reply-To: <51013345.8010701@platinum.linux.pl>
References: <CACpH0Mf6sNb8JOsTzC+WSfQRB62+Zn7VtzEnihEKmEV2aO2p+w@mail.gmail.com>
 <alpine.BSF.2.00.1301211201570.9447@wojtek.tensor.gdynia.pl>
 <20130122073641.GH30633@server.rulingia.com>
 <alpine.BSF.2.00.1301232121430.1659@wojtek.tensor.gdynia.pl>
 <51013345.8010701@platinum.linux.pl>
Wow! OK. It sounds like you (or someone like you) can answer some of my
burning questions about ZFS.

On Thu, Jan 24, 2013 at 8:12 AM, Adam Nowacki <nowakpl@platinum.linux.pl> wrote:

> Let's assume a 5-disk raidz1 vdev with ashift=9 (512-byte sectors).
>
> A worst-case scenario could happen if your random i/o workload were reading
> random files, each of 2048 bytes. Each file read would require data from 4
> disks (the 5th is parity and won't be read unless there are errors). However,
> if files were 512 bytes or less, only one disk would be used; 1024 bytes -
> two disks, etc.
>
> So ZFS is probably not the best choice to store millions of small files if
> random access to whole files is the primary concern.
>
> But let's look at a different scenario - a PostgreSQL database. Here table
> data is split and stored in 1GB files. ZFS splits each file into 128KiB
> records (the recordsize property). Each record is then split again into 4
> columns of 32768 bytes each; a 5th column is generated containing parity.
> Each column is then stored on a different disk. You could think of it as a
> regular RAID-5 with a stripe size of 32768 bytes.

OK... so my question then would be: what about small files? If I write
several small files at once, does the transaction use a single record, or
does each file need its own record? Additionally, if small files use
sub-records, when you delete such a file, does the sub-record get moved or
just wasted (until the record is completely free)?

I'm considering the difference, say, between Cyrus IMAP (one file per
message on ZFS, database files on a different ZFS filesystem) and DBMail
IMAP (PostgreSQL on ZFS).

... now I realize that PostgreSQL on ZFS has some special issues (but I
don't have a choice here between ZFS and non-ZFS ... ZFS has already been
chosen), but I'm also figuring that PostgreSQL on ZFS has some waste
compared to Cyrus IMAP on ZFS.

So far in my research, Cyrus makes some compelling arguments that the
common use case for most IMAP database files is a full scan --- for which
its database files are optimized and SQL-based tables are not. I agree that
some operations can be more efficient in a good SQL database, but a full
scan (as the most frequently used query) is not.

Cyrus also makes sense to me as a collection of small files ... for which I
expect ZFS to excel ... including the ability to snapshot with impunity ...
but I am terribly curious how the files are handled in transactions. I'm
actually (right now) running some file-size statistics (and I'll get back
to the list, if asked), but I'd like to know how ZFS is going to store the
arriving mail... :)
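[Editor's note: for readers following the raidz1 arithmetic quoted above, here is a
back-of-the-envelope sketch in Python. It is not ZFS code; the function name
raidz1_layout and its simplified model (ignoring metadata, raidz skip/padding
sectors, and compression) are purely illustrative, and only reproduce the numbers
Adam gives for a 5-disk raidz1 with 512-byte sectors.]

# Rough sketch of how a raidz1 vdev spreads one logical block across its
# member disks. Simplified: no metadata, no padding sectors, no compression.
import math

def raidz1_layout(block_size, ndisks=5, sector=512):
    """Return (data_disks_read, parity_sectors, column_bytes) for one block."""
    data_disks = ndisks - 1                           # one disk's worth of space is parity
    sectors = math.ceil(block_size / sector)          # sectors needed for the data
    data_disks_read = min(sectors, data_disks)        # a whole-file read touches this many disks
    parity_sectors = math.ceil(sectors / data_disks)  # one parity sector per stripe row
    column_bytes = parity_sectors * sector            # bytes stored per data column
    return data_disks_read, parity_sectors, column_bytes

# A 2048-byte file needs 4 sectors, so reading it touches all 4 data disks
# (the parity disk stays idle unless there are errors).
print(raidz1_layout(2048))        # -> (4, 1, 512)

# A 512-byte file touches a single disk; 1024 bytes touches two, and so on.
print(raidz1_layout(512))         # -> (1, 1, 512)
print(raidz1_layout(1024))        # -> (2, 1, 512)

# A full 128KiB record splits into 4 data columns of 32768 bytes each plus a
# generated parity column -- effectively RAID-5 with a 32768-byte stripe unit.
print(raidz1_layout(128 * 1024))  # -> (4, 64, 32768)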