Date: Thu, 24 Jan 2013 09:45:52 -0500
Subject: Re: ZFS regimen: scrub, scrub, scrub and scrub again.
From: Zaphod Beeblebrox
To: Adam Nowacki
Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org

Wow! OK. It sounds like you (or someone like you) can answer some of my burning questions about ZFS.

On Thu, Jan 24, 2013 at 8:12 AM, Adam Nowacki wrote:
> Let's assume a 5-disk raidz1 vdev with ashift=9 (512-byte sectors).
>
> A worst-case scenario could happen if your random I/O workload was reading
> random files of 2048 bytes each. Each file read would require data from 4
> disks (the 5th holds parity and won't be read unless there are errors).
> However, if files were 512 bytes or less, only one disk would be used;
> 1024 bytes, two disks; and so on.
>
> So ZFS is probably not the best choice for storing millions of small files
> if random access to whole files is the primary concern.
>
> But let's look at a different scenario: a PostgreSQL database. Here table
> data is split and stored in 1 GB files. ZFS splits each file into 128 KiB
> records (the recordsize property). Each record is then split again into 4
> columns of 32768 bytes each, and a 5th column containing parity is
> generated. Each column is then stored on a different disk. You could think
> of it as a regular RAID-5 with a stripe size of 32768 bytes.

OK... so my question then would be: what about small files? If I write several small files at once, does the transaction use one record, or does each file need its own record? Additionally, if small files use sub-records, when you delete such a file, does the sub-record get moved, or is it just wasted (until the whole record is free)?
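The arithmetic in Adam's quoted explanation can be reproduced with a rough Python model. This is only a sketch under stated assumptions: the function names are hypothetical, and the model ignores raidz allocation padding, compression, metadata, and where the parity sectors land. It counts just the data disks that a read of one block touches on a 5-disk raidz1 vdev, and the per-disk column size for a full record.

```python
# Rough model of how a raidz1 vdev spreads one block over its disks.
# Assumption: ignores raidz padding, compression, metadata and where
# parity sectors land; it only reproduces the quoted arithmetic.


def data_disks_read(size_bytes, disks=5, parity=1, ashift=9):
    """Data disks holding part of one block of size_bytes (parity not read)."""
    sector = 1 << ashift                # ashift=9 -> 512-byte sectors
    data_disks = disks - parity         # 5-disk raidz1 -> 4 data disks
    sectors = -(-size_bytes // sector)  # ceiling division
    return min(sectors, data_disks)


def column_size(recordsize, disks=5, parity=1):
    """Bytes each data disk stores for one full record."""
    return recordsize // (disks - parity)


for size in (512, 1024, 2048):
    print(size, data_disks_read(size))  # 1, 2 and 4 disks respectively
print(column_size(128 * 1024))          # 32768
```

Once a block spans four or more sectors, every data disk is involved in reading it, which is why a 2048-byte file is the worst case in this layout; a full 128 KiB record behaves like a RAID-5 stripe with 32768 bytes per disk.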
I'm considering the difference between, say, Cyrus IMAP (one file per message on ZFS, database files on a different ZFS filesystem) and DBMail IMAP (PostgreSQL on ZFS).

... Now I realize that PostgreSQL on ZFS has some special issues (but I don't have a choice between ZFS and non-ZFS here; ZFS has already been chosen), but I also figure that PostgreSQL on ZFS incurs some waste compared to Cyrus IMAP on ZFS.

So far in my research, Cyrus makes some compelling arguments that the most common operation on IMAP database files is a full scan, for which its database files are optimized and SQL-based files are not. I agree that some operations can be more efficient in a good SQL database, but a full scan (the most frequently used query) is not one of them.

Cyrus also makes sense to me as a collection of small files... for which I expect ZFS to excel... including the ability to snapshot with impunity... but I am terribly curious how the files are handled in transactions.

I'm actually running some file-size statistics right now (and I'll report back to the list, if asked), but I'd like to know how ZFS is going to store the arriving mail... :)