From: Marcus Reid
To: Per von Zweigbergk
Cc: freebsd-fs@freebsd.org
Date: Fri, 17 Jun 2011 06:45:22 +0000
Subject: Re: Disk usage and ZFS deduplication

On Tue, Jun 14, 2011 at 09:19:32AM +0200, Per von Zweigbergk wrote:
> I've been following the "Impossible compression ratio on ZFS" thread
> with some interest, and it made me ask myself this:
>
> Let us say we have a hypothetical zfs filesystem with the equally
> hypothetical files A and B. The filesystem has deduplication enabled.
> Both files have an apparent file size of 100 MB, but 50 MB of that
> data is common between the two files and thus can be deduplicated.
> This would mean that total disk usage would be 150 MB.
>
> If you use "du" to determine disk size for a deduplication, what would
> be the result? Which file would the common data be accounted to? Or
> would it be accounted to both files somehow, in part or in
> full?

Pretty simple test.

[root@luna /root]# zfs create -o mountpoint=/dedup -o dedup=on data/dedup
[root@luna /usr/data]# dd if=/dev/urandom of=set_a_50MiB bs=1m count=50
[root@luna /usr/data]# dd if=/dev/urandom of=set_b_50MiB bs=1m count=50
[root@luna /usr/data]# dd if=/dev/urandom of=set_c_50MiB bs=1m count=50
[root@luna /usr/data]# cat set_a_50MiB set_b_50MiB > file_1
[root@luna /usr/data]# cat set_a_50MiB set_c_50MiB > file_2
[root@luna /usr/data]# cp file_1 /dedup
[root@luna /usr/data]# cp file_2 /dedup
[root@luna /usr/data]# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   101G  32.8G  68.2G    32%  1.33x  ONLINE  -
[root@luna /usr/data]# cd /dedup
[root@luna /dedup]# du -sk *
102479  file_1
102479  file_2

Marcus
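
A note on reading those numbers: du reports the full ~100 MiB for each file, so the shared 50 MiB appears to be accounted to both files in full, and the saving shows up only in the pool-wide DEDUP column. The arithmetic can be checked with a small sketch. The variable names below are made up for illustration, and the explanation of the 79 KiB difference as per-file metadata is an assumption, not something the transcript confirms.

```shell
#!/bin/sh
# Sizes in MiB, taken from the dd and cat commands in the transcript.
set_size=50        # each of set_a, set_b, set_c
files=2            # file_1 and file_2
sets_per_file=2    # each file is two 50 MiB sets concatenated

# Data referenced by the two files versus data that must be stored once.
logical=$((files * sets_per_file * set_size))   # 200 MiB
unique=$((3 * set_size))                        # 150 MiB (set_a stored once)

# Integer-only arithmetic: ratio expressed in hundredths.
ratio_x100=$((logical * 100 / unique))
printf 'expected dedup ratio: %d.%02dx\n' \
    $((ratio_x100 / 100)) $((ratio_x100 % 100))
# prints: expected dedup ratio: 1.33x

# What du -sk would show per file if it charged the full file size.
per_file_kib=$((sets_per_file * set_size * 1024))   # 102400 KiB
reported_kib=102479                                 # from the du output above
overhead_kib=$((reported_kib - per_file_kib))
printf 'per-file data: %d KiB, reported: %d KiB, difference: %d KiB\n' \
    "$per_file_kib" "$reported_kib" "$overhead_kib"
# prints: per-file data: 102400 KiB, reported: 102479 KiB, difference: 79 KiB
```

The 1.33x matches the DEDUP column from zpool list, and the per-file figure is within 79 KiB of what du printed, which is consistent with du not subtracting anything for deduplicated blocks.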