From: Per von Zweigbergk
Date: Tue, 14 Jun 2011 09:19:32 +0200
To: freebsd-fs@freebsd.org
Subject: Disk usage and ZFS deduplication
Message-Id: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se>

I've been following the "Impossible compression ratio on ZFS" thread with some interest, and it made me ask myself this:

Let us say we have a hypothetical ZFS filesystem with the equally hypothetical files A and B. The filesystem has deduplication enabled. Both files have an apparent file size of 100 MB, but 50 MB of that data is common between the two files and thus can be deduplicated. This would mean that total disk usage would be 150 MB.

If you use "du" to determine the disk usage of a deduplicated file, what would be the result? Which file would the common data be accounted to? Or would it be accounted to both files somehow, in part or in full?
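
To make the arithmetic concrete, here is a minimal sketch in Python of the scenario. It is a toy model only and says nothing about how ZFS or "du" actually account for shared blocks. The 1 MB block size, the block labels and all function names are assumptions made up for illustration. The three accounting schemes at the end are simply the possibilities raised in the question.

```python
# Toy model: each file is a set of block labels, one block = 1 MB (assumed).
BLOCK_MB = 1


def build_files():
    """Return two hypothetical 100-block files sharing 50 blocks."""
    shared = {f"shared-{i}" for i in range(50)}
    file_a = shared | {f"a-{i}" for i in range(50)}
    file_b = shared | {f"b-{i}" for i in range(50)}
    return file_a, file_b


def apparent_size(blocks):
    """Size of one file as seen without deduplication."""
    return len(blocks) * BLOCK_MB


def deduplicated_total(*files):
    """Space needed when identical blocks are stored only once."""
    return len(set().union(*files)) * BLOCK_MB


def candidate_accountings(file_a, file_b):
    """Three hypothetical ways to charge the shared blocks to A and B."""
    shared = len(file_a & file_b) * BLOCK_MB
    size_a = apparent_size(file_a)
    size_b = apparent_size(file_b)
    return {
        # Shared data charged in full to both files.
        "both_in_full": (size_a, size_b),
        # Shared data charged only to the first file.
        "first_file_only": (size_a, size_b - shared),
        # Shared data split evenly between the files.
        "split_evenly": (size_a - shared / 2, size_b - shared / 2),
    }


file_a, file_b = build_files()
print(apparent_size(file_a), apparent_size(file_b))  # 100 100
print(deduplicated_total(file_a, file_b))            # 150
for name, (a, b) in candidate_accountings(file_a, file_b).items():
    print(name, a, b, a + b)
```

Only the first scheme makes the per-file figures add up to more than the 150 MB actually stored. The other two add up to exactly 150 MB but make one or both files look smaller than their apparent size.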