From: Mike Andrews <mandrews@bit0.com>
Date: Fri, 19 Jun 2009 03:30:33 -0400 (EDT)
To: stable@freebsd.org
Subject: weird problem w/ ZFS not reclaiming freed space

Somehow I've managed to get ZFS on one of my machines into a state where
it won't reclaim all space after deleting files AND snapshots off of it.
(This is with 7.2-STABLE amd64, compiled June 10.)

# ls -la /weird
total 4
drwxr-x---   2 mysql  mysql     2 Jun 19 02:42 .
drwxr-xr-x  29 root   wheel  1024 Jun 19 02:44 ..
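(Side note: snapdir is hidden on this dataset, but the .zfs tree is still
reachable by explicit path, so besides the 'zfs list -t snapshot' check shown
below, the hidden snapshot directory can be eyeballed directly -- with no
snapshots it should simply come back empty:)

# ls /weird/.zfs/snapshot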
# df /weird
Filesystem    1K-blocks      Used     Avail Capacity  Mounted on
scotch/weird  282201472 109151232 173050240    39%    /weird

# zfs list scotch/weird
NAME           USED  AVAIL  REFER  MOUNTPOINT
scotch/weird   104G   164G   104G  /weird

# zfs list -t snapshot | grep scotch/weird
#

# zfs get all scotch/weird
NAME          PROPERTY              VALUE                  SOURCE
scotch/weird  type                  filesystem             -
scotch/weird  creation              Wed Jun 17  1:20 2009  -
scotch/weird  used                  104G                   -
scotch/weird  available             159G                   -
scotch/weird  referenced            104G                   -
scotch/weird  compressratio         1.00x                  -
scotch/weird  mounted               yes                    -
scotch/weird  quota                 none                   default
scotch/weird  reservation           none                   default
scotch/weird  recordsize            128K                   default
scotch/weird  mountpoint            /weird                 local
scotch/weird  sharenfs              off                    default
scotch/weird  checksum              on                     default
scotch/weird  compression           off                    default
scotch/weird  atime                 off                    local
scotch/weird  devices               on                     default
scotch/weird  exec                  off                    local
scotch/weird  setuid                off                    local
scotch/weird  readonly              off                    default
scotch/weird  jailed                off                    default
scotch/weird  snapdir               hidden                 default
scotch/weird  aclmode               groupmask              default
scotch/weird  aclinherit            restricted             default
scotch/weird  canmount              on                     default
scotch/weird  shareiscsi            off                    default
scotch/weird  xattr                 off                    temporary
scotch/weird  copies                1                      default
scotch/weird  version               3                      -
scotch/weird  utf8only              off                    -
scotch/weird  normalization         none                   -
scotch/weird  casesensitivity       sensitive              -
scotch/weird  vscan                 off                    default
scotch/weird  nbmand                off                    default
scotch/weird  sharesmb              off                    default
scotch/weird  refquota              none                   default
scotch/weird  refreservation        none                   default
scotch/weird  primarycache          all                    default
scotch/weird  secondarycache        all                    default
scotch/weird  usedbysnapshots       0                      -
scotch/weird  usedbydataset         104G                   -
scotch/weird  usedbychildren        0                      -
scotch/weird  usedbyrefreservation  0                      -

(Note that df and zfs agree -- 109151232 1K-blocks is about 104 GiB -- and the
usedby* breakdown says the dataset itself is holding all of it: usedbysnapshots
and usedbychildren are both 0, even though the directory is empty.)

If I then rsync stuff to it, the space initially looks OK, but if I keep
rsyncing to it every few hours, the used space keeps growing, even when no
snapshots are being taken.  If I do take snapshots, then change stuff, then
delete the snapshots, the snapshot space does appear to be reclaimed.  Also,
if I 'zfs destroy' the filesystem, the space is correctly reclaimed, but once
I create a new one and repeat the process, the problem reappears.

I have not had any luck reproducing this on another machine yet, but
admittedly I haven't tried very hard.  Scrubbing the zpool returns no errors.

I'm guessing zdb is my only hope at debugging this, but as I've never used it
before, and as it seems to dump core whenever I try running it, can someone
suggest what I need to check or look for with it?

I did also have a panic a few days ago that, based on the text, might be
related:

panic: solaris assert: P2PHASE(start, 1ULL << sm->sm_shift) == 0, file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
line: 146

...for which I have a vmdump and a core.txt if anyone wants to look at it.
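(The core.txt is what crashinfo(8) spat out after the reboot, I believe.  If
anyone would rather dig into the dump itself, I assume the usual kgdb recipe
applies -- modulo the actual dump file name/number and a kernel with matching
debug symbols:

# kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) bt

...and I can post the backtrace from that if it's useful.)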
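P.S. For the archives, the kind of zdb runs I've been attempting (all of
which dump core here) look roughly like the below.  I'm going off the zdb
man page, so treat the choice of flags as my best guess rather than
known-good advice:

# zdb -b scotch          # traverse the pool, checking for leaked blocks
# zdb -m scotch          # dump metaslab/space map info (the assert above is in space_map.c)
# zdb -dd scotch/weird   # dump dataset and object info for the affected filesystem

If someone who actually knows zdb can say which of these (or something else
entirely) is worth capturing, I'll happily post whatever output I can get.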