From owner-freebsd-fs@FreeBSD.ORG Wed Aug 15 08:24:42 2012
From: Karli Sjöberg <Karli.Sjoberg@slu.se>
To: Hugo Lombard
Cc: "freebsd-fs@freebsd.org"
Date: Wed, 15 Aug 2012 10:24:38 +0200
Subject: Re: Hang when importing pool

On 15 Aug 2012, at 09:31, Hugo Lombard wrote:

> On Wed, Aug 15, 2012 at 08:45:38AM +0200, Karli Sjöberg wrote:
>
>> I took your advice. I replaced my Core i5 with a Xeon X3470 and ramped
>> up the RAM to 32GB, maxing out the hardware. Sadly enough, it still
>> stalls in exactly the same manner :( This has to be the most
>> frustrating thing ever, since there's tons of data there that I really
>> need, and if it weren't for that stupid destroy operation, it would
>> still be accessible. I feel that FreeBSD is partly to blame, since the
>> originating Sun machine running Solaris, with only 16GB RAM, managed
>> to run the same destroy on the same dataset without any problem. Sure,
>> it took forever and then some (about two weeks), but it stayed afloat
>> the whole time.
>
> Sorry to hear about your pain. I've recently run into a similar problem
> where destroying a lot of snapshots on de-duped filesystems caused two
> boxes (one a replica of the other) to strangle themselves.
>
> After much struggling, I opted to redo the slave box, mount the master
> box's pool read-only, and rsync the datasets across. In retrospect, I
> shouldn't have deleted so many snapshots at once.
>
> Both boxes are quad-core Opterons with 16GB RAM each. On the newly
> re-done box, I've decided not to use de-dupe.
>
> In the process of searching for an answer I came across this thread:
>
>   http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg47526.html
>
> The person who originally reported the issue finally managed to recover
> their pool with a loan machine from Oracle that had 120GB RAM:
>
>   http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg47529.html
>
> Personally, I don't think the problem is purely FreeBSD's fault.

Neither do I, I said "partly".

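If it comes to that for me too, I gather the read-only import + rsync
route you describe would go roughly like this (assuming the pool version
supports read-only import; the pool name, paths and target hostname below
are just placeholders, not my actual setup):

# zpool import -o readonly=on -f tank
# zfs list -r tank
# rsync -aHAX --numeric-ids /tank/data/ newbox:/tank/data/

That way nothing gets written to the sick pool while the data is copied
off.
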
From the link you sent me, quoting a Mr Jim Klimov:

"According to my research (fleshed out on the Jive Forums, so I'd repeat
it here) it seems that (MY SPECULATION FOLLOWS):

1) some kernel module (probably related to ZFS) takes hold of more and
   more RAM;
2) since it is kernel memory, it can not be swapped out;
3) since all RAM is depleted but there are requests for RAM allocation,
   the kernel scans all allocated memory to find candidates for swapping
   out (hence the high scanrate);
4) since all RAM is now consumed by a BADLY DESIGNED kernel module which
   can not be swapped out, the system dies in a high-scanrate agony,
   because there is no RAM available to do anything. It can be "pinged"
   for a while, but not much more.

I stress that the module is BADLY DESIGNED as it is in my current running
version of the OS (I don't know yet if it was fixed in oi_151a), because
it is probably trying to build the full ZFS tree in its addressable
memory - regardless of whether it can fit there. IMHO the module should
try to process the pool in smaller chunks, or allow swapping out, if
hardware constraints like insufficient RAM force it to."

Wow, repeated twice...

"Symptoms are like what you've described, including the huge scanrate
just before the system dies (becomes unresponsive). Also, if you try
running with "vmstat 1" you can see that in the last few seconds of
uptime the system goes from several hundred free MBs (or even over a GB
of free RAM) down to under 32MB very quickly - consuming hundreds of MBs
per second."

These symptoms are exactly what I'm experiencing!

Further down:

"However, with ZDB analysis I managed to find some counter of free
blocks - those which belonged to a killed dataset. It seems that at first
they are quickly marked for deletion (i.e. they are not referenced by any
dataset, but are still in the ZFS block tree), and then during the pool's
current uptime or further import attempts, these blocks are actually
walked and excluded from the ZFS tree. In my case I saw that between
reboots and import attempts this counter went down by some 3 million
blocks every uptime, and after a couple of stressful weeks the destroyed
dataset was gone and the pool just worked on and on.

So if you still have this problem, try running ZDB to see if the
deferred-free count is decreasing between pool import attempts:

# time zdb -bsvL -e ...

  976K  114G  113G  172G  180K  1.01  1.56  deferred free

..."

So hopefully, if I just keep at it, maybe it solves itself. Right now I'm
trying:

# zpool import -f -F -X id

as Marcelo Araujo suggested. We'll see how long it takes before it stalls
this time...

/Karli
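
PS: For anyone else following along, this is roughly how I intend to
check whether that deferred-free counter is actually shrinking between
attempts (the pool name below is a placeholder; zdb -e works against an
exported pool and can take a very long time on a pool this size):

# time zdb -bsvL -e mypool | grep -i deferred

Run it once, attempt the import again, export (or reboot) and run the
same line again - if the block count in the first column keeps dropping,
the destroy is still making progress.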