From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 15 19:18:02 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1959110656C1
	for <freebsd-fs@freebsd.org>; Thu, 15 Oct 2009 19:18:02 +0000 (UTC)
	(envelope-from stef-list@memberwebs.com)
Received: from memberwebs.com (memberwebs.com [94.75.203.95])
	by mx1.freebsd.org (Postfix) with ESMTP id DA94B8FC23
	for <freebsd-fs@freebsd.org>; Thu, 15 Oct 2009 19:18:01 +0000 (UTC)
Received: from [172.27.5.159] (unknown [172.27.5.159])
	by memberwebs.com (Postfix) with ESMTP id 7761B83E4C8
	for <freebsd-fs@freebsd.org>; Thu, 15 Oct 2009 19:04:21 +0000 (UTC)
Message-ID: <4AD77230.3030803@memberwebs.com>
Date: Thu, 15 Oct 2009 14:04:16 -0500
From: Stef Walter <stef-list@memberwebs.com>
User-Agent: Thunderbird 2.0.0.23 (X11/20090817)
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Deadlock after canceled zfs recv
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: stef@memberwebs.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 15 Oct 2009 19:18:02 -0000

I'm running the latest RELENG_8 and have been doing some pre-production
stress testing.

I can reproducibly deadlock ZFS after a canceled ssh + zfs recv. Here's
how I reproduce the problem:

Do this in a new tank without any data in it. Reboot the system, and
make these the first zfs operations performed. Files available here:

http://memberwebs.com/stef/misc/recv-snapshots-zfs-hang.tbz

Receive new file system, and then incremental snapshot:

# cat step-one | zfs recv tank/received
# cat step-two | zfs recv tank/received

At this point the listing should look like:

# zfs list -t snapshot,filesystem | grep received
tank                   2.35G  16.4G    22K  /tank
tank/received           491M  16.4G   489M  /tank/received
tank/received@justnow  1.32M      -   160M  -
tank/received@later        0      -   489M  -

The third one goes through ssh. Start the command below, count about
three to five seconds (one one thousand, two one thousand, three one
thousand), and press Ctrl-C:

# cat step-three | ssh localhost zfs recv tank/received

Now run the 'zfs list' command from above. More often than not, parts of
the zfs system are hung, and remain deadlocked until reboot.
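
To tell a real hang apart from a listing that is merely slow, a small
helper along these lines may be useful. It is only a sketch: check_hang
is a hypothetical name, not part of any zfs tooling, and the time limit
is an arbitrary choice. It starts the command in the background and
polls for it, so the calling shell does not block on a process that
never returns.

```shell
#!/bin/sh
# check_hang SECONDS COMMAND [ARGS...]   (hypothetical helper name)
# Returns 0 if COMMAND exits within SECONDS, 1 if it is still running.
# The command is never waited on, so a stuck process cannot block us.
check_hang() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        # kill -0 sends no signal; it only tests that the pid exists
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Final check once the time limit has been reached
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Example (the 30 second limit is an assumption, adjust to taste):
#   check_hang 30 zfs list -t snapshot,filesystem || echo "zfs list looks hung"
```

The helper discards the command's output, since only the question of
whether it returns at all matters here.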

If it doesn't happen the first time, try step three + Ctrl-C again.

When run through ssh and canceled with Ctrl-C, it seems like 'zfs recv'
doesn't have time to do the cleanup that it normally does when run
directly.

FreeBSD zfs8.ws.local 8.0-RC1 FreeBSD 8.0-RC1 #0: Wed Oct 14 16:04:50
UTC 2009     root@zfs8.ws.local:/usr/obj/usr/src/sys/GENERIC  i386

I'm available for any further information and want to help nail down
this bug.

Cheers,

Stef