From owner-freebsd-stable@FreeBSD.ORG  Tue Sep 29 08:29:30 2009
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 682EC106566B
	for <freebsd-stable@freebsd.org>; Tue, 29 Sep 2009 08:29:30 +0000 (UTC)
	(envelope-from borjam@sarenet.es)
Received: from proxypop1.sarenet.es (proxypop1.sarenet.es [194.30.0.99])
	by mx1.freebsd.org (Postfix) with ESMTP id EB32E8FC0C
	for <freebsd-stable@freebsd.org>; Tue, 29 Sep 2009 08:29:29 +0000 (UTC)
Received: from [172.16.1.204] (izaro.sarenet.es [192.148.167.11])
	by proxypop1.sarenet.es (Postfix) with ESMTP id 8327E6207
	for <freebsd-stable@freebsd.org>; Tue, 29 Sep 2009 10:29:28 +0200 (CEST)
From: Borja Marcos <borjam@sarenet.es>
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Date: Tue, 29 Sep 2009 10:29:26 +0200
Message-Id: <089F63A7-574B-4646-97C7-D82B226CD4CF@sarenet.es>
To: freebsd-stable@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1076)
X-Mailer: Apple Mail (2.1076)
Subject: 8.0RC1, ZFS: deadlock
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Sep 2009 08:29:30 -0000


Hello,

I have observed a deadlock when using ZFS. We make heavy use of
zfs send/zfs receive to keep a replica of a dataset on a remote
machine; the replication can run at one-minute intervals. This may be
a somewhat atypical use of ZFS, but it looks like a great way to keep
filesystem replicas once this is sorted out.


How to reproduce:

Set up two systems. A dataset with heavy I/O activity is replicated
from the first to the second. I used a dataset containing /usr/obj
while running make buildworld.

Replicate the dataset from the first machine to the second using an
incremental send:

zfs send -i pool/dataset@Nminus1 pool/dataset@N | ssh destination zfs receive -d pool
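
For illustration, one replication step could be wrapped in a small
script along these lines. This is only a sketch, not the exact script
we run: the run and replicate_once helpers, the DRY_RUN switch and the
snapshot names are made up for the example, and it assumes an initial
full send has already created the dataset on the destination.

```shell
#!/bin/sh
# Sketch of one incremental replication step (hypothetical helper names).
DATASET="pool/dataset"      # dataset to replicate (name from the example above)
DEST="destination"          # ssh name of the receiving host
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = execute them

# Print the command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

# replicate_once PREV CUR: snapshot the dataset as @CUR, then send the
# changes between @PREV and @CUR to the destination pool.
replicate_once() {
    prev="$1"
    cur="$2"
    pool="${DATASET%%/*}"   # "pool/dataset" -> "pool"
    run "zfs snapshot ${DATASET}@${cur}"
    run "zfs send -i ${DATASET}@${prev} ${DATASET}@${cur} | ssh ${DEST} zfs receive -d ${pool}"
}
```

Calling replicate_once with the previous and the new snapshot name
from a loop that sleeps 60 seconds between passes would give the
one-minute replication interval mentioned above.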

When the replicated dataset on the second system is being read while
zfs receive is updating it, a deadlock can occur. We discovered this
while testing a server with 8 GB RAM that we hope to put into
production soon: a Bacula backup agent was running and ZFS deadlocked.
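
Read load of this kind can probably be generated without Bacula. The
following is a sketch under assumptions, not the exact commands used:
the mount point /pool/dataset and the read_pass and read_loop helpers
are hypothetical. It simply reads the whole tree with tar and throws
the archive away.

```shell
#!/bin/sh
# Sketch of a read-load generator for the replicated dataset.
MOUNTPOINT="${MOUNTPOINT:-/pool/dataset}"   # assumed mount point of the replica

# read_pass DIR: read every file under DIR once by archiving it to a pipe.
# Prints the number of archive bytes produced; nothing is written to disk.
read_pass() {
    tar -cf - -C "$1" . | wc -c | tr -d ' '
}

# read_loop: keep reading the replica until interrupted.
read_loop() {
    while :; do
        read_pass "$MOUNTPOINT" > /dev/null
    done
}
```

Running read_loop on the second machine while the first keeps sending
incremental snapshots should give the mix of reads and receives
described above.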

I have set up a couple of VMware Fusion virtual machines to test
this, and they deadlocked as well. The virtual machines have little
memory, 512 MB, but I don't believe this is the actual problem: there
are no complaints about lack of memory.

top shows processes stuck in the "zfsvfs" state:

last pid:  2051;  load averages:  0.00,  0.07,  0.55    up 0+01:18:25  12:05:48
37 processes:  1 running, 36 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 18M Active, 20M Inact, 114M Wired, 40K Cache, 59M Buf, 327M Free
Swap: 1024M Total, 1024M Free

   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
  1914 root        1  62    0 11932K  2564K zfsvfs  0   0:51  0.00% bsdtar
  1093 borjam      1  44    0  8304K  2464K CPU1    1   0:32  0.00% top
  1913 root        1  54    0 11932K  2600K rrl->r  0   0:19  0.00% bsdtar
  1019 root        1  44    0 25108K  4812K select  0   0:05  0.00% sshd
  2008 root        1  76    0 13600K  1904K tx->tx  0   0:04  0.00% zfs
  1089 borjam      1  44    0 37040K  5216K select  1   0:04  0.00% sshd
   995 root        1  76    0  8252K  2652K pause   0   0:02  0.00% csh
   840 root        1  44    0 11044K  3828K select  1   0:02  0.00% sendmail
  1086 root        1  76    0 37040K  5156K sbwait  1   0:01  0.00% sshd
   850 root        1  44    0  6920K  1612K nanslp  0   0:01  0.00% cron
   607 root        1  44    0  5992K  1540K select  1   0:01  0.00% syslogd
  1090 borjam      1  76    0  8252K  2636K pause   1   0:01  0.00% csh
   990 borjam      1  44    0 37040K  5220K select  0   0:00  0.00% sshd
   985 root        1  48    0 37040K  5160K sbwait  1   0:00  0.00% sshd
   911 root        1  44    0  8252K  2608K ttyin   0   0:00  0.00% csh
   991 borjam      1  56    0  8252K  2636K pause   0   0:00  0.00% csh
   844 smmsp       1  46    0 11044K  3852K pause   0   0:00  0.00% sendmail

Interestingly, this has blocked access to all the filesystems. I
cannot, for instance, ssh into the machine anymore, even though all
the system-critical filesystems are on UFS; I was only using ZFS for
a test.

Any ideas on what information might be useful to collect? I have the
VMware machine available right now. I've made a couple of VMware
snapshots of it: the first just after the deadlock started, before
breaking into DDB, and the second after entering DDB (I broke into
DDB with sysctl).

Also, a copy of the VMware virtual machine with its snapshots is
available on request. Your choice ;)


Borja.