From: Borja Marcos
To: freebsd-stable@freebsd.org
Date: Tue, 29 Sep 2009 10:29:26 +0200
Subject: 8.0RC1, ZFS: deadlock
Message-Id: <089F63A7-574B-4646-97C7-D82B226CD4CF@sarenet.es>

Hello,

I have observed a deadlock condition when using ZFS. We make heavy use of zfs send/zfs receive to keep a replica of a dataset on a remote machine. It can be done at one-minute intervals. Maybe this is a somewhat atypical use of ZFS, but it seems to be a great way to keep filesystem replicas once this is sorted out.

How to reproduce:

Set up two systems. A dataset with heavy I/O activity is replicated from the first to the second one. I've used a dataset containing /usr/obj while I did a make buildworld.

Replicate the dataset from the first machine to the second one using an incremental send:

zfs send -i pool/dataset@Nminus1 pool/dataset@N | ssh destination zfs receive -d pool

When there is read activity on the second system, that is, when the replicated filesystem is being read while zfs receive is updating it, a deadlock can occur.

We discovered this while testing a server with 8 GB of RAM that we hope to put into production soon. A Bacula backup agent was running and ZFS deadlocked.

I have set up a couple of VMware Fusion virtual machines in order to test this, and it has deadlocked as well. The virtual machines have little memory, 512 MB, but I don't believe this is the actual problem. There is no complaint about lack of memory.

A running top shows processes stuck on "zfsvfs":

last pid:  2051;  load averages:  0.00,  0.07,  0.55    up 0+01:18:25  12:05:48
37 processes:  1 running, 36 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 18M Active, 20M Inact, 114M Wired, 40K Cache, 59M Buf, 327M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1914 root       1  62    0 11932K  2564K zfsvfs 0   0:51  0.00% bsdtar
 1093 borjam     1  44    0  8304K  2464K CPU1   1   0:32  0.00% top
 1913 root       1  54    0 11932K  2600K rrl->r 0   0:19  0.00% bsdtar
 1019 root       1  44    0 25108K  4812K select 0   0:05  0.00% sshd
 2008 root       1  76    0 13600K  1904K tx->tx 0   0:04  0.00% zfs
 1089 borjam     1  44    0 37040K  5216K select 1   0:04  0.00% sshd
  995 root       1  76    0  8252K  2652K pause  0   0:02  0.00% csh
  840 root       1  44    0 11044K  3828K select 1   0:02  0.00% sendmail
 1086 root       1  76    0 37040K  5156K sbwait 1   0:01  0.00% sshd
  850 root       1  44    0  6920K  1612K nanslp 0   0:01  0.00% cron
  607 root       1  44    0  5992K  1540K select 1   0:01  0.00% syslogd
 1090 borjam     1  76    0  8252K  2636K pause  1   0:01  0.00% csh
  990 borjam     1  44    0 37040K  5220K select 0   0:00  0.00% sshd
  985 root       1  48    0 37040K  5160K sbwait 1   0:00  0.00% sshd
  911 root       1  44    0  8252K  2608K ttyin  0   0:00  0.00% csh
  991 borjam     1  56    0  8252K  2636K pause  0   0:00  0.00% csh
  844 smmsp      1  46    0 11044K  3852K pause  0   0:00  0.00% sendmail

Interestingly, this has blocked access to all the filesystems. I cannot, for instance, ssh into the machine anymore, even though all the system-critical filesystems are on UFS; I was just using ZFS for a test.

Any ideas on what information might be useful to collect? I still have the VMware machine in this state. I've made a couple of VMware snapshots of it: the first just after the deadlock started, before breaking into DDB, and the second after breaking into DDB (I entered DDB via sysctl). Also, a copy of the VMware virtual machine with its snapshots is available on request. Your choice ;)

Borja.