From: Benjamin Close
Date: Wed, 05 Sep 2007 08:33:39 +0930
To: Kenneth Vestergaard Schmidt
Cc: freebsd-current@freebsd.org
Message-ID: <46DDE44B.1060203@clearchain.com>
Subject: Re: Unkillable and runaway processes

Kenneth Vestergaard Schmidt wrote:
> Hello.
>
> Our ZFS testbed is experiencing some weird problems with rsync. We run a
> nightly backup of about 1.6 TB data (that's how much is stored, not how
> much is transferred), but after the initial sync I haven't been able to
> get the machine through one full cycle.
>
> After many hours of rsyncing data from 50+ machines, suddenly one
> rsync-process will hang, spinning on the CPU.
>
> It switches state between CPU0, CPU1, RUN and 'zfs:(&', but doesn't
> really do anything. It can't be killed, and you can't reboot the machine
> - it'll get past syncing disks, but won't shutdown or reboot.
>
> I can't do an 'ls' in the directory that rsync is running on - it'll
> just hang, too.
>
> The machine is running current from August 29th.
>
> I could use some pointers on what to do - is there some way I can debug
> this better, maybe give some better info?

I do a similar thing with close to 3 TB of data and have found that
heavy activity causes the same hang you mention. Disabling the ZIL fixes
the issue. Add this to /boot/loader.conf:

    vfs.zfs.zil_disable=1

Since ZFS is always consistent on disk, with or without the ZIL, and this
is a nightly rsync (anything lost in a crash is simply copied again on
the next run), disabling the ZIL is quite safe.

I'd love to debug this here, but I can't: the box uses a USB
mouse/keyboard, so every time I drop into the debugger I lose keyboard
support :(

Cheers,
Benjamin