From: Kenneth Vestergaard Schmidt
To: freebsd-current@freebsd.org
Date: Tue, 04 Sep 2007 15:08:20 +0200
Subject: Unkillable and runaway processes

Hello.

Our ZFS testbed is experiencing some weird problems with rsync. We run a nightly backup of about 1.6 TB of data (that's how much is stored, not how much is transferred), but since the initial sync I haven't been able to get the machine through one full cycle.

After many hours of rsyncing data from 50+ machines, one rsync process suddenly hangs, spinning on the CPU. Its state switches between CPU0, CPU1, RUN and 'zfs:(&', but it doesn't actually make any progress. It can't be killed, and the machine can't be rebooted: it gets past "syncing disks", but then won't shut down or reboot.
I can't run 'ls' in the directory rsync is working in; that just hangs, too.

The machine is running -CURRENT from August 29th. I could use some pointers on what to do. Is there some way I can debug this better, or provide more useful information?

-- 
Kenneth Schmidt
pil.dk