From owner-freebsd-stable@FreeBSD.ORG  Fri Sep 10 09:04:07 2010
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 67A8F1065670
	for <freebsd-stable@freebsd.org>; Fri, 10 Sep 2010 09:04:07 +0000 (UTC)
	(envelope-from free.bsd@webstyle.ch)
Received: from zimbra.webstyle.ch (zimbra.webstyle.ch [212.103.68.7])
	by mx1.freebsd.org (Postfix) with ESMTP id F128F8FC19
	for <freebsd-stable@freebsd.org>; Fri, 10 Sep 2010 09:04:06 +0000 (UTC)
Received: from localhost (localhost.localdomain [127.0.0.1])
	by zimbra.webstyle.ch (Postfix) with ESMTP id 6F63910C00B8
	for <freebsd-stable@freebsd.org>; Fri, 10 Sep 2010 10:45:12 +0200 (CEST)
X-Virus-Scanned: amavisd-new at zimbra.webstyle.ch
Received: from zimbra.webstyle.ch ([127.0.0.1])
	by localhost (zimbra.webstyle.ch [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 4s6O290OkokI; Fri, 10 Sep 2010 10:45:11 +0200 (CEST)
Received: from [192.168.1.128] (unknown [212.60.63.146])
	by zimbra.webstyle.ch (Postfix) with ESMTPA id 154F710C00A1;
	Fri, 10 Sep 2010 10:45:11 +0200 (CEST)
Message-ID: <4C89F014.1050601@webstyle.ch>
Date: Fri, 10 Sep 2010 10:45:08 +0200
From: freebsd <free.bsd@webstyle.ch>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; de;
	rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: strange problem with FreeBSD 7.3 64bit
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Sep 2010 09:04:07 -0000

hi list,

we upgraded some 20 boxes from 7.1 and 7.2 to 7.3-RELEASE-p2 (all amd64) 
and now are experiencing some weird behaviour on 6 of them with rsnapshot:

after anywhere from a few days to several weeks (seemingly at random), 
rsnapshot reports that it can't start because its lockfile and process 
are still present. on such boxes either a zombie rm or find process 
(presumably launched by rsnapshot) can be found.
if the backup was done to a separate partition (physical disks or RAIDs) 
any access (ls, stat, fsck, etc) to the partition would kill the current 
SSH session, creating a new zombie of the process one just started. 
unmounting the affected partition would render the server completely 
unresponsive and require a hardware reset.
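
to tell a leftover lockfile from one whose owner is still around, something 
like the following could be run before each backup. this is only a rough 
sketch: it assumes the lockfile contains nothing but the PID of the 
rsnapshot process, and the helper name lockfile_status is made up for 
illustration. it cannot tell a hung process from a healthy one, it only 
reports whether the recorded PID still exists.

```python
import os


def lockfile_status(path):
    """Classify a PID lockfile as 'absent', 'unreadable', 'stale' or 'held'.

    Assumption: the file holds a single decimal PID and nothing else.
    """
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "absent"
    except ValueError:
        return "unreadable"

    if pid <= 0:
        # PID 0 or a negative value would address a process group in kill(2).
        return "unreadable"

    try:
        # Signal 0 performs only the existence/permission check.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale"   # no such process: the lockfile was left behind
    except PermissionError:
        return "held"    # process exists but belongs to another user
    return "held"
```

a result of "held" together with a backup that never finishes would match 
the symptom described above.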

when trying to reboot, the machines wouldn't even shut down completely 
but hung somewhere after syncing buffers; only a hardware reset 
worked. after the reboot, those partitions were unmounted and fscked, 
after which the backups would work again until the next occurrence of 
the error.

the hardware of the affected and unaffected systems is:

HP ProLiant DL380 G4
HP ProLiant DL380 G5
HP ProLiant DL360 G5

there is no visible pattern between affected and unaffected boxes. also 
those machines were upgraded the exact same way, running identical 
kernels (more or less GENERIC, with QUOTA activated).

we upgraded the most critical boxes, which showed that behaviour on a 
daily basis, to 8.0-RELEASE, and the behaviour has not reappeared in 
the nearly 3 months since.

we installed a debug kernel on an affected box, but the machine wouldn't 
panic when the error occurred. when trying to unmount the affected 
partition it just went completely unresponsive, as mentioned above.

before trying to unmount, procstat -ak showed some processes with 
VOP_LOCK1_APV in their kernel stacks:

55396 100135 find - mi_switch sleepq_switch sleepq_wait _sleep acquire 
_lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget cache_lookup 
vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat lstat syscall
70923 100146 rsync - mi_switch sleepq_switch sleepq_wait _sleep acquire 
_lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ffs_vgetf 
ufs_lookup_ vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat
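
for anyone who wants to sift through a longer capture, here is a small 
sketch that picks out the threads whose kernel stack contains a given 
frame. it assumes the layout seen above (PID, TID, command, thread name, 
then the stack frames) and that wrapped lines belong to the preceding 
record; the function names parse_procstat_stacks and blocked_on are 
invented for this example.

```python
import re

# A record starts with two integers (PID, TID), then COMM and TDNAME.
RECORD_START = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s*(.*)$")


def parse_procstat_stacks(text):
    """Turn `procstat -ak`-style text into a list of per-thread records.

    Lines that do not start a new record are treated as wrapped stack
    frames and appended to the previous record.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = RECORD_START.match(line)
        if match:
            pid, tid, comm, tdname, rest = match.groups()
            records.append({
                "pid": int(pid),
                "tid": int(tid),
                "comm": comm,
                "tdname": tdname,
                "frames": rest.split(),
            })
        elif records:
            records[-1]["frames"].extend(line.split())
    return records


def blocked_on(records, frame="VOP_LOCK1_APV"):
    """Return the records whose stack contains the given frame name."""
    return [r for r in records if frame in r["frames"]]


SAMPLE = """\
55396 100135 find - mi_switch sleepq_switch sleepq_wait _sleep acquire
_lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget cache_lookup
vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat lstat syscall
70923 100146 rsync - mi_switch sleepq_switch sleepq_wait _sleep acquire
_lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ffs_vgetf
ufs_lookup_ vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat
"""

if __name__ == "__main__":
    for rec in blocked_on(parse_procstat_stacks(SAMPLE)):
        print(rec["pid"], rec["tid"], rec["comm"])
```

fed the two traces above, it reports both the find and the rsync thread.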

since this hardware worked before 7.3 and -- as we assume -- 
would work again with 8.*, we would be grateful for any hints as to what 
could be causing all this.

kind regards
Flo