Message-ID: <4C89F014.1050601@webstyle.ch>
Date: Fri, 10 Sep 2010 10:45:08 +0200
From: freebsd
To: freebsd-stable@freebsd.org
Subject: strange problem with FreeBSD 7.3 64bit

hi list,

we upgraded some 20 boxes from 7.1 and 7.2 to 7.3-RELEASE-p2 (all amd64) and are now seeing some weird behaviour with rsnapshot on 6 of them:

after anywhere from a few days to several weeks (it seems to be completely random), rsnapshot reports that it can't start due to its lockfile and process still
present. on such boxes either a zombie rm or find process (presumably launched by rsnapshot) can be found. if the backup was written to a separate partition (physical disks or RAIDs), any access (ls, stat, fsck, etc.) to that partition would kill the current SSH session and leave the process one just started behind as a new zombie. unmounting the affected partition would render the server completely unresponsive and required a hardware reset.

when trying to restart, the machines wouldn't even shut down completely but hung somewhere after syncing buffers; only a hardware reset worked. after the reboot, those partitions were unmounted and fscked, after which the backups would work again until the next error happened.

the hardware of the affected and unaffected systems is:

HP ProLiant DL380 G4
HP ProLiant DL380 G5
HP ProLiant DL360 G5

there is no visible pattern between affected and unaffected boxes. all of these machines were upgraded in exactly the same way and run identical kernels (more or less GENERIC, with QUOTA activated).

we upgraded the most critical boxes, which showed that behaviour daily, to 8.0-RELEASE, and the behaviour has not reappeared in the nearly 3 months since.

we installed a debug kernel on an affected box, but the machine wouldn't panic when the error occurred. when trying to unmount the affected partition it just went completely unresponsive, as mentioned above.
before trying to unmount, procstat -ak showed some processes with VOP_LOCK1_APV in their kernel stacks:

55396 100135 find - mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget cache_lookup vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat lstat syscall
70923 100146 rsync - mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ffs_vgetf ufs_lookup_ vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat

since this hardware worked fine before 7.3 and, as we assume, would work again with 8.*, we would be grateful for any hints as to what could be causing all this.

kind regards
Flo
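p.s. in case someone wants to check their own boxes for the same symptom, here is a rough sketch of how the procstat -ak output could be filtered for threads sitting in VOP_LOCK1_APV. it is only an illustration, not something we run in production: the names find_lock_waiters, report_live and SAMPLE are made up for this example, and it assumes the columns are PID, TID, COMM, TDNAME and then the stack frames, with no spaces inside the command name.

```python
import subprocess

# Frame that showed up in the stuck find/rsync threads quoted above.
LOCK_FRAME = "VOP_LOCK1_APV"

# Two lines taken from the procstat -ak output quoted in this mail.
SAMPLE = """\
55396 100135 find - mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget cache_lookup vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat lstat syscall
70923 100146 rsync - mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ffs_vgetf ufs_lookup_ vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat
"""


def find_lock_waiters(procstat_text, frame=LOCK_FRAME):
    """Return (pid, tid, command) for every thread whose stack contains `frame`.

    Assumed layout per line: PID TID COMM TDNAME frame frame ...
    Lines that do not start with two numeric fields (e.g. a header) are skipped.
    """
    waiters = []
    for line in procstat_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        if frame in fields[4:]:
            waiters.append((int(fields[0]), int(fields[1]), fields[2]))
    return waiters


def report_live():
    """Run `procstat -ak` on the local box and print matching threads."""
    result = subprocess.run(
        ["procstat", "-ak"], capture_output=True, text=True, check=True
    )
    for pid, tid, comm in find_lock_waiters(result.stdout):
        print(f"pid {pid} tid {tid} ({comm}) has {LOCK_FRAME} in its stack")


# Demo on the quoted sample only; call report_live() on a FreeBSD box instead.
for pid, tid, comm in find_lock_waiters(SAMPLE):
    print(pid, tid, comm)
```

a single hit only means the thread was inside a vnode lock call at the moment of sampling, which is normal; the same PIDs turning up across several runs a few minutes apart would be the interesting case.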