Date: Fri, 10 Sep 2010 10:45:08 +0200
From: freebsd <free.bsd@webstyle.ch>
To: freebsd-stable@freebsd.org
Subject: strange problem with FreeBSD 7.3 64bit
Message-ID: <4C89F014.1050601@webstyle.ch>

hi list,

we upgraded some 20 boxes from 7.1 and 7.2 to 7.3-RELEASE-p2 (all amd64) and are now seeing some weird behaviour with rsnapshot on 6 of them: after anywhere from a few days to several weeks (it seems to be completely random), rsnapshot reports that it can't start because its lockfile and process are still present. on such boxes a zombie rm or find process (presumably launched by rsnapshot) can be found.

if the backup was done to a separate partition (physical disks or RAIDs), any access (ls, stat, fsck, etc.) to that partition kills the current SSH session and leaves the process one just started as a new zombie. unmounting the affected partition renders the server completely unresponsive and requires a hardware reset. when we tried to restart, the machines wouldn't even shut down completely but hung somewhere after syncing buffers; only a hardware reset worked.

after the reboot, those partitions were unmounted and fscked, after which the backups worked again until the error next occurred.

the hardware of the affected and unaffected systems is:

HP ProLiant DL380 G4
HP ProLiant DL380 G5
HP ProLiant DL360 G5

there is no visible pattern between affected and unaffected boxes. all of these machines were upgraded in exactly the same way and run identical kernels (more or less GENERIC, with QUOTA activated).

we upgraded the most critical boxes, which showed that behaviour daily, to 8.0-RELEASE, and the behaviour has not reappeared on them for nearly 3 months now.

we installed a debug kernel on an affected box, but the machine wouldn't panic when the error occurred. when trying to unmount the affected partition it just went completely unresponsive, as mentioned above.

before trying to unmount, procstat -ak showed some processes with VOP_LOCK1_APV in their kernel stacks:

55396 100135 find - mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget cache_lookup vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat lstat syscall

70923 100146 rsync - mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ffs_vgetf ufs_lookup_ vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat

since this hardware worked fine before 7.3 and, as we assume, would work again with 8.*, we would be grateful for any hints as to what could be the cause of all this.

kind regards
Flo
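
PS: in case it helps anyone trying to catch this on their own boxes before it escalates, here is a minimal sketch (not something we run in production) that filters procstat -ak output for threads whose kernel stack shows them sleeping in the vnode lock path, as in the two traces in this message. The function name, the choice of marker frames and the second sample line are assumptions made purely for illustration, and the parser assumes the command name contains no spaces.

```python
# Frames that, taken together, suggest a thread is asleep waiting for a
# vnode lock (chosen from the two traces in this message; an assumption).
LOCK_FRAMES = ("_sleep", "_lockmgr", "VOP_LOCK1_APV")


def blocked_on_vnode_lock(procstat_output):
    """Return (pid, tid, command) for every thread whose kernel stack
    contains all of LOCK_FRAMES.

    Expects whitespace-separated columns: PID TID COMM TDNAME frame...
    """
    blocked = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a thread entry.
        if len(fields) < 5 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, tid, comm = int(fields[0]), int(fields[1]), fields[2]
        frames = fields[4:]
        if all(frame in frames for frame in LOCK_FRAMES):
            blocked.append((pid, tid, comm))
    return blocked


# First entry is the find trace from this message; the sshd entry is
# hypothetical and only there to show a thread that is not matched.
sample = """\
  PID    TID COMM             TDNAME           KSTACK
55396 100135 find             -                mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget cache_lookup vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat lstat syscall
  812 100050 sshd             -                mi_switch sleepq_switch sleepq_catch_signals sleepq_wait_sig _sleep kern_select select syscall
"""

for pid, tid, comm in blocked_on_vnode_lock(sample):
    print(pid, tid, comm)
```

on a live system the input would be the text captured from procstat -ak. a thread that shows up in several samples taken minutes apart is a candidate for being stuck rather than just briefly waiting.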