From: Ewald Jenisch
Date: Thu, 20 Nov 2014 19:27:48 +0100
Subject: SCSI errors on VMware guest
Message-ID: <20141120182748.GA3546@aurora.oekb.co.at>

Hi,

I'm running a FreeBSD 9.3 machine (amd64) on a VMware 5.5 host. "Disks"
used by this VM come from a NetApp system providing storage for 200+
virtual machines.

Increasingly I'm seeing errors like these:

Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 20 02 a2 00 00 40 00
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): Retrying command
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 c0 00 a2 00 00 08 00
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Nov 19 09:02:59 igue kernel: (da0:mpt0:0:0:0): Retrying command

Eventually this ends in a crash of the system:

Nov 19 09:48:17 igue syslogd: kernel boot file is /boot/kernel/kernel
Nov 19 09:48:17 igue kernel: panic: initiate_write_inodeblock_ufs2: already started
Nov 19 09:48:17 igue kernel: cpuid = 0
Nov 19 09:48:17 igue kernel: KDB: stack backtrace:
Nov 19 09:48:17 igue kernel: #0 0xffffffff80934766 at kdb_backtrace+0x66
Nov 19 09:48:17 igue kernel: #1 0xffffffff808fa2ee at panic+0x1ce
Nov 19 09:48:17 igue kernel: #2 0xffffffff80b35814 at softdep_disk_io_initiation+0xf04
Nov 19 09:48:17 igue kernel: #3 0xffffffff80b4352f at ffs_geom_strategy+0x17f
Nov 19 09:48:17 igue kernel: #4 0xffffffff80980195 at bufwrite+0x145
Nov 19 09:48:17 igue kernel: #5 0xffffffff80979fdf at vfs_bio_awrite+0x7f
Nov 19 09:48:17 igue kernel: #6 0xffffffff80986448 at vop_stdfsync+0x288
Nov 19 09:48:17 igue kernel: #7 0xffffffff807dc5e8 at devfs_fsync+0x98
Nov 19 09:48:17 igue kernel: #8 0xffffffff80de31c8 at VOP_FSYNC_APV+0x78
Nov 19 09:48:17 igue kernel: #9 0xffffffff8099ab5b at sync_vnode+0x16b
Nov 19 09:48:17 igue kernel: #10 0xffffffff8099ae65 at sched_sync+0x1c5
Nov 19 09:48:17 igue kernel: #11 0xffffffff808c810f at fork_exit+0x11f
Nov 19 09:48:17 igue kernel: #12 0xffffffff80ccc5be at fork_trampoline+0xe
Nov 19 09:48:17 igue kernel: Uptime: 12d20h44m1s
Nov 19 09:48:17 igue kernel: (da0:mpt0:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Nov 19 09:48:17 igue kernel: (da0:mpt0:0:0:0): CAM status: Command timeout
Nov 19 09:48:17 igue kernel: (da0:mpt0:0:0:0): Error 5, Retries exhausted
Nov 19 09:48:17 igue kernel: (da0:mpt0:0:0:0): Synchronize cache failed

I've fsck-ed all filesystems in single-user mode to be sure there's no
logical error, but these SCSI error messages keep coming, and with them
crashes of the VM.

Also, "vmware-checkvm" reports that everything is fine:

# vmware-checkvm -h
VM's hw version is 4
VMware software version 6 (good)
#

BTW, I've installed the VMware tools that come with the VMware host,
following the advice at
http://hephaex.blogspot.co.at/2013/01/installing-vmware-tools-on-freebsd-91.html
i.e.:

pkg_add -r perl
cd /usr/ports/misc/compat6x/; make && make install
mount -t cd9660 /dev/cd0 /cdrom
cd /tmp
mkdir VMware-tools
cd VMware-tools/
cp /cdrom/vmware-freebsd-tools.tar.gz .
tar zxf vmware-freebsd-tools.tar.gz
cd vmware-tools-distrib/lib/modules/source/
tar xf vmmemctl.tar
cd vmmemctl-only/
make; make install
cd /tmp/VMware-tools/vmware-tools-distrib/lib/modules/source/
tar xf vmblock.tar
cd vmblock-only
make && make install
cd /tmp/VMware-tools/vmware-tools-distrib/
./vmware-install.pl

There were no errors during building or installing.

BTW, I've seen these errors only on a FreeBSD system. No other VM
running on the host has reported any error whatsoever, so as far as I
can tell this is a FreeBSD-related problem.

Has anybody else seen these errors? Is there a known cure?

Thanks much in advance for any clue,
-ewald
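
A side note on reading the log lines above: the ten hex bytes after "CDB:" in each WRITE(10) message encode the starting block and the length of the write that was reported busy. Below is a minimal sketch of decoding them, assuming the standard SCSI WRITE(10) layout (opcode 0x2a in byte 0, big-endian logical block address in bytes 2-5, big-endian transfer length in bytes 7-8). The function name decode_write10 is hypothetical and only for illustration; it is not an existing FreeBSD tool.

```sh
#!/bin/sh
# Decode the 10 hex bytes of a SCSI WRITE(10) CDB as printed by the kernel.
# Assumed layout: byte 0 = opcode (0x2a), bytes 2-5 = LBA (big-endian),
# bytes 7-8 = transfer length in blocks (big-endian).
# decode_write10 is a hypothetical helper name.
decode_write10() {
    # Split the quoted byte string into positional parameters $1..$10.
    set -- $1
    if [ "$#" -ne 10 ] || [ "$1" != "2a" ]; then
        echo "not a WRITE(10) CDB" >&2
        return 1
    fi
    # CDB bytes 2-5 are positional parameters $3..$6.
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    # CDB bytes 7-8 are positional parameters $8 and $9.
    blocks=$(( (0x$8 << 8) | 0x$9 ))
    echo "lba=$lba blocks=$blocks"
}

# The two CDBs from the log excerpt.
decode_write10 "2a 00 00 20 02 a2 00 00 40 00"
decode_write10 "2a 00 00 c0 00 a2 00 00 08 00"
```

For the two CDBs in the excerpt this prints lba=2097826 blocks=64 and lba=12583074 blocks=8, i.e. two unrelated writes at different offsets on da0 that both came back with status Busy.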