From owner-freebsd-hackers@FreeBSD.ORG Fri Mar 30 21:18:22 2012
Date: Fri, 30 Mar 2012 17:18:18 -0400
From: "Dieter BSD" <dieterbsd@engineer.com>
To: freebsd-hackers@freebsd.org
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash
Message-ID: <20120330211819.155070@gmx.com>

> Subsequent inspection suggested that it was happening during the
> periodic daily, though we never managed to get it to happen by manually
> forcing periodic daily, so that's only a theory.

Perhaps due to a bunch of VMs all running periodic daily at the same
time?  (See the crontab sketch at the end of this message.)

> We had a perfectly functional, nearly zero-traffic VM, since Jabber
> traffic averages no more than a few messages per hour.  It was working
> for quite some time.
>
> We moved it from a local datastore to an iSCSI datastore that ended up
> getting periodically crushed by the load (in particular during the
> periodic daily load imposed by a bunch of VMs all running at once).
> At this point, this one VM started hanging on I/O.  We expected that
> this would clear up upon return to a host with a local datastore.  It
> did not.
>
> This ended up as a broken VM, one that would hang overnight, maybe
> not every night, but several times a week at least.

...

> That the problem "follows" the VM in this manner, and afflicts *only*
> the one VM, strongly suggests that it is something contained within
> the files that constitute this VM.  That is consistent with the
> observation that the problem arose at a point where the VM is known
> to have had all those files moved from one location to a dodgy
> location.
>
> That's why I believe the evidence points to corruption of some sort.

Compare a backup of the VM from before it broke to a backup of the same
VM from after it broke.  Hopefully the haystack of insignificant
differences isn't too large, or the significant-difference needle might
be a lot of "fun" to find.
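
A minimal sketch of one way to run that comparison, assuming both
backups have been unpacked into directory trees (the BEFORE/AFTER paths
and the /tmp filenames below are hypothetical, substitute your own):

    # Compare two unpacked backups of the VM's on-disk files.
    BEFORE=/backup/jabber-vm.before
    AFTER=/backup/jabber-vm.after

    # Quick pass: list every file that differs at all.
    diff -rq "$BEFORE" "$AFTER"

    # Checksum pass: record per-file MD5s so the comparison is
    # repeatable and each huge .vmdk only gets read once per backup.
    ( cd "$BEFORE" && find . -type f -exec md5 -r {} + | sort -k 2 ) > /tmp/before.md5
    ( cd "$AFTER"  && find . -type f -exec md5 -r {} + | sort -k 2 ) > /tmp/after.md5
    diff /tmp/before.md5 /tmp/after.md5

Within a differing virtual disk image you would still have to narrow
down which blocks changed (cmp -l can help there), but the file-level
pass at least shrinks the haystack.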
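
On the earlier point about all the guests firing periodic daily at
once: a sketch of one way to stagger them, by editing each guest's
/etc/crontab.  The stock FreeBSD entry runs daily at 3:01 in every
guest, hence the pileup; the offsets below are illustrative only.

    # Stock /etc/crontab entry (identical in every guest):
    1   3   *   *   *   root    periodic daily

    # Staggered variants -- give each guest its own minute:
    #   guest 1:
    1   3   *   *   *   root    periodic daily
    #   guest 2:
    21  3   *   *   *   root    periodic daily
    #   guest 3:
    41  3   *   *   *   root    periodic daily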