From owner-freebsd-hackers@FreeBSD.ORG  Thu Mar 29 20:20:19 2012
Content-Type: text/plain; charset="utf-8"
Date: Thu, 29 Mar 2012 13:53:49 -0400
From: "Dieter BSD" <dieterbsd@engineer.com>
Message-ID: <20120329175350.155040@gmx.com>
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

> FreeBSD ?? - 7.4 never crash
> FreeBSD 8.0 - 8.2 crashes

The obvious short-term workaround is to run production on 7.4 (assuming
you can) until you figure out what is wrong with 8.x.

What filesystem(s) are you running? UFS? ZFS? Something else?

> started randomly disconnecting people every morning

Due to timeouts? Something might be keeping interrupts disabled
for too long.
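
One way to test the timeout theory from userland is a stall probe:
sleep for a fixed interval in a loop and note every wake-up that
arrives much later than requested. The Python below is only a sketch.
The function name, the interval and the threshold are made up for
illustration, and a late wake-up only shows that the guest stalled,
not why (disabled interrupts, the hypervisor descheduling the VM, or
blocked I/O would all look much the same).

```python
import time


def find_stalls(interval=0.1, threshold=0.5, iterations=600,
                clock=time.monotonic, sleep=time.sleep):
    """Sleep repeatedly and report wake-ups that were badly late.

    Returns a list of (local time as HH:MM:SS, seconds late) tuples.
    clock and sleep are parameters only so the loop can be exercised
    without real waiting.
    """
    stalls = []
    last = clock()
    for _ in range(iterations):
        sleep(interval)
        now = clock()
        # How much longer than requested did the sleep really take?
        late = (now - last) - interval
        if late > threshold:
            stalls.append((time.strftime("%H:%M:%S"), late))
        last = now
    return stalls
```

Run in a loop in a spare window, printing each returned tuple, this
would give timestamps to line up against the morning disconnects.
Printing to a terminal rather than a file avoids depending on disk
I/O at the moment things go wrong.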

> there were other good reasons to reload the
> VM, so I nuked the VM, which, of course, fixed it.

> I can look at recovering the faulty VM from backup

Sounds like corruption.  Can you compare the bad VM against a good
one?  If you find corruption, the question then becomes: what is
causing it?  It sounds like the same thing is getting corrupted
every time, rather than something at random.
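
If both VM disks can be attached read-only to a healthy machine, the
comparison could be done file by file. A minimal Python sketch,
assuming the two filesystems are mounted somewhere of your choosing
(compare_trees and the other names are hypothetical helpers, not
existing tools):

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks, devices, sockets and the like.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root)
            try:
                digests[rel] = file_digest(full)
            except OSError:
                # A damaged filesystem may refuse to read some files.
                digests[rel] = "unreadable"
    return digests


def compare_trees(good_root, bad_root):
    """Return (changed, missing_from_bad, extra_in_bad) path lists."""
    good = tree_digests(good_root)
    bad = tree_digests(bad_root)
    changed = sorted(p for p in good.keys() & bad.keys()
                     if good[p] != bad[p])
    missing = sorted(good.keys() - bad.keys())
    extra = sorted(bad.keys() - good.keys())
    return changed, missing, extra
```

Expect noise from logs, spool files and anything else that
legitimately differs between two machines. A changed binary, library
or kernel module would be far more interesting than a changed log.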

Sounds like the corruption is causing a deadlock in something
common, like the buffer cache, or filesystem, or...

Is it possible to have root be a ramdisk?  This might give you
access to the utilities, depending on where the problem is.

I have vague memories that the sticky bit used to lock a program in
memory, but sticky(8) indicates that this is no longer the case.
Is there a way to lock a program in memory? (So that it will be available
when the system can't do disk I/O.)  If not, you could keep some
windows open with things like top and systat -vmstat running.
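
One possible answer to the question above, offered as a sketch rather
than something from this thread: mlockall(2) lets a process pin its
own pages in memory. It has to be called by the program itself, so it
cannot be applied from outside to an unmodified top or systat, and it
usually needs root or a raised memory-lock limit. The Python below
calls it through ctypes. The function name is hypothetical, and the
flag values are assumptions that should be checked against
<sys/mman.h> on the target system.

```python
import ctypes
import ctypes.util
import os

# Assumed values, matching <sys/mman.h> on FreeBSD and Linux/x86.
MCL_CURRENT = 0x0001  # lock pages that are mapped right now
MCL_FUTURE = 0x0002   # also lock pages mapped later


def lock_process_memory(libc=None):
    """Ask the kernel to keep all of this process's pages resident.

    Returns (True, "") on success or (False, reason) on failure.
    libc may be passed in so the call can be tried against a stand-in.
    """
    if libc is None:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        return False, os.strerror(ctypes.get_errno())
    return True, ""
```

A small monitoring script that calls this once at startup, and then
only reports over an already open terminal or socket, should be able
to keep running without paging anything in from disk. It still cannot
start new programs if the disk is unreachable.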

Some of the utilities (ps(1), vmstat(8) and netstat(1), for example,
via -M and -N) can look at a crash dump file rather than the live
system, if you can get a core dump (swap to NFS?).