Date: Thu, 29 Mar 2012 13:53:49 -0400
From: "Dieter BSD"
To: freebsd-hackers@freebsd.org
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash
Message-ID: <20120329175350.155040@gmx.com>

> FreeBSD ?? - 7.4 never crash
> FreeBSD 8.0 - 8.2 crashes

Obvious short term workaround is to run production on 7.4 (assuming you can) until you figure out what is wrong with 8.x.

What filesystem(s) are you running? UFS? ZFS? other?

> started randomly disconnecting people every morning

Due to timeouts? Something might be keeping interrupts disabled too long.

> there were other good reasons to reload the
> VM, so I nuked the VM, which, of course, fixed it.
> I can look at recovering the faulty VM from backup

Sounds like corruption. Can you compare the bad VM against a good one? If you find corruption, the question then becomes what is causing the corruption?
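One rough way to do that comparison, assuming both VMs' filesystems can be mounted read-only somewhere (the paths /mnt/good and /mnt/bad below are made up for illustration), is to hash every regular file in both trees and list what differs. This is only a sketch, not a forensic tool: logs, caches and anything under /var will differ legitimately, so the interesting hits are binaries, libraries and kernel modules.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks, device nodes, sockets and FIFOs.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(good_root, bad_root):
    """Return (changed, only_in_good, only_in_bad) as sorted lists."""
    good = tree_digests(good_root)
    bad = tree_digests(bad_root)
    changed = sorted(p for p in good.keys() & bad.keys() if good[p] != bad[p])
    only_in_good = sorted(good.keys() - bad.keys())
    only_in_bad = sorted(bad.keys() - good.keys())
    return changed, only_in_good, only_in_bad
```

Calling compare_trees("/mnt/good", "/mnt/bad") would return the three lists. If the same files show up as changed after each crash, that supports the idea that one particular thing is being corrupted.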
Sounds like the same thing is getting corrupted every time, rather than something at random. Sounds like the corruption is causing a deadlock in something common, like the buffer cache, or filesystem, or...

Is it possible to have root be a ramdisk? This might give you access to the utilities, depending on where the problem is.

I have vague memories that the sticky bit used to lock a program in memory, but sticky(8) indicates that this is no longer the case. Is there a way to lock a program in memory? (So that it will be available when the system can't do disk I/O.) If not, you could keep some windows open with things like top and systat -vmstat running.

Some of the utilities have options to look at a disk file rather than the live system, if you can get a core dump (swap to NFS?).
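On the question of locking a program in memory: one candidate is mlockall(2), which asks the kernel to keep all of the calling process's pages resident. As far as I know the lock does not survive exec, so a wrapper cannot pin an existing utility like top; the monitoring program has to make the call itself. Below is a minimal sketch of such a self-locking monitor. The function names are invented for illustration, the flag values are the ones I believe FreeBSD and Linux/x86 use in <sys/mman.h>, and the call will typically fail unless run as root or with a large enough RLIMIT_MEMLOCK. I have not tried this on 8.x.

```python
import ctypes
import os
import time

# Flag values assumed from <sys/mman.h> on FreeBSD and Linux/x86;
# check the header before relying on them elsewhere.
MCL_CURRENT = 0x1  # lock pages that are mapped right now
MCL_FUTURE = 0x2   # also lock pages mapped later


def lock_in_memory(flags=MCL_CURRENT | MCL_FUTURE):
    """Try to lock this process's pages into RAM; return (ok, errno)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mlockall(ctypes.c_int(flags)) == 0:
        return True, 0
    return False, ctypes.get_errno()


def watch_load(samples, interval=5.0):
    """Print the load averages `samples` times, `interval` seconds apart."""
    for _ in range(samples):
        one, five, fifteen = os.getloadavg()
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} load {one:.2f} {five:.2f} {fifteen:.2f}", flush=True)
        time.sleep(interval)
```

The idea would be to call lock_in_memory() once at startup, report the errno if it fails, and then leave watch_load() running in an open window so there is still some output when the disks stop responding.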