Date: Thu, 29 Mar 2012 13:24:30 -0400
From: Jerry
To: FreeBSD
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

On Thu, 29 Mar 2012 11:43:45 -0500 Jim Bryant articulated:

> Mark Felder wrote:
> > Alright guys, I'm at the end of my rope here. For those that
> > haven't seen my previous emails here's the (not so) quick breakdown:
> >
> > Overview:
> >
> > FreeBSD ?? - 7.4 never crash
> > FreeBSD 8.0 - 8.2 crashes
> > FreeBSD 8-STABLE, 8.3, and 9.0 are untested (Sorry, not possible in
> > our production at this time, and we were hoping we could base some
> > stuff on 8.3 for long term stability...)
> > ESXi: Confirmed ESXi 4.0 - 5.0 has this problem. Haven't tested on
> > others.
> >
> >
> > History:
> >
> > Over the course of the last 2 years we've been banging our heads on
> > the wall. VMWare is done debugging this. They claim it's not a
> > VMWare issue. They can't identify what the heck happens. We had a
> > glimmer of hope with ESXi 5.0 fixing it because we never saw any
> > crashes in the handful of deployments, but our dreams were crushed
> > today -- two days before an outage to begin migration to ESXi 5.0
> > -- when a customer's ESXi 5.0 server and FreeBSD 8.2 guest crashed.
> >
> >
> > Crash Details:
> >
> > The keyboard/mouse usually stops responding for input on the
> > console; normally we can't type in a username or password. However,
> > we can switch VTs.
> >
> > If there's a shell on the console and we can type, we can only run
> > things in memory. Any time we try to access the disk it will hang
> > indefinitely.
> >
> > The server still has network access. We can ping it without issue.
> > SSH of course kicks you out because it can't do any I/O.
> >
> > If we were to serve a lightweight http server off a memory backed
> > filesystem I'm confident it would run just fine as long as it
> > wasn't logging or anything.
> >
> > On ESXi you see that there is a CPU spike of 100% that goes on
> > indefinitely. No idea what the FreeBSD OS itself thinks it is doing
> > because we can't run top during the crash.
> >
> > This crash can affect a server and happen multiple times a week. It
> > can also not show up for 180 days or more. But it does happen. The
> > server can be 100% idle and crash. We have servers that do more I/O
> > than the ones that crash could ever attempt to do and these don't
> > crash at all. Completely inexplicable.
> >
> >
> > Things we've looked into:
> >
> > Nothing about the installed software matters. We've tried cross
> > referencing the crashed servers by the programs they run but the
> > base OS is the only common denominator due to the wide variety of
> > servers it has affected.
> >
> > Storage doesn't matter. We've tried different iSCSI SANs, we've
> > tried different switches, we've tried local datastores on the ESXi
> > servers themselves.
> >
> > HP servers, Dell servers -- doesn't seem to matter either. (All
> > with latest firmwares, BIOSes, etc)
> >
> > VMWare gave us a ton of debugging tasks, and we've given them
> > gigabytes of debugging info and data; they can't find anything.
> >
> > VMWare tools -- with, without, using open-vm-tools makes no
> > difference. I think we've done a fair job ruling out VMWare.
> >
> >
> > I think we've finally found enough data that this is definitely
> > something in the FreeBSD world. I'm going to begin prepping some of
> > the known crashy servers with more debugging. Any suggestions on
> > what I should build the kernel with? They never do a proper panic,
> > but I definitely want to at least *try* to get into the debugger
> > the next time it crashes. And when it crashes, what the heck should
> > I be running? I've never played with the KDB before...
> >
> >
> > Thank you for any suggestions and help you can give me....
>
> This sounds just like a race condition that happens under Windows 7
> on this laptop. The race condition, as far as I can tell, involves
> heavy disk access and heavy network access, and usually leaves the
> drive light on, while all activity monitors (alldisk, allcpu,
> allnetwork) are still active, although on this laptop disk takes
> priority, and network slows to a crawl. Occasionally, the mouse will
> stop working, along with everything else, but usually not. Keyboard
> is lower priority, and doesn't do anything.
>
> You might want to check with mickeysoft, this might just be their
> problem. This sounds so freaking similar to the issue I get, and I
> think it's a race condition (shared interrupts??).
>
> This laptop is a Compaq Presario C300 series, with the 945GM chipset
> and a T7600 Core2 Duo CPU, with 3G of RAM.

{TOP POSTING CORRECTED}

I just started reading this thread, but I am wondering if I missed
something here. What does this have to do with "Windows 7"?

-- 
Jerry ♔

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the Reply-To header.