From: Mark Felder
Date: Mon, 21 May 2012 11:41:46 -0500
To: freebsd-questions@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

OK guys, I've been talking with another user who can recreate this crash, and the latest information we have points towards an interrupt/IRQ issue, like someone (bz@ perhaps?) suggested.
I'm still trying to test this myself, but the other user was able to recreate my crash pretty much on demand. The fix was to not use the first NIC in the VM because it will always share an IRQ with mpt0. Once mpt0 is on its own the crash does not seem to be reproducible anymore.

Before:

$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                         378          0
irq6: fdc0                             9          0
irq15: ata1                           34          0
irq16: em1                        687237          1
irq18: em0 mpt0                319094024        539
cpu0: timer                    236770821        400
Total                          556552503        940

After:

$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                          38          0
irq6: fdc0                             9          0
irq15: ata1                           34          0
irq16: em1                          2811         15
irq17: em2                             5          0
cpu0: timer                        71013        398
irq256: mpt0                       12163         68
Total                              86073        483

Is there any other way we can make mpt0 get its own dedicated IRQ without having to do this? The problem is that it causes us to have to make rc.conf changes, pf.conf changes, and who knows what other software could be on these machines that is trying to bind to a specific NIC...

Thanks!
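
P.S. For anyone who wants to check a batch of VMs for the same condition, here is a minimal sketch that scans `vmstat -i` output for IRQ lines listing more than one device. The function name `shared_irqs` and the sample string are illustrative only, and it assumes the three-column layout shown above (source, total, rate); it has not been tested against other FreeBSD releases.

```python
import re

# Sample taken from the "Before" output above.
BEFORE = """\
interrupt                          total       rate
irq1: atkbd0                         378          0
irq6: fdc0                             9          0
irq15: ata1                           34          0
irq16: em1                        687237          1
irq18: em0 mpt0                319094024        539
cpu0: timer                    236770821        400
Total                          556552503        940
"""

# An IRQ line looks like: "irq18: em0 mpt0   319094024   539"
IRQ_LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output):
    """Return {irq: [devices]} for every IRQ line that lists more than one device."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = IRQ_LINE.match(line.strip())
        if not match:
            continue  # header, cpu timer, and Total lines are skipped
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[match.group(1)] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in shared_irqs(BEFORE).items():
        print(irq, "is shared by", ", ".join(devices))
```

On a live system the same function could be fed the captured output of `vmstat -i` instead of the sample string.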