From: Adrian Chadd
To: dane foster
Cc: freebsd-hackers@freebsd.org, Mark Felder, freebsd-questions@freebsd.org
Date: Thu, 24 May 2012 17:56:00 -0700
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash
List-Id: Technical Discussions relating to FreeBSD

Hi,

You guys now absolutely, positively have enough information for a PR.

It's still not clear whether it's a device/interrupt layer issue in FreeBSD, or whether vmware is doing something wrong with how it implements shared interrupts, or a bit of both..

Adrian

On 24 May 2012 13:54, dane foster wrote:
> Hey all,
>
> On 25/05/2012, at 1:47 AM, Mark Felder wrote:
>
>> On Wed, 23 May 2012 17:30:40 -0500, Adrian Chadd wrote:
>>
>>> Hi,
>>>
>>> can you please, -please- file a PR? And place all of the above
>>> information in it so we don't lose it?
>>>
>>
>> I'd be glad to post a PR and assist in helping to get it permanently fixed. I certainly don't want this data to get lost and honestly our business uses FreeBSD on VMWare so much that we really need a permanent fix as much as anyone else :-)
>>
>> The reason I've hesitated to post a PR so far is that I didn't have any truly useful or concrete evidence of where the problem lies. After Dane Foster contacted me and told me he could recreate the crash on demand with his workload it was easier to narrow things down. The suggestion that it was an interrupts issue (by possibly Bjoern Zeeb?) and Dane's discovery that his crashes ceased when em0 and mpt0 share an IRQ, but em0 is completely unused was starting to prove there is some strong evidence here in favor of the interrupts issue.
>>
>> Dane, what's the status on your end? Has your fix still been successful? Is it also stable if you simply set hint.mpt.0.msi_enable="1" ?
>>
>
> The situation I've got that's stable now is:
>
> hw.pci.enable_msi="0"
> hw.pci.enable_msix="0"
>
> in /boot/loader.conf
>
> and:
>
> samael:~:% vmstat -i                                   [ 6:31PM]
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq18: em0 mpt0                  3061100         15
> irq19: em1                       6891706         35
> cpu0: timer                    166383735        868
> cpu1: timer                    166382123        868
> cpu3: timer                    166382123        868
> cpu2: timer                    166382121        868
> Total                          675482914       3525
>
> Not using em0. This works for 8 (FreeBSD samael.slush.ca 8.3-STABLE FreeBSD 8.3-STABLE #1: Mon May 7 11:51:03 NZST 2012 root@samael.slush.ca:/usr/obj/usr/src/sys/DENE amd64).
>
> Neither of those settings on their own seem to stop it from happening.
>
> The 9 box I've tried this on still hangs almost every time i run handbrake, no matter whether MSI/MSIX is enabled, or I have separate IRQs for mpt0 and em0/1
>
> I can cause the hang mostly on demand, but not quite sure what information to provide from the hung system. If somebody can let me know what they need, including root access, I can make that happen.
>
> Cheers,
>
> Dane
>
>>
>> Thanks!
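
Since the thread hinges on which devices end up sharing an interrupt line, it may help to pull that out of the `vmstat -i` output mechanically when collecting data for the PR. The following is only a minimal sketch, assuming the column layout shown in Dane's output above (source, device names, total, rate); the helper names `parse_vmstat_i` and `shared_irqs` are made up for illustration, and the sample text is a shortened copy of the table quoted in this message.

```python
import re
from collections import namedtuple

# One parsed row of `vmstat -i` output (layout assumed from the message above).
IrqLine = namedtuple("IrqLine", "source devices total rate")

# "<source>: <device names...>  <total>  <rate>"
ROW = re.compile(r"^(\S+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return IrqLine records; the header and the Total line have no colon and are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            source, devs, total, rate = m.groups()
            rows.append(IrqLine(source, devs.split(), int(total), int(rate)))
    return rows


def shared_irqs(rows):
    """Keep only irqN lines that list more than one device."""
    return [r for r in rows if r.source.startswith("irq") and len(r.devices) > 1]


SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           6          0
irq18: em0 mpt0                  3061100         15
irq19: em1                       6891706         35
cpu0: timer                    166383735        868
Total                          675482914       3525
"""

if __name__ == "__main__":
    for r in shared_irqs(parse_vmstat_i(SAMPLE)):
        print(f"{r.source} is shared by: {', '.join(r.devices)}")
```

On a live system the text could come from running `vmstat -i` and piping it in instead of using `SAMPLE`. Other FreeBSD releases may format the interrupt names differently, so the regular expression should be treated as a starting point rather than a fixed rule.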