Date: Thu, 24 May 2012 17:56:00 -0700
From: Adrian Chadd <adrian@freebsd.org>
To: dane foster <dene@ilovedene.com>
Cc: freebsd-hackers@freebsd.org, Mark Felder <feld@feld.me>, freebsd-questions@freebsd.org
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash
Message-ID: <CAJ-VmokWTKdTOVF7fXnkERbgNurtGZB5OHV8QDA7TzXEAFVWQA@mail.gmail.com>
In-Reply-To: <62F1D149-FC1C-4E00-98FD-DF6C46A5DC55@ilovedene.com>
References: <op.wbwe9s0k34t2sn@tech304> <op.wen3bwws34t2sn@tech304> <490F2075-3E4D-4F85-9935-937CED8FB10B@averesystems.com> <op.wen42clw34t2sn@tech304> <CAJ-Vmoneopo8xNpThbewfE2tg6HrdH74DXurO38P_aVs=YS9%2BA@mail.gmail.com> <op.wete9wbq34t2sn@tech304> <62F1D149-FC1C-4E00-98FD-DF6C46A5DC55@ilovedene.com>

Hi,

You guys now absolutely, positively have enough information for a PR.

It's still not clear whether it's a device/interrupt layer issue in
FreeBSD, or whether vmware is doing something wrong with how it
implements shared interrupts, or a bit of both..

Adrian

On 24 May 2012 13:54, dane foster <dene@ilovedene.com> wrote:
> Hey all,
>
> On 25/05/2012, at 1:47 AM, Mark Felder wrote:
>
>> On Wed, 23 May 2012 17:30:40 -0500, Adrian Chadd <adrian@freebsd.org> wrote:
>>
>>> Hi,
>>>
>>> can you please, -please- file a PR? And place all of the above
>>> information in it so we don't lose it?
>>>
>>
>> I'd be glad to post a PR and assist in helping to get it permanently fixed. I certainly don't want this data to get lost and honestly our business uses FreeBSD on VMWare so much that we really need a permanent fix as much as anyone else :-)
>>
>> The reason I've hesitated to post a PR so far is that I didn't have any truly useful or concrete evidence of where the problem lies. After Dane Foster contacted me and told me he could recreate the crash on demand with his workload it was easier to narrow things down. The suggestion that it was an interrupts issue (by possibly Bjoern Zeeb?) and Dane's discovery that his crashes ceased when em0 and mpt0 share an IRQ, but em0 is completely unused was starting to prove there is some strong evidence here in favor of the interrupts issue.
>>
>> Dane, what's the status on your end? Has your fix still been successful? Is it also stable if you simply set hint.mpt.0.msi_enable="1" ?
>>
>
> The situation I've got that's stable now is:
>
> hw.pci.enable_msi="0"
> hw.pci.enable_msix="0"
>
> in /boot/loader.conf
>
> and:
>
> samael:~:% vmstat -i                                    [ 6:31PM]
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq18: em0 mpt0                  3061100         15
> irq19: em1                       6891706         35
> cpu0: timer                    166383735        868
> cpu1: timer                    166382123        868
> cpu3: timer                    166382123        868
> cpu2: timer                    166382121        868
> Total                          675482914       3525
>
> Not using em0. This works for 8 (FreeBSD samael.slush.ca 8.3-STABLE FreeBSD 8.3-STABLE #1: Mon May  7 11:51:03 NZST 2012     root@samael.slush.ca:/usr/obj/usr/src/sys/DENE  amd64).
>
> Neither of those settings on their own seem to stop it from happening.
>
> The 9 box I've tried this on still hangs almost every time i run handbrake, no matter whether MSI/MSIX is enabled, or I have separate IRQs for mpt0 and em0/1
>
> I can cause the hang mostly on demand, but not quite sure what information to provide from the hung system. If somebody can let me know what they need, including root access, I can make that happen.
>
> Cheers,
>
> Dane
>
>
>
>>
>> Thanks!
>
>
>
>
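The thread turns on whether em0 and mpt0 share an interrupt line, which is visible in the `vmstat -i` output quoted above (irq18). As a rough illustration only, and not something posted by anyone in the thread, the following Python sketch flags IRQ lines that list more than one device. It assumes the `vmstat -i` format shown in this message; the function name `shared_irqs` and the `SAMPLE` text (the quoted output with the `>` prefixes and some timer lines removed) are hypothetical and exist only for this example.

```python
import re

# Sample adapted from the `vmstat -i` output quoted in this message
# (quote prefixes stripped; only some of the timer lines are kept).
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           6          0
irq18: em0 mpt0                  3061100         15
irq19: em1                       6891706         35
cpu0: timer                    166383735        868
Total                          675482914       3525
"""

# Matches lines such as "irq18: em0 mpt0   3061100   15".
IRQ_LINE = re.compile(r"^irq(\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output):
    """Return {irq_number: [device, ...]} for IRQs listing more than one device."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = IRQ_LINE.match(line.strip())
        if not match:
            continue  # header, per-CPU timer, and Total lines are skipped
        irq = int(match.group(1))
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[irq] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(SAMPLE).items()):
        print("irq%d is shared by: %s" % (irq, ", ".join(devices)))
```

On a live system the same function could be fed the captured output of `vmstat -i` instead of `SAMPLE`. It only reports sharing; it says nothing about whether a shared line is the cause of the hang.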