Date:      Thu, 29 Mar 2012 13:22:46 -0500
From:      Mark Felder <feld@feld.me>
To:        freebsd-hackers@freebsd.org, freebsd-questions@freebsd.org
Cc:        alc@freebsd.org, Alan Cox <alan.l.cox@gmail.com>
Subject:   Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash
Message-ID:  <op.wbx2n80s34t2sn@tech304>
In-Reply-To: <CAJUyCcNn%2B8uDrWGJMUD8vmmJKLA0iJjy6bhDSZvGB82X6awAPw@mail.gmail.com>
References:  <201203291549.q2TFnUc7080406@aurora.sol.net> <201203291755.36651.hselasky@c2i.net> <op.wbxxb9cz34t2sn@tech304> <CAJUyCcNn%2B8uDrWGJMUD8vmmJKLA0iJjy6bhDSZvGB82X6awAPw@mail.gmail.com>

On Thu, 29 Mar 2012 11:53:02 -0500, Alan Cox <alan.l.cox@gmail.com> wrote:

>
> Not so long ago, VMware implemented a clever scheme for reducing the
> overhead of virtualized interrupts that must be delivered by at least
> some (if not all) of their emulated storage controllers:
>
> http://static.usenix.org/events/atc11/tech/techAbstracts.html#Ahmad
>
> Perhaps, there is a bad interaction between this scheme and FreeBSD's mpt
> driver.
>
> Alan

If we assume mpt is the culprit, how can I go about diagnosing this more
accurately? Is there something I should be looking for in vmstat -i? Too
many interrupts? Not enough? A rate that is too high or too low? Or is
this much harder to track down because we're dealing with emulated
hardware?
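
To make the vmstat -i question concrete: its "rate" column is an average
since boot, so a short-lived burst or stall barely moves it. Below is a
minimal sketch of how one might instead take two snapshots and compute
the per-source rate over the interval between them. The function names
are made up for illustration, and the parser assumes the usual layout of
a name followed by "total" and "rate" columns. It does not diagnose
anything by itself; it only gives a number to compare between a healthy
host and one that is about to hang.

```python
import subprocess
import time


def parse_vmstat_i(text):
    """Map each interrupt source name to its cumulative count."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows end with two integers: total and since-boot rate.
        # The header row ("interrupt total rate") fails this check.
        if len(fields) < 3:
            continue
        if not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        name = " ".join(fields[:-2])
        totals[name] = int(fields[-2])
    return totals


def interval_rates(before, after, seconds):
    """Interrupts per second for each source over the sampling interval."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }


def sample(interval=10.0):
    """Run `vmstat -i` twice, `interval` seconds apart, and return rates."""
    first = parse_vmstat_i(
        subprocess.check_output(["vmstat", "-i"], text=True))
    time.sleep(interval)
    second = parse_vmstat_i(
        subprocess.check_output(["vmstat", "-i"], text=True))
    return interval_rates(first, second, interval)
```

On the affected guest, calling sample(10) and picking out the entries
whose name contains "mpt" would show the current interrupt rate for the
emulated controller. Logging that periodically might reveal whether the
rate drops to zero or spikes shortly before a crash.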

If any BSD devs are interested in access to our environment, I think we
could arrange it. I might even be able to get authorization to give you
an account on the most crash-prone server, which doesn't have any
sensitive customer data on it. At this point I think we'd even be
willing to pay someone to look at a server in this state, so that we
(and hopefully others) can benefit and we end up with a more reliable
FreeBSD-on-VMware for everyone.

I know Doug mentioned running newer OS versions, and that is definitely
tempting, but because the crash is not 100% reproducible on demand it's
hard to prove an upgrade fixes it without waiting 6 months. We're
fighting internally here with "trust 9.0 fixes it" vs "jump back to 7.4
because we KNOW it doesn't happen there". Having someone look at this
and say "oh, yes, that's a deficiency in mpt that appears to be fixed in
the newer driver that was MFC'd to 8-STABLE and you'll find in
8.3-RELEASE and 9.0-RELEASE" would be more comforting.

Thanks to everyone for their time on this!


