Date: Thu, 29 Mar 2012 13:22:46 -0500
From: Mark Felder <feld@feld.me>
To: freebsd-hackers@freebsd.org, freebsd-questions@freebsd.org
Cc: alc@freebsd.org, Alan Cox <alan.l.cox@gmail.com>
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash
Message-ID: <op.wbx2n80s34t2sn@tech304>
In-Reply-To: <CAJUyCcNn%2B8uDrWGJMUD8vmmJKLA0iJjy6bhDSZvGB82X6awAPw@mail.gmail.com>
References: <201203291549.q2TFnUc7080406@aurora.sol.net> <201203291755.36651.hselasky@c2i.net> <op.wbxxb9cz34t2sn@tech304> <CAJUyCcNn%2B8uDrWGJMUD8vmmJKLA0iJjy6bhDSZvGB82X6awAPw@mail.gmail.com>
On Thu, 29 Mar 2012 11:53:02 -0500, Alan Cox <alan.l.cox@gmail.com> wrote:

>
> Not so long ago, VMware implemented a clever scheme for reducing the
> overhead of virtualized interrupts that must be delivered by at least some
> (if not all) of their emulated storage controllers:
>
> http://static.usenix.org/events/atc11/tech/techAbstracts.html#Ahmad
>
> Perhaps, there is a bad interaction between this scheme and FreeBSD's mpt
> driver.
>
> Alan

If we assume mpt is the culprit, how can I go about diagnosing this more accurately? Is there something I should be looking for in vmstat -i? Too many interrupts? Not enough? Rate too high or too low? Or is this something that is much harder to track down because we're dealing with emulated hardware?

If any BSD devs are interested in access to our environment, I think we could comply. I might even be able to get authorization to give you an account on the most crash-prone server, which doesn't have any sensitive customer data on it. I think at this point we'd even be willing to pay someone to look at a server in this state just so we (and hopefully others) can benefit, and hopefully we end up with a more reliable FreeBSD-on-VMware for everyone.

I know Doug mentioned running newer OS versions, and that is definitely tempting, but because the crash is not 100% reproducible on demand, it's hard to prove an upgrade fixes it without waiting 6 months. We're fighting internally here with "trust 9.0 fixes it" vs "jump back to 7.4 because we KNOW it doesn't happen there". Having someone look at this and say "oh, yes, that's a deficiency in mpt that appears to be fixed in the newer driver that was MFC'd to 8-STABLE and you'll find in 8.3-RELEASE and 9.0-RELEASE" would be more comforting.

Thanks to everyone for their time on this!
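
On the vmstat -i question: the "rate" column there is an average since boot, so a short-lived change in mpt interrupt delivery may be easier to see by comparing two samples taken a known interval apart. Below is a minimal sketch of that comparison, not a diagnosis. It assumes the usual three-column vmstat -i layout (interrupt name, total, rate). The function names and the sample numbers are hypothetical.

```python
import re

# Assumed line layout: "<name>   <total>   <rate>", e.g. "irq17: mpt0  1000  3".
# The column header line does not match because its "total" field is not numeric.
LINE_RE = re.compile(r"^(?P<name>\S.*?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")


def parse_vmstat_i(text):
    """Return {interrupt name: cumulative total} parsed from `vmstat -i` text."""
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            totals[match.group("name")] = int(match.group("total"))
    return totals


def interval_rates(before, after, seconds):
    """Interrupts per second over the interval, for names found in both samples."""
    first = parse_vmstat_i(before)
    second = parse_vmstat_i(after)
    return {
        name: (second[name] - first[name]) / seconds
        for name in second
        if name in first
    }


# Hypothetical samples, taken 60 seconds apart.
SAMPLE_BEFORE = """\
interrupt                          total       rate
irq1: atkbd0                           6          0
irq17: mpt0                         1000          3
cpu0: timer                       200000        600
Total                             201006        603
"""

SAMPLE_AFTER = """\
interrupt                          total       rate
irq1: atkbd0                           6          0
irq17: mpt0                         1600          3
cpu0: timer                       236000        600
Total                             237606        603
"""

if __name__ == "__main__":
    for name, rate in sorted(interval_rates(SAMPLE_BEFORE, SAMPLE_AFTER, 60).items()):
        print(f"{name}: {rate:.1f}/s")
```

On a live system the two samples would come from running vmstat -i twice, a fixed number of seconds apart. Comparing the mpt0 figure between a healthy guest and one that is about to hang would at least show whether the interrupt rate changes at all. This check alone cannot show whether the rate is "too high" or "too low".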