Date: Mon, 1 Oct 2012 13:00:40 -0700 (PDT)
From: guy.helmer@gmail.com
To: fa.freebsd.hackers@googlegroups.com
Cc: freebsd-hackers@freebsd.org, freebsd-questions@freebsd.org
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash
Message-ID: <452f3689-b2ad-43e2-8835-8691b25f75c9@googlegroups.com>
In-Reply-To: <fa.WpvXSexDuh60oPevUfqY%2BfAuWnE@ifi.uio.no>
References: <fa.AteGcyczS0yepFNHJLTAcaouoeQ@ifi.uio.no> <fa.6wX0axVDJXcSbIQcQBtBTui7/9U@ifi.uio.no> <fa.NpTOEiPP0T0zl1kPGJMatOK9x%2B8@ifi.uio.no> <fa.h%2BLa4qtNP%2BefKwHpYLeSk4Y0Kcc@ifi.uio.no> <fa.WpvXSexDuh60oPevUfqY%2BfAuWnE@ifi.uio.no>

On Wednesday, June 6, 2012 8:36:04 PM UTC-5, Mark Felder wrote:
> Hi guys I'm excitedly posting this from my phone. Good news for you guys, bad news for us -- we were building HA storage on vmware for a client and can now replicate the crash on demand. I'll be posting details when I get home to my PC tonight, but this hopefully is enough to replicate the crash for any curious followers:
>
> ESXi 5
> 9 or 9-STABLE
> HAST
> 1 cpu is fine
> 1GB of ram
> UFS SUJ on HAST device
> No special loader.conf, sysctl, etc
> No need for VMWare tools
> Run Bonnie++ on the HAST device
>
> We can get the crash to happen on the first run of bonnie++ right now. I'll post the exact specs and precise command run in the PR. We found an old post from 2004 when we looked up the process state obtained from CTRL+T -- flswai -- which describes the symptoms nearly perfectly.
>
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2004-02/0250.html
>
> Hopefully this gets us closer to a fix...

Is this a crash or a hang?

Over the past couple of weeks, I've been working with a FreeBSD 9.1RC1 system under VMware ESXi 5.0 with a 64GB UFS root FS and 2TB ZFS filesystem mounted via a virtual LSI SAS interface. Sometimes during heavy I/O load (rsync from other servers) on the ZFS FS, this shows up in /var/log/messages:

Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 5 ee 60 16 0 1 0 0
Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): Retrying command
Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 3 ef 42 51 0 1 0 0
Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): Retrying command
Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 3 ef 64 51 0 1 0 0
Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): Retrying command
Sep 21 02:18:49 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 3 ef 66 51 0 1 0 0
Sep 21 02:18:49 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 02:18:49 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
...
Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 41 f3 94 99 0 1 0 0
Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): Retrying command

These have been happening roughly every other day. mpt0 and em0 were sharing int 18, so today I put hint.mpt.0.msi_enable="1" into /boot/device.hints and rebooted; now mpt0 is using int 256. I'll see if it helps.

Guy
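
To judge whether the MSI change actually reduces the "SCSI status: Busy" retries, it may help to count them per device and hour before and after the reboot. The following is only a rough sketch, not something from the original report: the function name `count_busy`, the regular expression, and the `SAMPLE` list are illustrative assumptions, and the sample lines are copied from the log excerpt in this message.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt in this message:
#   "Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy"
BUSY_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\S+\s+kernel:\s+\((?P<dev>[a-z]+\d+):[^)]*\):\s+SCSI status: Busy"
)


def count_busy(lines):
    """Count 'SCSI status: Busy' lines, keyed by (device, 'Mon DD', hour)."""
    counts = Counter()
    for line in lines:
        m = BUSY_RE.match(line)
        if m:
            date = f"{m['month']} {m['day']}"
            hour = m["time"][:2]
            counts[(m["dev"], date, hour)] += 1
    return counts


# Lines copied from the excerpt; non-Busy lines are included to show
# that they are ignored.
SAMPLE = [
    "Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error",
    "Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy",
    "Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): Retrying command",
    "Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy",
    "Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy",
    "Sep 21 02:18:49 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy",
    "Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy",
]

if __name__ == "__main__":
    for (dev, date, hour), n in sorted(count_busy(SAMPLE).items()):
        print(f"{dev} {date} {hour}:00  busy={n}")
```

To run it against the real log, `count_busy(open("/var/log/messages"))` should work in the same way, assuming the file uses the same line layout as the excerpt.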
