Date:      Fri, 4 May 2012 01:29:13 -0400
From:      Charles Sprickman <spork@bway.net>
To:        freebsd-scsi@freebsd.org
Subject:   mfi and "copy out failed" messages
Message-ID:  <B1A5F7F8-E396-4906-A182-C98D068502DF@bway.net>

I'm wondering if anyone has some interest in this issue. I think I
recently tracked down a long-standing fs corruption and panic issue on a
Dell 2970 that I was never able to solve:

http://lists.freebsd.org/pipermail/freebsd-fs/2010-July/008858.html
(there are other threads, but that's the gist of the issue)

I'd read in various threads that the "mfiX: Copy out failed" was a
harmless message.  But recently I started thinking that there had to be
some relation between those messages and the panics.  The timing fits -
I had megacli performing a status check on the controller in a periodic
script that kicked off with the daily run.  Most of my panics were
during or shortly after the daily run.  The "Copy out failed" messages
always corresponded to megacli being run.

132 days ago I removed the daily megacli check and the box has not had a
kernel panic since then.  Before that, my longest uptime was no more
than a few months.  While this is by no means 100% definitive, it
sure seems like something is going on here.  My best guess is that
megacli and the mfi driver are interacting in a bad way and that the
"Copy out failed" message is indicating something did not hit the
controller that should have.  My earlier assumption was that it was just
some control message megacli was sending that didn't make it, but now
I'm thinking it's some request to write actual data to the drive that's
failing.

As a reminder, the card in question is:

mfi0: <Dell PERC 6> port 0xec00-0xecff mem 0xe9f80000-0xe9fbffff,0xe9fc0000-0xe9ffffff irq 37 at device 0.0 on pci7
mfi0: 3049 (boot + 3s/0x0020/info) - Firmware version 1.22.02-0612
mfi0: 3051 (boot + 23s/0x0020/info) - Controller hardware revision ID (0x0)
mfi0: 3052 (boot + 23s/0x0020/info) - Package version 6.2.0-0013

If anyone with knowledge of the mfi driver would like to comment, I'd
very much appreciate it.  This box is going to be repurposed in the
coming months as an ESXi host to hold some backup/standby VMs, but
before that I would not mind taking some time to test any patches, extra
debug printfs in mfi, etc.  I suspect I can probably trigger the panic
pretty easily by mimicking the daily run conditions - just kick off a
find from "/" and then repeatedly loop the megacli command to check the
array health.
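
A rough sketch of that reproduction loop, in Python, in case it helps
anyone who wants to try the same thing.  The megacli arguments below are
an assumption (the exact status command from the periodic script is not
shown here), and run_repro, LOAD_CMD and PROBE_CMD are just illustrative
names, so substitute whatever the daily check really ran:

```python
import subprocess
import sys
import time

# Filesystem load: a find from "/", as in the daily run.
LOAD_CMD = ["find", "/"]

# ASSUMPTION: the real status command is not given in this message.
# Replace this with the invocation the periodic script actually used.
PROBE_CMD = ["MegaCli", "-LDInfo", "-Lall", "-aALL"]


def run_repro(load_cmd, probe_cmd, iterations=100, delay=1.0):
    """Run load_cmd in the background while calling probe_cmd in a loop.

    Returns the list of probe exit codes, one per iteration.
    """
    load = subprocess.Popen(
        load_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    codes = []
    try:
        for i in range(iterations):
            result = subprocess.run(
                probe_cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            codes.append(result.returncode)
            if result.returncode != 0:
                # Note failed probes so they can be lined up with dmesg.
                print(f"probe {i} exited {result.returncode}", file=sys.stderr)
            time.sleep(delay)
    finally:
        # Stop the background load if it is still running.
        if load.poll() is None:
            load.terminate()
        load.wait()
    return codes
```

Calling run_repro(LOAD_CMD, PROBE_CMD) as root while watching the
console for "Copy out failed" would be the idea; the iteration count and
delay are arbitrary and may need tuning.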

The box is still on 7.3, but I'd gladly upgrade to 8.3 and test there if
needed once the box is freed up.

Thanks,

Charles

--
Charles Sprickman
NetEng/SysAdmin
Bway.net - New York's Best Internet www.bway.net
spork@bway.net - 212.655.9344