Date: Mon, 17 Jun 2013 13:39:27 +0200
From: Philipp Maechler <philipp.maechler@hostpoint.ch>
To: freebsd-stable@freebsd.org
Subject: HP Blade Gen 7 and 8 IO stall with ciss driver
Message-ID: <51BEF56F.4020800@hostpoint.ch>
Hello,

For a while now we have been experiencing complete I/O stalls on some HP Blade hardware running FreeBSD 9.1. When a stall occurs, we have to power cycle the server to recover.

The stalls occur only every few days to weeks, so they are difficult to reproduce. We tried generating a lot of I/O (e.g. scrubbing nonstop), but this did not make the stalls much more frequent. Taking load away from the production machines reduced the frequency of the stalls further, but that is not a solution.

We concluded that it is probably not a ZFS-related problem, because any dd to the swap partitions also stops.

At the moment we are investigating and trying different settings for the ciss driver, such as SIMPLE mode instead of PERFORMANT mode. Next we may change the heartbeat settings, even though we do NOT see any of the usually mentioned error messages on the console or in the remote logs.

Our main question: does anybody have similar experiences? Perhaps on an HP-supported OS?

With best regards,
Philipp Maechler

Some more information:

I/O stall:
The only way to resolve the state is to power cycle the server. Network traffic and processing still work, and entering the kernel debugger is also possible, but it only shows full I/O queues.

Reproducing:
Scrubbing nonstop helps, but it results in a stall every week instead of every 4 weeks, so nothing like "within the next hour".

Hardware:
G7: HP blade BL465c G7, HP Smart Array P410i
G8: HP blade BL465c G8, HP Smart Array P220i

ciss0: <HP Smart Array P220i> port 0x4000-0x40ff mem 0xfdd00000-0xfddfffff,0xfdcf0000-0xfdcf03ff irq 44 at device 0.0 on pci3
ciss0: PERFORMANT Transport

On all systems we are already on the latest BIOS and controller firmware releases (G8: RAID controller firmware 3.54).

FreeBSD:
9.1-RELEASE-p3. We also tried a kind of backport of some ciss patches from head; it did not change anything.
* driver: ciss

We did not try something like FreeBSD 8.3, because we would like to have userland DTrace and it is a lot of work to move production systems to that release. We may have to in the future.

So the only question for now: is somebody else experiencing similar issues? Perhaps on an HP-supported OS?

-- 
Hostpoint AG | The Data Residence | St. Dionysstrasse 31 | Postfach | CH-8640 Rapperswil-Jona
Tel +41 844 800777 | Fax +41 844 090909 | http://www.hostpoint.ch