Date:      Mon, 17 Jun 2013 13:39:27 +0200
From:      Philipp Maechler <philipp.maechler@hostpoint.ch>
To:        freebsd-stable@freebsd.org
Subject:   HP Blade Gen 7 and 8 IO stall with ciss driver
Message-ID:  <51BEF56F.4020800@hostpoint.ch>

Hello,

For some time now we have been experiencing complete I/O stalls on
some HP blade hardware running FreeBSD 9.1. When a stall occurs, we
have to power cycle the server to recover.

The stalls occur only every few days to weeks, so they are difficult to
reproduce. We also tried generating a lot of I/O (e.g. scrubbing
nonstop), but this did not make them occur much more often. Taking load
away from the production machines reduced the frequency of the stalls
further, but is not a solution.

We figured out that it is probably not a ZFS-related problem, because
any dd to the swap partitions stalls as well.
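
The dd check described above could be automated roughly as follows.
This is a minimal sketch, not the procedure used in this report: the
function name read_completes, the 10 second timeout, and the device
path in the usage note are assumptions made for illustration. It reads
a small block in a background thread and reports whether the read
returned in time.

```python
import threading


def read_completes(path, timeout_s=10.0, nbytes=4096):
    """Return True if reading nbytes from path finishes within timeout_s."""
    done = threading.Event()
    errors = []

    def worker():
        try:
            # Unbuffered open so the read is passed straight to the kernel.
            # 4096 is chosen as a multiple of common sector sizes.
            with open(path, "rb", buffering=0) as f:
                f.read(nbytes)
        except OSError as exc:
            # Remember open/read failures so they are not mistaken for a stall.
            errors.append(exc)
        finally:
            done.set()

    # Daemon thread: the caller gets an answer even if the read never returns.
    threading.Thread(target=worker, daemon=True).start()
    finished = done.wait(timeout_s)
    if errors:
        raise errors[0]
    return finished
```

Calling read_completes("/dev/da0p2") (a hypothetical swap partition
name) from a periodic job and logging a False result to a remote host
would give a timestamp for the start of a stall. Note that a process
with a thread stuck in a stalled read may itself not exit cleanly.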

At the moment we are investigating and trying different settings for
the ciss driver, such as SIMPLE mode instead of PERFORMANT mode. Next we
may change the heartbeat settings, even though we do NOT see any of the
usually mentioned error messages on the console or in the remote logs.

Our main question: does anybody have similar experiences? Maybe on an
HP-supported OS?

With best regards,

Philipp Maechler

Some more information:
I/O stall:
The only way to resolve the state is to power cycle the server.
Network traffic and processing, including entering the kernel
debugger, are still possible, but the debugger shows only full I/O
queues...

Reproducing:
Scrubbing nonstop helps, but only results in a stall every week instead
of every 4 weeks... so nothing like 'next hour'.

Hardware:
G7: HP blade BL465c G7: HP Smart Array P410i
G8: HP blade BL465c G8: HP Smart Array P220i
ciss0: <HP Smart Array P220i> port 0x4000-0x40ff mem
0xfdd00000-0xfddfffff,0xfdcf0000-0xfdcf03ff irq 44 at device 0.0 on pci3
ciss0: PERFORMANT Transport

On all systems we are already on the latest BIOS and controller
firmware releases (G8: RAID controller firmware 3.54).

FreeBSD:
9.1-RELEASE-p3. We also tried a rough backport of some ciss patches
from head; it didn't change anything.
* driver: ciss

We didn't try FreeBSD 8.3 because we'd like to have userland DTrace,
and it is a lot of work to move production systems to that release;
maybe we will have to in the future. So the only question now: is
somebody else experiencing similar issues? Maybe on an HP-supported OS?

-- 
Hostpoint AG         | The Data Residence |
St. Dionysstrasse 31 | Postfach           | CH-8640 Rapperswil-Jona
  Tel +41 844 800777 | Fax +41 844 090909 | http://www.hostpoint.ch
