From: Borja Marcos <borjam@sarenet.es>
Date: Wed, 25 Jan 2012 12:13:53 +0100
To: freebsd-scsi@freebsd.org, freebsd-fs@freebsd.org
Subject: To JBOD or just to pass, that is the question

Hi,

Sorry for the cross-posting, but this pertains to both filesystems and host adapters.

Since ZFS was added to FreeBSD it has collided with a standard practice of server manufacturers: including "RAID" cards.

ZFS is designed to control the disks directly, and RAID cards tend to get in the middle, complicating things. Some cards do provide access to the disks, working as plain host adapters without added features, as they should. But so far I have found two problematic cases: mfi and aac cards.

Of course, the standard suggestion is to create a logical volume for each disk, so that you have the rough equivalent of a hard disk attached to a host adapter. However, that has its drawbacks:

- An added layer of complexity. At least with mfi cards, replacing a broken disk involves a bit of device-dependent voodoo incantations. It should be a matter of physically replacing the disk and maybe doing a camcontrol rescan, nothing else.

- Are such per-disk volumes transportable from one controller to another? What happens if I need to install the disks in a different machine with a different host adapter? ZFS provides that interoperability, but the RAID cards can be a problem.

- More complexity: what is, for instance, the caching behavior of the RAID card? ZFS decides when to flush, when not to flush, etc. Battery-backed RAID cards show (as far as I know) configuration-dependent caching, maybe ignoring flush commands received from the OS storage subsystem? At least there's no detailed documentation as far as I know. So I tend to dislike that "firmware in the middle".

Long ago I asked for help on freebsd-scsi and Scott Long sent a simple patch to make the hard disks, shown as pass-through devices, available to the "da" driver, hence becoming real hard disks. It's just a matter of deleting all the logical volumes before using the disks. I've been running this on a machine with mfi since 2007 and so far so good. The machine is now on 8.1 and I hope to update it to 9 soon.

The freebsd-scsi thread:
http://lists.freebsd.org/pipermail/freebsd-scsi/2007-October/003224.html

The behavior with my torture tests was good. One of the things I usually do when testing a configuration is to remove a disk suddenly while the system is working. That was a pain in the ass with the mfi thingy, and really straightforward with the disks accessed in pass-through mode.
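Just to show what I mean by "nothing else" above: with pass-through disks the whole replacement dance should reduce to something like this (a sketch only; "tank" and da5 are made-up names for the example):

    # Pull the dead disk, insert the new one, then have CAM look again
    # (rescanning everything; you could also rescan just that bus):
    camcontrol rescan all

    # Hand the new disk back to ZFS and watch the resilver:
    zpool replace tank da5
    zpool status tank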
Now I am installing a Sun X4240 server and, surprisingly, I've stumbled upon a similar problem. Now it's an "aac" card:

aac0: mem 0xdfe00000-0xdfffffff irq 17 at device 0.0 on pci4
aac0: Enabling 64-bit address support
aac0: Enable Raw I/O
aac0: Enable 64-bit array
aac0: New comm. interface enabled
aac0: Sun STK RAID INT, aac driver 2.1.9-1
aacp0: on aac0
aacp1: on aac0
aacp2: on aac0

This is a disk in /var/run/dmesg.boot:

da0: Fixed Direct Access SCSI-5 device
da0: 0KB/s transfers
da0: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)

and this is what I see from camcontrol:

# camcontrol devlist
at scbus6 target 8 lun 0 (da0,pass0)
at scbus6 target 9 lun 0 (da1,pass1)
at scbus6 target 10 lun 0 (da2,pass2)
at scbus6 target 11 lun 0 (da3,pass3)
at scbus6 target 12 lun 0 (da4,pass4)
at scbus6 target 13 lun 0 (da5,pass5)
at scbus6 target 14 lun 0 (da6,pass6)
at scbus6 target 15 lun 0 (da7,pass7)
at scbus6 target 16 lun 0 (da8,pass8)
at scbus6 target 17 lun 0 (da9,pass9)
at scbus6 target 18 lun 0 (da10,pass10)
at scbus6 target 19 lun 0 (da11,pass11)
at scbus6 target 20 lun 0 (da12,pass12)
at scbus6 target 21 lun 0 (da13,pass13)
at scbus6 target 22 lun 0 (da14,pass14)
at scbus6 target 23 lun 0 (da15,pass15)
at scbus8 target 0 lun 0 (ses0,pass16)
at scbus8 target 1 lun 0 (ses1,pass17)
at scbus8 target 2 lun 0 (ses2,pass18)
at scbus15 target 0 lun 0 (cd0,pass19)
at scbus16 target 0 lun 0 (da16,pass20)

# camcontrol inq 6:8:0
pass0: Fixed Direct Access SCSI-5 device
pass0: Serial Number 000946821D70 3SD21D70
pass0: 3.300MB/s transfers

The reported transfer speed is obviously silly, but Bonnie++ on a 16-disk raidz2 gave 200+ MB/s block writes and 700+ MB/s block reads, so it seems to be working.

So far there's just one side effect of accessing the disks in pass-through mode: I cannot reboot the machine; it seems to hang after flushing the buffers. It happens with both the mfi and the aac drivers.

Just wondering: should we have, maybe, a tunable allowing aac and mfi to bypass the RAID firmware thingy? And is there any kind of exhaustive test we could perform to make sure that the card isn't doing weird things?

I've noticed, in the case of the aac machine I'm testing, that camcontrol tags shows just one "device opening". I'm wondering if it would be safe to increase that (see the P.S. below for what I intend to try). Right now the machine isn't in production, so I can perform some tests.

Best regards,

Borja.
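P.S. For the record, the tag fiddling I have in mind is just the following (da0 as an example; the value 32 is an arbitrary guess on my part, not a recommendation):

    # Show the current number of tagged openings for the device:
    camcontrol tags da0 -v

    # Try a larger queue depth and check that it sticks:
    camcontrol tags da0 -N 32 -v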