From owner-freebsd-scsi@FreeBSD.ORG  Sat Jan 14 05:16:18 2012
Date: Sat, 14 Jan 2012 05:16:18 +0000
From: John <jwd@FreeBSD.org>
To: freebsd-scsi@FreeBSD.org
Message-ID: <20120114051618.GA41288@FreeBSD.org>
Subject: mps driver chain_alloc_fail / performance ?

Hi Folks,

   I've started poking through the source for this, but thought I'd go
ahead and post to ask others their opinion.

   I have a system with 3 LSI SAS HBA cards installed:

mps0: port 0x5000-0x50ff mem 0xf5ff0000-0xf5ff3fff,0xf5f80000-0xf5fbffff irq 30 at device 0.0 on pci13
mps0: Firmware: 05.00.13.00
mps0: IOCCapabilities: 285c
mps1: port 0x7000-0x70ff mem 0xfbef0000-0xfbef3fff,0xfbe80000-0xfbebffff irq 48 at device 0.0 on pci33
mps1: Firmware: 07.00.00.00
mps1: IOCCapabilities: 1285c
mps2: port 0x6000-0x60ff mem 0xfbcf0000-0xfbcf3fff,0xfbc80000-0xfbcbffff irq 56 at device 0.0 on pci27
mps2: Firmware: 07.00.00.00
mps2: IOCCapabilities: 1285c

Basically, one for internal drives and two for external drives, for a
total of about 200 drives, e.g.:

# camcontrol inquiry da10
pass21: Fixed Direct Access SCSI-5 device
pass21: Serial Number 6XR14KYV0000B148LDKM
pass21: 600.000MB/s transfers, Command Queueing Enabled

When running the system under load, I see the following reported:

hw.mps.0.allow_multiple_tm_cmds: 0
hw.mps.0.io_cmds_active: 0
hw.mps.0.io_cmds_highwater: 772
hw.mps.0.chain_free: 2048
hw.mps.0.chain_free_lowwater: 1832
hw.mps.0.chain_alloc_fail: 0          <--- Ok

hw.mps.1.allow_multiple_tm_cmds: 0
hw.mps.1.io_cmds_active: 0
hw.mps.1.io_cmds_highwater: 1019
hw.mps.1.chain_free: 2048
hw.mps.1.chain_free_lowwater: 0
hw.mps.1.chain_alloc_fail: 14369      <---- ??

hw.mps.2.allow_multiple_tm_cmds: 0
hw.mps.2.io_cmds_active: 0
hw.mps.2.io_cmds_highwater: 1019
hw.mps.2.chain_free: 2048
hw.mps.2.chain_free_lowwater: 0
hw.mps.2.chain_alloc_fail: 13307      <---- ??

So, finally, my question (sorry, I'm long-winded): what is the correct way
to increase the number of elements in sc->chain_list so that
mps_alloc_chain() won't run out?
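The only knob I've spotted so far is the compile-time pool size. Here is
roughly what I'm considering trying -- an untested sketch only:
MPS_CHAIN_FRAMES is the constant I see in my copy of sys/dev/mps/mpsvar.h,
while the max_chains softc field and the hw.mps.max_chains tunable are
things I'd be adding, not something already in the tree:

/*
 * Sketch (untested): size the chain pool from a loader tunable instead of
 * the hard-coded MPS_CHAIN_FRAMES, so it can be bumped from loader.conf
 * without rebuilding.  "max_chains" would be a new int field in
 * struct mps_softc; it does not exist today.
 */
#include <sys/param.h>
#include <sys/kernel.h>         /* TUNABLE_INT_FETCH() */

#include <dev/mps/mpsvar.h>     /* MPS_CHAIN_FRAMES, struct mps_softc */

static void
mps_get_chain_tunable(struct mps_softc *sc)
{

        /* Default to the current compile-time value (2048 in my tree). */
        sc->max_chains = MPS_CHAIN_FRAMES;

        /* Allow an override, e.g. hw.mps.max_chains="4096" in loader.conf. */
        TUNABLE_INT_FETCH("hw.mps.max_chains", &sc->max_chains);
}

mps_alloc_requests() would then allocate sc->max_chains chain frames (and
size the busdma tag accordingly) instead of looping up to MPS_CHAIN_FRAMES.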
For reference, this is the allocation path where the failures are counted:

static __inline struct mps_chain *
mps_alloc_chain(struct mps_softc *sc)
{
        struct mps_chain *chain;

        if ((chain = TAILQ_FIRST(&sc->chain_list)) != NULL) {
                TAILQ_REMOVE(&sc->chain_list, chain, chain_link);
                sc->chain_free--;
                if (sc->chain_free < sc->chain_free_lowwater)
                        sc->chain_free_lowwater = sc->chain_free;
        } else
                sc->chain_alloc_fail++;
        return (chain);
}

A few layers up, it seems like it would be nice if the buffer exhaustion
were reported even without debugging enabled... at least the first time it
happens (I've tacked a rough sketch of what I mean onto the end of this
mail). As far as I can tell, changing the related #define is the only way
to grow the pool.

Does anyone have any experience with tuning this driver for high
throughput / large disk arrays? The shelves are all dual-pathed, and even
with the new gmultipath active/active support I've only been able to
achieve about 500 MBytes per second across the controllers/drives.

I appreciate any thoughts.

Thanks,
John

ps: I currently have a ccd on top of these drives, which seems to perform
more consistently than ZFS. But that's an email for a different day :-)
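pps: Since I mentioned it above, here's roughly what I had in mind for
making the exhaustion visible without MPS_DEBUG -- the same routine with a
one-time device_printf() when the free list is empty. Untested sketch; I'm
assuming the device_t handle in the softc is mps_dev, which is what I see
in my copy of mpsvar.h:

static __inline struct mps_chain *
mps_alloc_chain(struct mps_softc *sc)
{
        struct mps_chain *chain;

        if ((chain = TAILQ_FIRST(&sc->chain_list)) != NULL) {
                TAILQ_REMOVE(&sc->chain_list, chain, chain_link);
                sc->chain_free--;
                if (sc->chain_free < sc->chain_free_lowwater)
                        sc->chain_free_lowwater = sc->chain_free;
        } else {
                /* Complain the first time we run dry, then just count. */
                if (sc->chain_alloc_fail == 0)
                        device_printf(sc->mps_dev,
                            "out of chain frames; consider increasing "
                            "the chain frame pool size\n");
                sc->chain_alloc_fail++;
        }
        return (chain);
}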