From owner-freebsd-fs@FreeBSD.ORG  Wed Feb  5 14:43:00 2014
Message-ID: <52F24DEA.9090905@physics.umn.edu>
Date: Wed, 05 Feb 2014 08:42:50 -0600
From: Graham Allan <allan@physics.umn.edu>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: aurfalien <aurfalien@gmail.com>
References: <52F1BDA4.6090504@physics.umn.edu>
 <7D20F45E-24BC-4595-833E-4276B4CDC2E3@gmail.com>
In-Reply-To: <7D20F45E-24BC-4595-833E-4276B4CDC2E3@gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: practical maximum number of drives
Cc: FreeBSD FS <freebsd-fs@freebsd.org>


On 2/4/2014 11:36 PM, aurfalien wrote:
> Hi Graham,
>
> When you say behaved better with 1 HBA, what were the issues that
> made you go that route?

It worked fine in general with 3 HBAs for a while, but on the other hand 
2 of the drive chassis were only very lightly used (and note that I was 
being quite conservative and keeping each chassis as an independent ZFS 
pool).

The actual problems happened once while I was away, but our notes show we 
hit some kind of repeated I/O deadlock. As well as all drive I/O stopping, 
we also couldn't use the sg_ses utilities to query the enclosures. This 
recurred several times after restarts throughout the day, and eventually 
"we" (again, I wasn't here) removed the extra HBAs and daisy-chained all 
the chassis together. An inspired hunch, I guess. We've had no issues 
since then.

Coincidentally, a few days later I saw a message on this list from Xin Li 
in the thread "Re: kern/177536: [zfs] zfs livelock (deadlock) with high 
write-to-disk load":

  One problem we found in field that is not easy to reproduce is that
  there is a lost interrupt issue in FreeBSD core.  This was fixed in
  r253184 (post-9.1-RELEASE and before 9.2, the fix will be part of the
  upcoming FreeBSD 9.2-RELEASE):

    http://svnweb.freebsd.org/base/stable/9/sys/kern/kern_intr.c?r1=249402&r2=253184&view=patch

  The symptom of this issue is that you basically see a lot of processes
  blocking on zio->zio_cv, while there is no disk activity.  However,
  the information you have provided can neither prove or deny my guess.
  I post the information here so people are aware of this issue if they
  search these terms.

I also recall reading something suggesting that multiple mps adapters 
would make this worse, but I'm not quite sure what it was. In any case, 
this issue shouldn't exist after 9.1.
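
To make the symptom in Xin Li's description concrete, here is a minimal 
sketch of a check for it: flag intervals where many threads are waiting 
on ZFS I/O while the disks complete no operations at all. The `Sample` 
record, the `stalled_intervals` function and the `min_waiters` threshold 
are hypothetical illustrations, not an existing tool; how the per-interval 
counts would actually be gathered on FreeBSD is assumed and left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One monitoring interval (hypothetical; collection method is assumed)."""
    t: float                  # seconds since monitoring started
    threads_waiting_zio: int  # threads blocked waiting for ZFS I/O completion
    disk_ops: int             # disk operations completed in this interval


def stalled_intervals(samples: List[Sample], min_waiters: int = 10) -> List[float]:
    """Timestamps where many threads wait on I/O yet the disks do nothing.

    min_waiters is a hypothetical threshold, not a documented value.
    """
    return [
        s.t
        for s in samples
        if s.threads_waiting_zio >= min_waiters and s.disk_ops == 0
    ]


if __name__ == "__main__":
    samples = [
        Sample(0, 2, 500),    # normal: few waiters, disks busy
        Sample(10, 40, 800),  # heavy load: many waiters, disks still busy
        Sample(20, 45, 0),    # stall: many waiters, no disk activity
        Sample(30, 47, 0),    # stall persists
    ]
    print(stalled_intervals(samples))
```

Requiring zero disk operations, rather than just a high count of waiting 
threads, is what separates this symptom from a pool that is simply busy 
under heavy write load (the t=10 sample above).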

> Also, curious that you have that many drives on 1 PCI card, is it PCI
> 3 etc… and is saturation an issue?

I'm pretty sure it's PCIe 2.x, but we haven't seen any saturation issues. 
Avoiding saturation was of course the motivation for using separate HBAs 
in the initial design, but it was more of a hypothetical concern than a 
real one, at least given our current usage pattern. This is more of a 
backing store; the more intensive I/O usually goes to a Hadoop filesystem.
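
Whether one HBA can saturate comes down to simple arithmetic: compare the 
aggregate throughput the drives could stream against what the HBA's PCIe 
link can carry. A minimal sketch, assuming a PCIe 2.x link (5 GT/s per 
lane with 8b/10b encoding, so roughly 500 MB/s per lane per direction 
before protocol overhead). The lane count, drive count and per-drive rate 
below are hypothetical illustration values, not figures from our setup.

```python
# Back-of-envelope PCIe 2.x saturation check (illustration only).
# All workload numbers in the __main__ block are hypothetical.

MT_PER_S_PER_LANE = 5000   # PCIe 2.x signalling: 5 GT/s per lane
PAYLOAD_BITS_PER_10 = 8    # 8b/10b line coding: 8 data bits per 10 transferred


def pcie2_bandwidth_mb_s(lanes: int) -> float:
    """Per-direction link bandwidth in MB/s, ignoring packet/protocol overhead."""
    mbit_per_lane = MT_PER_S_PER_LANE * PAYLOAD_BITS_PER_10 / 10
    return lanes * mbit_per_lane / 8


def saturation_ratio(drives: int, mb_s_per_drive: float, lanes: int) -> float:
    """Aggregate drive throughput over link bandwidth; > 1 means the link limits."""
    return drives * mb_s_per_drive / pcie2_bandwidth_mb_s(lanes)


if __name__ == "__main__":
    lanes = 8          # hypothetical: a common HBA slot width
    drives = 90        # hypothetical drive count across daisy-chained chassis
    per_drive = 150.0  # hypothetical sustained sequential MB/s per drive
    print(f"x{lanes} link: {pcie2_bandwidth_mb_s(lanes):.0f} MB/s")
    print(f"drives: {drives * per_drive:.0f} MB/s")
    print(f"ratio: {saturation_ratio(drives, per_drive, lanes):.2f}")
```

A ratio above 1 only means the link would become the bottleneck if every 
drive streamed sequentially at once; a lightly used backing store rarely 
comes close, which fits with not having seen saturation in practice.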

Graham