Date:      Wed, 5 Feb 2014 10:45:33 -0800
From:      aurfalien <aurfalien@gmail.com>
To:        Graham Allan <allan@physics.umn.edu>
Cc:        FreeBSD FS <freebsd-fs@freebsd.org>
Subject:   Re: practical maximum number of drives
Message-ID:  <94A20D8E-292D-47B4-8D82-61A131B3010D@gmail.com>
In-Reply-To: <52F24DEA.9090905@physics.umn.edu>
References:  <52F1BDA4.6090504@physics.umn.edu> <7D20F45E-24BC-4595-833E-4276B4CDC2E3@gmail.com> <52F24DEA.9090905@physics.umn.edu>

Ah, great info, many thanks.

And people, please ignore my reply to Daniel, as I got the posts confused. I
recently switched to Sanka :)

- aurf

On Feb 5, 2014, at 6:42 AM, Graham Allan <allan@physics.umn.edu> wrote:

>
>
> On 2/4/2014 11:36 PM, aurfalien wrote:
>> Hi Graham,
>>
>> When you say behaved better with 1 HBA, what were the issues that
>> made you go that route?
>
> It worked fine in general with 3 HBAs for a while, but OTOH 2 of the
> drive chassis were being very lightly used (and note I was being quite
> conservative and keeping each chassis as an independent zfs pool).
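>
> (Concretely, that layout is just one pool per chassis, roughly like the
> sketch below - the disk names and raidz2 grouping are invented for
> illustration, not our actual layout:)
>
>   zpool create chassis1 raidz2 da0 da1 da2 da3 da4 da5
>   zpool create chassis2 raidz2 da6 da7 da8 da9 da10 da11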
>
> Actual problems occurred once while I was away, but our notes show we
> got some kind of repeated i/o deadlock. As well as all drive i/o
> stopping, we also couldn't use the sg_ses utilities to query the
> enclosures. This recurred several times after restarts throughout the
> day, and eventually "we" (again, I wasn't here) removed the extra HBAs
> and daisy-chained all the chassis together. An inspired hunch, I guess.
> No issues since then.
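>
> (For reference, a minimal way to query the enclosures from the host -
> the device name below is just an example, and there may be several ses
> devices, one per chassis:)
>
>   camcontrol devlist | grep ses   # find the SES enclosure devices
>   sg_ses /dev/ses0                # basic query of one enclosure
>   sg_ses --page=es /dev/ses0      # enclosure status page
>
> During the deadlock even these simple queries would hang.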
>
> Coincidentally a few days later I saw a message on this list from Xin
> Li "Re: kern/177536: [zfs] zfs livelock (deadlock) with high
> write-to-disk load":
>
> One problem we found in the field that is not easy to reproduce is that
> there is a lost interrupt issue in FreeBSD core.  This was fixed in
> r253184 (post-9.1-RELEASE and before 9.2, the fix will be part of the
> upcoming FreeBSD 9.2-RELEASE):
>
> http://svnweb.freebsd.org/base/stable/9/sys/kern/kern_intr.c?r1=249402&r2=253184&view=patch
>
> The symptom of this issue is that you basically see a lot of processes
> blocking on zio->zio_cv, while there is no disk activity.  However,
> the information you have provided can neither prove nor deny my guess.
> I post the information here so people are aware of this issue if they
> search these terms.
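>
> (A rough way to spot that state on a running box - the PID below is a
> placeholder: look for zio-related wait channels while the disks sit
> idle:)
>
>   ps -axlww | grep -i zio    # MWCHAN column shows the wait channel
>   procstat -kk 1234          # kernel stack of one stuck process
>   gstat -b -I 5s             # confirm the disks really are idle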
>
> Something else suggested to me that multiple mps adapters would make
> this worse, but I'm not quite sure what. This issue wouldn't exist
> after 9.1 anyway.
>
>> Also, curious that you have that many drives on 1 PCI card, is it PCI
>> 3 etc... and is saturation an issue?
>
> Pretty sure it's PCIe 2.x, but we haven't seen any saturation issues.
> That was of course the motivation for using separate HBAs in the
> initial design, but it was more of a hypothetical concern than a real
> one - at least given our use pattern at present. This is more backing
> storage; the more intensive i/o usually goes to a Hadoop filesystem.
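>
> (Back-of-envelope numbers, assuming a PCIe 2.0 x8 HBA and roughly
> 150 MB/s of streaming throughput per drive - illustrative figures, not
> measurements from this system:)
>
>   PCIe 2.0 lane: 5 GT/s, 8b/10b encoding  -> ~500 MB/s usable
>   x8 slot:       8 x 500 MB/s             -> ~4 GB/s
>   4000 MB/s / 150 MB/s per drive          -> ~26 drives streaming flat
>                                              out before the slot is the
>                                              bottleneck
>
> Mixed or random i/o moves far less per drive, so the practical ceiling
> is usually much higher than the pure streaming case.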
>
> Graham



