Date: Wed, 5 Feb 2014 10:45:33 -0800
From: aurfalien <aurfalien@gmail.com>
To: Graham Allan <allan@physics.umn.edu>
Cc: FreeBSD FS <freebsd-fs@freebsd.org>
Subject: Re: practical maximum number of drives
Message-ID: <94A20D8E-292D-47B4-8D82-61A131B3010D@gmail.com>
In-Reply-To: <52F24DEA.9090905@physics.umn.edu>
References: <52F1BDA4.6090504@physics.umn.edu> <7D20F45E-24BC-4595-833E-4276B4CDC2E3@gmail.com> <52F24DEA.9090905@physics.umn.edu>
Ah, great info, many thanks.

And pplz, ignore my reply to Daniel as I got the posts confused. I recently switched to Sanka :)

- aurf

On Feb 5, 2014, at 6:42 AM, Graham Allan <allan@physics.umn.edu> wrote:

> On 2/4/2014 11:36 PM, aurfalien wrote:
>> Hi Graham,
>>
>> When you say behaved better with 1 HBA, what were the issues that
>> made you go that route?
>
> It worked fine in general with 3 HBAs for a while, but OTOH 2 of the
> drive chassis were being very lightly used (and note I was being quite
> conservative and keeping each chassis as an independent zfs pool).
>
> Actual problems occurred once while I was away, but our notes show we
> got some kind of repeated i/o deadlock. As well as all drive i/o
> stopping, we also couldn't use the sg_ses utilities to query the
> enclosures. This reoccurred several times after restarts throughout
> the day, and eventually "we" (again, I wasn't here) removed the extra
> HBAs and daisy-chained all the chassis together. An inspired hunch, I
> guess. No issues since then.
>
> Coincidentally, a few days later I saw a message on this list from Xin
> Li, "Re: kern/177536: [zfs] zfs livelock (deadlock) with high
> write-to-disk load":
>
> One problem we found in the field that is not easy to reproduce is
> that there is a lost interrupt issue in the FreeBSD core. This was
> fixed in r253184 (post-9.1-RELEASE and before 9.2; the fix will be
> part of the upcoming FreeBSD 9.2-RELEASE):
>
> http://svnweb.freebsd.org/base/stable/9/sys/kern/kern_intr.c?r1=249402&r2=253184&view=patch
>
> The symptom of this issue is that you basically see a lot of processes
> blocking on zio->zio_cv while there is no disk activity. However,
> the information you have provided can neither prove nor deny my guess.
> I post the information here so people are aware of this issue if they
> search these terms.
>
> Something else suggested to me that multiple mps adapters would make
> this worse, but I'm not quite sure what. This issue wouldn't exist
> after 9.1 anyway.
>
>> Also, curious that you have that many drives on 1 PCI card, is it PCI
>> 3 etc… and is saturation an issue?
>
> Pretty sure it's PCIe 2.x, but we haven't seen any saturation issues.
> That was of course the motivation for using separate HBAs in the
> initial design, but it was more of a hypothetical concern than a real
> one - at least given our use pattern at present. This is more backing
> storage; the more intensive i/o usually goes to a Hadoop filesystem.
>
> Graham
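
P.S. For anyone who lands on this thread later while chasing a similar
hang: below is a rough, unverified sketch of how one might check for the
symptoms Graham and Xin Li describe on a FreeBSD box. The device node,
pool name, and grep patterns are only illustrative guesses, not taken
from Graham's setup.

    # Look for threads parked on the ZFS zio condition variable; on
    # FreeBSD the wait channel shows up in the MWCHAN column of ps, and
    # the kernel stacks of blocked threads typically include zio_wait.
    ps -axl | grep zio
    procstat -kk -a | grep zio_wait

    # Check whether the enclosures still answer SES queries (ses0 is a
    # placeholder; use whatever ses/pass device your chassis shows up as).
    sg_ses /dev/ses0

    # Confirm whether the pool reports errors and whether there is any
    # disk activity at all ("tank" is a placeholder pool name).
    zpool status -v tank
    gstat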