Date: Thu, 15 Dec 2011 08:12:07 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Dan Pritts <danno@internet2.edu>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS hangs with 8.2-release
Message-ID: <20111215161207.GA26990@icarus.home.lan>
In-Reply-To: <4EEA155C.5050305@internet2.edu>
References: <4EE118C7.8030803@internet2.edu> <CAOjFWZ4kZfepsBdb0O9s3sivj2%2BoSkXhX1P_uyrbJW--Cp0CxQ@mail.gmail.com> <4EE12632.4070309@internet2.edu> <4EE21936.6020502@egr.msu.edu> <4EEA155C.5050305@internet2.edu>

On Thu, Dec 15, 2011 at 10:42:20AM -0500, Dan Pritts wrote:
> Hi all, as a followup to my notes from last week.
>
> Short answer: I have followed most or all of the list's suggestions
> and I still get crashes when scrubbing. In fact, it is now reliably
> crashing after <10 minutes.
>
> Does anyone have any other suggestions? Are the ZFS devs here, and
> would crash dumps be useful?
>
> Below are my responses to specific things that folks suggested.
>
> >do a memory test
> My colleague reminded me that we have run a test in the last month
> or two, since we started troubleshooting this: 24 hours with
> memtest86+ with no errors reported. FWIW, this system was stable
> running Solaris for several years.
>
> >Recommendations to upgrade to 8.2-STABLE and then polite
> >explanations after i did it wrong
> We've upgraded to 8.2-STABLE and applied the 1-line patch suggested
> by Adam McDougall.
>
> >FreeBSD netflow3.internet2.edu 8.2-STABLE FreeBSD 8.2-STABLE #1:
> >Mon Dec 12 15:45:06 UTC 2011
> >root@netflow3.internet2.edu:/usr/obj/usr/src/sys/GENERIC amd64
>
> And many recommendations from Adam McDougall that resulted in the
> following /boot/loader.conf. I also tried removing all of the zfs
> and vm lines; same problems.
>
> I think that something in here is causing the lockups -- with the
> empty loader.conf it reboots instead of locking.
>
> >verbose_loading="YES"
> >rootdev="disk16s1a"
> >
> >#I have 16G of Ram
> >
> >vfs.zfs.prefetch_disable=1
> >vfs.zfs.txg.timeout="5"
> >vfs.zfs.arc_min="512M"
> >vfs.zfs.arc_max="4G"
> >vm.kmem_size="32G"

These settings are incorrect by my standards. You're running 8.2-RELEASE, though I would strongly recommend you go with 8.2-STABLE and stay with that instead. Regardless of which you run, the ZFS tuning in /boot/loader.conf should amount to just this:

vfs.zfs.arc_max="4G"

You could increase this value to 8G if you wanted, or maybe even 12G (and possibly larger, but I would not recommend going above 14G). There is "an art" to tuning this variable: memory fragmentation and other things I'd rather not get into can cause the ARC size to exceed it at times (this is addressed further in 8.2-STABLE -- consider it another reason to run that instead of -RELEASE). So you need to give it some headroom.

The other part of the "art" is making sure you don't give too much memory to the ARC. For example, if you have a big/fat mysqld running on that system, you should probably shrink the ARC so that you have a good balance between what MySQL can/will use (based on its own tunings and some other loader.conf tunings) and what's available for the ARC/kernel.

Start small (4GB on a 16GB RAM system is fine), see how things behave, then increase it. With a 16GB system I would go 4GB -> 8GB -> 10GB, with about a week in between each step. DO NOT pick a value like 15GB or 16GB; it's better to be safe than sorry, else you'll experience a kernel panic. :-)

Further comments:

1. vfs.zfs.txg.timeout defaults to 5 now.

2. There is no point in messing with vfs.zfs.arc_min. It is calculated on its own, and reliably so.

3. vm.kmem_size should not be adjusted/touched at all. Messing with this *could* cause a reboot or possibly stability problems (though the latter would show up differently, not just as a reboot).
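To put all that together, here is a minimal sketch of the loader.conf I'm describing (the 4G figure is just the conservative starting point for a 16GB box), plus the sysctls I'd watch at runtime to see what the ARC is actually doing while you test each step:

  # /boot/loader.conf -- conservative ZFS starting point on a 16GB box
  vfs.zfs.arc_max="4G"

  # At runtime, compare the cap against the actual ARC size (bytes):
  sysctl vfs.zfs.arc_max
  sysctl kstat.zfs.misc.arcstats.size

If the second number regularly overshoots the first by a wide margin, that's the headroom problem I mentioned above.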
Next, let's talk about vfs.zfs.prefetch_disable="1". For a very long time (years?) I strongly advocated this setting, which disables prefetching. The performance of my home storage system, as well as of our production systems in our co-lo, suffered when prefetching was enabled; I/O throughput was generally blah. You can find old posts from me on the mailing list, and many posts from me elsewhere on the web, advocating the setting.

However, we have since removed it entirely and leave prefetching enabled. We haven't noticed any particular massive performance loss, so it's very likely something was changed/improved in this regard. Maybe ZFSv28 is what did it; I really don't know (meaning I am not sure which commit may have addressed it). Prefetching being enabled or disabled has absolutely no bearing on stability, other than that your drives may get taxed a tiny bit more (more data read into the ARC in advance). If you have the time (after you get this lock-up problem solved), you can play with the setting and find what works best for your workload. Be aware you should change the setting and then let it sit for about a week if possible, to get a full feel for the difference.

Next, let's talk about dedupe and compression. I recommend not enabling either one unless you absolutely want/need them **and** are willing to suffer sporadic "stalls" on the system during ZFS I/O. The stalling is twice as bad if you enable both. "Stalls" means that while ZFS is writing (dedupe) or writing/reading (compression), things like typing over SSH, or on the console, or doing ANYTHING on the system just "stop" and catch up when ZFS finishes its work. This is a known problem and stems from the lack of "prioritisation queue" code for dedupe/compression in ZFS on FreeBSD. Solaris does not have this problem (it was solved there by implementing said priority queue). I can refer you to the exact post from Bob Friesenhahn on this subject if you wish to read it. There is no ETA on getting this fixed in FreeBSD (meaning I have seen no one discuss fixing it or anything of that sort).

Both of these features will also tax your CPU and memory more than if you didn't use them. If you do wish to use compression, I recommend the lzjb algorithm rather than gzip, as it diminishes the stalling by quite a bit -- but it's still easily noticeable.
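For reference, switching the algorithm on a dataset is a one-liner; roughly like this (the dataset name "tank/data" is just a placeholder):

  # See what's currently in effect
  zfs get compression,dedup tank/data

  # Use lzjb; this only affects data written from this point forward
  zfs set compression=lzjb tank/data

Existing blocks keep whatever they were written with, so if you want everything on lzjb you'd have to rewrite the data.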
Finally, let's talk about your system problem.

Can you take ZFS out of the picture on this system? If so, please do; that would be a great way to start. But I will give you my opinion: I strongly doubt ZFS is responsible for your problem. ZFS is probably "tickling" another problem. I'm inclined to believe your problem is hardware-related, or (and this continues to get my vote, given the continual non-stop problems I keep reading about with these damn controllers) firmware- or driver-related, pertaining to your mpt(4) cards. Recall what Dan said initially -- you have to read very closely to understand the implications. Quote:

> internal LSI mpt-driver hardware raid for boot.
> 3x LSI parallel-scsi cards for primary storage. 48 SATA disks
> attached. Using Infortrend RAIDs as JBODs.

So you effectively have 4 LSI cards in this system. Would you like me to spend a few hours digging through the mailing lists and PRs listing off all the problems people continually report with mpt(4), mps(4), or mfi(4) on FreeBSD, ESPECIALLY when ZFS is in use? Heck, there were even commits not too long ago to one of those drivers "to help relieve problems when heavy I/O happens under ZFS". Then there's the whole debacle with card firmware versions (and you've got FOUR cards! :-) ). Some people report problems with certain firmware versions, while for others they work great. Then there's the whole provided-by-FreeBSD vs. provided-by-LSI driver ordeal. I don't even want to get into this nonsense -- seriously, it's all on the mailing lists, and it keeps coming up. It would take me, as I said, hours to put it all together and give you *LOTS* of references.

Finally, there is ALWAYS the possibility of bad hardware. I don't mean RAM -- I'm talking about weird motherboard problems that are exacerbated when doing lots of PCIe I/O, or drawing too much power -- neither of which would be stress-tested by memtest86, obviously. The number of possibilities is practically endless, I'm sorry to say. Hardware troubleshooting 101 says replace things piece by piece until you figure it out. :-(

Otherwise, I'd consider just running OpenIndiana on this system, assuming its LSI card driver support is good.

Lastly: http://people.internet2.edu/~danno/zfs/ returns HTTP 403 Forbidden, so I have no idea what your photos/screen shots contained, if anything. :-(

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.              PGP 4BD6C0CB  |
