Date: Wed, 20 Oct 2010 16:45:32 -0400
From: Sean Thomas Caron <scaron@umich.edu>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-stable@freebsd.org, scaron@umich.edu
Subject: Re: Spurious reboot in 8.1-RELEASE when reading from ZFS pool with > 9 disks
Message-ID: <20101020164532.205459df5vc283cw@web.mail.umich.edu>
In-Reply-To: <20101020180807.GA58494@icarus.home.lan>
References: <20101020112738.12467cvfvvh4zb0g@web.mail.umich.edu> <20101020180807.GA58494@icarus.home.lan>

Hi Jeremy,

Thanks for the very helpful response!

I added all the debugging options that you specified to my kernel and rebuilt; then set the kernel parameters as you mention (I was being a bit lazy earlier when I called them sysctls; I have always tuned them in loader.conf; it's just that you can view their values with sysctl).

Rebooted the system with the new kernel, set up an 11-disk raidz2 pool again, then started beating on it. At first it seemed to be a bit more resilient with this set of kernel parameters, but eventually it too failed. Again I just got a straight-up reboot: no debugger, and no output flashed by on the console as far as I can tell. I don't have a serial console hooked up right now, but it's probably possible to do so through the ILOM or equivalent; I will have to look into that further.

This is pretty weird. I am thinking there might be some memory starting to go in this system; I have never seen failing memory in an ECC box cause reboots this consistently and only under such specific conditions, but I suppose it isn't completely out of the question. I'll talk to my customer and see what they can do about the hardware; maybe they have some spares.

I will also try 8.1-STABLE when I have a chance and see if that works better.

But it's definitely helpful to know that folks have > 9 disk raidz pools up and running on FreeBSD 8.x with no trouble - that it "should work". And the list of tunables is very useful; nice to have something to work with that I can have a bit more confidence in than my own guessing :)

I will report back to the list when I have more information.

Thanks!

-Sean

Quoting Jeremy Chadwick <freebsd@jdc.parodius.com>:

> There are users here using FreeBSD ZFS with *lots* of disks (I think
> someone was using 32 disks at one point) reliably. Some of them post
> here regularly (with other issues that don't consist of sporadic
> reboots).
>
> The kernel options may not be sufficient. I'm used to using these:
>
> # Debugging options
> options BREAK_TO_DEBUGGER   # Sending a serial BREAK drops to DDB
> options KDB                 # Enable kernel debugger support
> options KDB_TRACE           # Print stack trace automatically on panic
> options DDB                 # Support DDB
> options GDB                 # Support remote GDB
>
> And in /etc/rc.conf, setting:
>
> ddb_enable="yes"
>
> Next: arc_max isn't "technically" a sysctl, meaning it can't be changed
> in real-time, so I'm not sure how you managed to do that. Validation:
>
> sysctl: oid 'vfs.zfs.arc_max' is a read only tunable
> sysctl: Tunable values are set in /boot/loader.conf
>
> Your system may be reporting something relating to kmem exhaustion but
> is then auto-rebooting so fast that you can't see the message on VGA
> console. Do you have serial console?
>
> Please try setting the following tunables in /boot/loader.conf and
> reboot the machine, then see if the same problem persists.
>
> vm.kmem_size="16384M"
> vfs.zfs.arc_max="14336M"
> vfs.zfs.prefetch_disable="1"
> vfs.zfs.zio.use_uma="0"
> vfs.zfs.txg.timeout="5"
>
> I would also advocate you try 8.1-STABLE as there have been many changes
> in ZFS since then (and I'm not just referring to the v15 import),
> including how the ARC gets sized/adjusted. CURRENT is highly
> bleeding-edge, so I would start or stick with STABLE.
>
> Finally, there's always the possibility that the PSU has some sort of
> load problem with that many disks all being accessed at the same time.
> I imagine the power draw of that system is quite high. I can't imagine
> Sun shipping a box with an insufficient PSU, but then again power draw
> changes depending on the RPM of the disks used and many other things.
>
> --
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>
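
For anyone following along with the loader.conf tunables quoted above, here is a minimal sketch of a sanity check that the configured ARC cap is smaller than the kernel memory size. The function names (to_mib, get_tunable, check_arc_fits) are made up for illustration, and the "arc_max should be below kmem_size" rule is an assumption drawn from the values suggested in the quoted message, not an official requirement. It only parses the file; it does not query the running kernel.

```sh
#!/bin/sh
# Hypothetical helpers for checking loader.conf-style tunables.

# Convert a size such as "16384M" or "16G" to MiB.
to_mib() {
    case "$1" in
        *[Gg]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo "${1%?}" ;;
        *) echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Print the value of tunable $2 from file $1 (last assignment wins).
get_tunable() {
    awk -F= -v key="$2" \
        '$1 == key { gsub(/"/, "", $2); val = $2 } END { print val }' "$1"
}

# Succeed if vfs.zfs.arc_max is smaller than vm.kmem_size in file $1.
# Returns 2 if either tunable is missing or uses an unsupported suffix.
check_arc_fits() {
    kmem=$(to_mib "$(get_tunable "$1" vm.kmem_size)") || return 2
    arc=$(to_mib "$(get_tunable "$1" vfs.zfs.arc_max)") || return 2
    [ "$arc" -lt "$kmem" ]
}
```

On the box itself this could be run as `check_arc_fits /boot/loader.conf && echo "arc_max fits under kmem_size"`, and after a reboot the live values can be viewed with `sysctl vm.kmem_size vfs.zfs.arc_max`.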