Date: Fri, 25 Jan 2013 10:37:17 +0100
From: Bas Smeelen <b.smeelen@ose.nl>
To: freebsd-stable@freebsd.org
Subject: Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0
Message-ID: <5102524D.4010002@ose.nl>
In-Reply-To: <CAJ-UWtT8pFn86OMpPG47ryKN++=1KfaQX3JtbCLuu_kByvtMzA@mail.gmail.com>
References: <CAJ-UWtSANRMsOqwW9rJ6Eebta6=AiHeNO6fhPO0mhYhZiMmn4A@mail.gmail.com>
  <op.wq3zxn038527sy@ronaldradial.versatec.local>
  <alpine.BSF.2.00.1301180758460.96418@wonkity.com>
  <1358527685.32417.237.camel@revolution.hippie.lan>
  <20130118173602.GA76438@neutralgood.org>
  <alpine.BSF.2.00.1301181313560.1604@wonkity.com>
  <CAJ-UWtRRfCKg9GBR_ppvtjvJGadiOXMXBFBpX7tAvLEXDoZHQg@mail.gmail.com>
  <20130119201914.84B761CB@server.theusgroup.com>
  <CAJ-UWtR+ymv_+xpLcw01r9r=ym6gMh+Ht4KfTabWQXXcAv5Ydw@mail.gmail.com>
  <CAJ-UWtT8pFn86OMpPG47ryKN++=1KfaQX3JtbCLuu_kByvtMzA@mail.gmail.com>
On 01/25/2013 10:29 AM, Marin Atanasov Nikolov wrote:
> Hello again :)
>
> Here's my update on these spontaneous reboots, less than a week after
> I updated to stable/9.
>
> For the first two days the system was running fine with no reboots, so I
> thought that this update had actually fixed it, but I was wrong.

Not really a solution, but you can take a look at sysutils/zfs-stats.

>
> The reboots are still happening and there is still no clear evidence of the
> root cause. What I did so far:
>
> * Ran disk tests -- looking good
> * Ran memtest -- looking good
> * Replaced power cables
> * Ran UPS tests -- looking good
> * Checked for any bad capacitors -- none found
> * Removed all ZFS snapshots
>
> There is also one more machine connected to the same UPS, so if it were a
> UPS issue I'd expect the other one to reboot too, but that's not the
> case.
>
> Now that I've excluded the hardware part of this problem I started looking
> again into the software side, and this time in particular -- ZFS.
>
> I'm running FreeBSD 9.1-STABLE #1 r245686 on an Intel i5 with 8GB of memory.
>
> A quick look at top(1) showed lots of memory usage by ARC and my available
> free memory dropping fast. I've made a screenshot, which you can see at the
> link below:
>
> * http://users.unix-heaven.org/~dnaeon/top-zfs-arc.jpg
>
> So I went to the FreeBSD Wiki and started reading the ZFS Tuning Guide [1],
> but honestly at the end I was not sure which parameters I need to
> increase/decrease and to what values.
>
> Here's some info about my current parameters.
>
> % sysctl vm.kmem_size_max
> vm.kmem_size_max: 329853485875
>
> % sysctl vm.kmem_size
> vm.kmem_size: 8279539712
>
> % sysctl vfs.zfs.arc_max
> vfs.zfs.arc_max: 7205797888
>
> % sysctl kern.maxvnodes
> kern.maxvnodes: 206227
>
> There's a script on the ZFSTuningGuide page which calculates kernel memory
> utilization, and for me it reports these values:
>
> TEXT=22402749, 21.3649 MB
> DATA=4896264192, 4669.44 MB
> TOTAL=4918666941, 4690.81 MB
>
> While looking into ZFS tuning I also stumbled upon this thread in the
> FreeBSD Forums [2], where the OP describes behaviour similar to what I am
> experiencing, so I'm now quite worried that the reason for these crashes
> is ZFS.
>
> Before changing any of the kernel parameters (vm.kmem_size,
> vm.kmem_size_max, kern.maxvnodes, vfs.zfs.arc_max) I'd like to hear
> feedback from people who have already done such optimizations on their ZFS
> systems.
>
> Could you please share what the optimal values for these parameters are on
> a system with 8GB of memory? Is there a way to calculate them, or is it
> just a matter of "test-and-see-which-fits-better"?
>
> Thanks and regards,
> Marin
>
> [1]: https://wiki.freebsd.org/ZFSTuningGuide
> [2]: http://forums.freebsd.org/showthread.php?t=9143
>
>
> On Sun, Jan 20, 2013 at 3:44 PM, Marin Atanasov Nikolov <dnaeon@gmail.com> wrote:
>
>>
>> On Sat, Jan 19, 2013 at 10:19 PM, John <john@theusgroup.com> wrote:
>>
>>>> At 03:00 I can see that periodic(8) runs, but I don't see what could have
>>>> taken so much of the free memory. I'm also running this system on ZFS and
>>>> have daily rotating ZFS snapshots; currently there are > 1000 ZFS
>>>> snapshots, and I'm not sure if that could be causing this. Here's a
>>>> list of the periodic(8) daily scripts that run at 03:00.
>>>>
>>>> % ls -1 /etc/periodic/daily
>>>> 800.scrub-zfs
>>>>
>>>> % ls -1 /usr/local/etc/periodic/daily
>>>> 402.zfSnap
>>>> 403.zfSnap_delete
>>>
>>> On a couple of my ZFS machines, I've found running a scrub alongside other
>>> heavy file system users to be a problem. I therefore run scrub from cron and
>>> schedule it so it doesn't overlap with periodic.
>>>
>>> I also found on a machine with an i3 and 4GB of RAM that overlapping scrubs
>>> and snapshot destroys would cause the machine to grind to the point of being
>>> non-responsive. This was not a problem when the machine was new, but became
>>> one as the pool got larger (dedup is off and the pool is at 45% capacity).
>>>
>>> I use my own ZFS management script; it prevents snapshot destroys from
>>> overlapping scrubs, and with a lockfile it prevents a new destroy from being
>>> initiated while an old one is still running.
>>>
>>> zfSnap has its -S switch to prevent actions during a scrub, which you should
>>> use if you haven't already.
>>>
>>
>> Hi John,
>>
>> Thanks for the hints. It's been a long time since I set up zfSnap, but I've
>> just checked the configuration and I am using the "-s -S" flags, so there
>> should be no overlapping.
>>
>> Meanwhile I updated to 9.1-RELEASE, but then I hit an issue when trying
>> to reboot the system (which appears to be discussed a lot in a separate
>> thread).
>>
>> Then I updated to stable/9, so at least the reboot issue is now solved.
>> Since moving to stable/9 I've been monitoring the system's memory usage and
>> so far it's been pretty stable, so I'll keep an eye on it to see whether the
>> update to stable/9 has actually fixed this strange issue.
>>
>> Thanks again,
>> Marin
>>
>>
>>> Since making these changes, a machine that would have to be rebooted several
>>> times a week has now been up 61 days.
>>>
>>> John Theus
>>> TheUs Group
>>
>> --
>> Marin Atanasov Nikolov
>>
>> dnaeon AT gmail DOT com
>> http://www.unix-heaven.org/
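sysutils/zfs-stats reports ARC statistics that FreeBSD also exposes directly through sysctl. As a rough sketch (not a replacement for zfs-stats), the POSIX sh functions below read vm.kmem_size, vfs.zfs.arc_max and the current ARC size and print them in MiB. The function names are made up for illustration, and kstat.zfs.misc.arcstats.size is assumed to be present once the ZFS module is loaded. With the numbers Marin posted, arc_max (6871.98 MiB) sits exactly 1 GiB (1073741824 bytes) below kmem_size (7895.98 MiB).

```sh
#!/bin/sh
# Sketch only: report kernel memory and ARC sizes in MiB on FreeBSD.
# Function names are hypothetical; kstat.zfs.misc.arcstats.size is assumed
# to exist once the ZFS module is loaded.

# Convert a byte count to MiB, printed with two decimals.
to_mib() {
    echo "$1" | awk '{ printf "%.2f\n", $1 / 1048576 }'
}

# Bytes of kernel memory left over when the ARC is at vfs.zfs.arc_max.
# Arguments: vm.kmem_size, vfs.zfs.arc_max (both in bytes).
kmem_headroom() {
    echo $(($1 - $2))
}

# Read the live values with `sysctl -n` (prints the value only) and report.
report() {
    kmem=$(sysctl -n vm.kmem_size)
    arc_max=$(sysctl -n vfs.zfs.arc_max)
    arc_size=$(sysctl -n kstat.zfs.misc.arcstats.size)
    echo "vm.kmem_size:    $(to_mib "$kmem") MiB"
    echo "vfs.zfs.arc_max: $(to_mib "$arc_max") MiB"
    echo "ARC size now:    $(to_mib "$arc_size") MiB"
    echo "kmem headroom:   $(to_mib "$(kmem_headroom "$kmem" "$arc_max")") MiB"
}
```

On the affected machine, sourcing the file (e.g. a hypothetical ./arc-report.sh) and calling report every few minutes around the 03:00 periodic(8) run would show how far the ARC grows while scrub and zfSnap are active.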
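John describes a management script that keeps snapshot destroys from overlapping scrubs and uses a lockfile so a new destroy does not start while an old one is still running. His script is not shown in the thread; the following is only a minimal POSIX sh sketch of the same idea. The function names and the lock path /var/run/zfs-maint.lock are hypothetical, and detecting a running scrub by matching the text "scrub in progress" in zpool status output is an assumption to check against your ZFS version.

```sh
#!/bin/sh
# Sketch: run a ZFS maintenance command only when no scrub is running on
# the pool and no earlier run of this wrapper still holds the lock.
# LOCKDIR and all function names are hypothetical.

LOCKDIR=${LOCKDIR:-/var/run/zfs-maint.lock}

# Read `zpool status` output on stdin; succeed if it reports a running scrub.
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# mkdir is atomic: it fails if the directory already exists, so only one
# caller can hold the lock at a time.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR"
}

# Usage: run_guarded POOL COMMAND [ARGS...]
run_guarded() {
    pool=$1
    shift
    if zpool status "$pool" | scrub_in_progress; then
        echo "scrub running on $pool, skipping" >&2
        return 1
    fi
    if ! acquire_lock; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    # Run the command, then release the lock whatever its exit status.
    "$@"
    rc=$?
    release_lock
    return $rc
}
```

It could wrap the destroy step, e.g. `run_guarded tank zfs destroy tank/home@daily-2013-01-01` (pool and snapshot names are made up). A lock directory is used because mkdir either creates it or fails atomically; if the script is killed mid-run the directory must be removed by hand. FreeBSD's lockf(1) is an alternative way to get the same mutual exclusion.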