Date: Mon, 26 Nov 2018 19:34:43 +0700
From: Eugene Grosbein <eugen@grosbein.net>
To: Gerrit Kühn <gerrit.kuehn@aei.mpg.de>, freebsd-stable@freebsd.org
Subject: Re: high cpu irq load and slow boot after update from 10.4 to 11.2
Message-ID: <007b9007-6abb-15cf-45df-45b3da814e5d@grosbein.net>
In-Reply-To: <20181126094648.510fc7f7b773bfdac546d037@aei.mpg.de>
References: <20181126094648.510fc7f7b773bfdac546d037@aei.mpg.de>
26.11.2018 15:46, Gerrit Kühn wrote:

> A couple of weeks ago, I updated an older storage server (2 CPUs, 4 cores
> each, 48GB RAM, 36x4GB HDDs, 3 LSI-based mps controllers) from 10.4 to
> 11.2. The first thing I noticed was that booting takes much longer now. The
> system probes each HDD (there are 36 of them, attached to mps controllers)
> very slowly multiple times (I can see the light of each disk blinking,
> it takes seconds to go on to the next disk), the whole process takes
> several minutes (was much faster before).
>
> A more nasty issue appears after a couple of weeks of operation (so far,
> roughly between 15 and 30 days):
> Suddenly there is a very high irq load on one of the CPU cores
> (cpu<n>:timer), causing high system load and high cpu load (top easily
> shows average load over 10, whereas it was always below 1 before). I cannot
> find any process or device as a culprit. First I thought this problem can
> only be made to go away by rebooting, but now I managed to get rid of it
> (at least for some time, don't know if or when it will be back) while
> checking out the latest source in background (I actually intended to fiddle
> with some kernel settings, but suddenly the issue was gone after
> persisting permanently over the weekend), causing.
>
> Looking around, I found a couple of vaguely similar reports (like
> https://lists.freebsd.org/pipermail/freebsd-current/2017-January/064419.html),
> but these all appear to be fixed by now.
> I have a couple of other storage machines (mostly mps-based, but always
> slightly different hardware) that show no such issue after updating to
> 11.2.
>
> Any ideas?

Maybe this box has some clocking problems that are incompatible with the tickless kernel. Try going back to the old periodic ticking with sysctl kern.eventtimer.periodic=1 instead of the current default of 0. Or, if you are curious, run ntpd if it is not already running, wait about an hour, then look at its /var/db/ntpd.drift file to see whether the system clock is good or not.
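
To make the drift check a bit more concrete, here is a rough sketch. It assumes the drift file holds a single number in PPM, which is how ntpd writes it. The helper name drift_verdict and the 100 PPM default limit are my own assumptions for illustration, not anything ntpd defines; the only hard number I know of is that ntpd cannot correct a frequency error beyond 500 PPM.

```sh
#!/bin/sh
# drift_verdict PPM [LIMIT]
# Hypothetical helper: prints "ok" if |PPM| <= LIMIT, else "suspicious".
# LIMIT defaults to 100 PPM, which is an assumed rule of thumb.
drift_verdict() {
    awk -v d="$1" -v limit="${2:-100}" 'BEGIN {
        if (d < 0) d = -d            # only the magnitude matters
        if (d <= limit) print "ok"
        else print "suspicious"
    }'
}

# Path taken from the message above; override DRIFT_FILE if yours differs.
DRIFT_FILE=${DRIFT_FILE:-/var/db/ntpd.drift}
if [ -r "$DRIFT_FILE" ]; then
    ppm=$(head -n 1 "$DRIFT_FILE")
    echo "drift: $ppm PPM -> $(drift_verdict "$ppm")"
fi

# Checking and changing the ticking mode (the second command needs root):
#   sysctl kern.eventtimer.periodic
#   sysctl kern.eventtimer.periodic=1
```

A value that sits near or beyond the 500 PPM cap would suggest the clock source itself is the problem rather than ntpd.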
Perhaps you can get better behaviour by changing the default value of kern.timecounter.hardware to another one listed in kern.timecounter.choice; the same goes for kern.eventtimer.timer and kern.eventtimer.choice.
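
A small sketch of how one might look at the available choices before switching. The helper list_names is hypothetical and assumes the *.choice values look like "NAME(quality) NAME(quality) ..."; HPET in the comments is only a placeholder, so pick a name your own system actually lists.

```sh
#!/bin/sh
# list_names "NAME(quality) NAME(quality) ..."
# Hypothetical helper: strips the "(quality)" suffixes, leaving the names.
list_names() {
    printf '%s\n' "$1" | sed 's/([^)]*)//g'
}

# These sysctls exist on FreeBSD only, so skip the queries elsewhere.
if [ "$(uname -s)" = "FreeBSD" ]; then
    echo "timecounter in use:  $(sysctl -n kern.timecounter.hardware)"
    echo "timecounter choices: $(list_names "$(sysctl -n kern.timecounter.choice)")"
    echo "event timer in use:  $(sysctl -n kern.eventtimer.timer)"
    echo "event timer choices: $(list_names "$(sysctl -n kern.eventtimer.choice)")"
fi

# To switch at runtime (as root), using a name from the lists above:
#   sysctl kern.timecounter.hardware=HPET
#   sysctl kern.eventtimer.timer=HPET
```

If one of the alternatives makes the timer interrupt load go away, the same assignments can go into /etc/sysctl.conf so they survive a reboot.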