Date: Tue, 28 Sep 2010 13:33:48 +0300
From: borislav nikolov <vf1100c@gmail.com>
To: Jurgen Weber <jurgen@ish.com.au>
Cc: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject: Re: cpu timer issues
Message-ID: <FC918FA4-770F-4F93-B179-F76BFFBFBD50@gmail.com>
In-Reply-To: <4CA19F27.6050903@ish.com.au>
References: <4CA19F27.6050903@ish.com.au>

On 28.09.2010, at 10:54, Jurgen Weber <jurgen@ish.com.au> wrote:

> Hello List
>
> We have been having issues with some firewall machines of ours using pfSense.
>
> FreeBSD smash01.ish.com.au 7.2-RELEASE-p5 FreeBSD 7.2-RELEASE-p5 #0: Sun Dec 6 23:20:31 EST 2009 sullrich@FreeBSD_7.2_pfSense_1.2.3_snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7 i386
>
> MotherBoard: http://www.supermicro.com/products/motherboard/Xeon3000/3200/X7SBi-LN4.cfm
>
> Originally the systems started out by showing a lot of packet loss, the system time would fall behind, and the value of "#vmstat -i | grep timer" was dropping below 2000. I was led to believe by the guys at pfSense that this is where the value should sit. I would also receive errors in messages that looked like " kernel: calcru: runtime went backwards from 244314 usec to 236341".
>
> We tried a variety of things: disabling USB, turning off the Intel Speed Step in the BIOS, disabling ACPI, etc. All had little to no effect. The only thing that would right it was restarting the box, but over time it would degrade again. I talked to SuperMicro and they said that this is a FreeBSD issue and pretty much washed their hands of it.
>
> After a couple of months of dealing with this and just rebooting the systems regularly, the symptoms slowly but surely disappeared, e.g. the kernel messages went away, the system time was not falling behind and I was experiencing no packet loss, but the "#vmstat -i | grep timer" value would continue to decrease over time. Eventually, I think, when it finally got to 0 the machine restarted (I am only guessing here).
>
> After this restart it worked again for a couple of hours and then it restarted again.
>
> After the second time the system has not missed a beat, it has been fine and the "#vmstat -i | grep timer" value remained near the 2000 mark... We set up some zabbix monitoring to watch it. As mentioned it was fine for about a month. Until today. Today the value has dropped to 0, but the system has not restarted and over the last couple of hours the value has increased to 47.
>
> This machine is mission critical, we have two in a failover scenario (using pfSense's CARP features) and it seems unfortunate that we have an issue with two brand new SuperMicro boxes that affects both machines. While at the moment everything seems fine I want to ensure that I have no further issues. Does anyone have any suggestions?
>
> Lastly I have double-checked both of the below:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#CALCRU-NEGATIVE-RUNTIME
> We disabled EIST.
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#COMPUTER-CLOCK-SKEW
>
> # dmesg | grep Timecounter
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Timecounters tick every 1.000 msec
> # sysctl kern.timecounter.hardware
> kern.timecounter.hardware: i8254
>
> Only have one timer to choose from.
>
> Thanks
>
> Jurgen
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

Hello,

vmstat -i calculates the interrupt rate as interrupt count / uptime, and the interrupt count is a 32-bit integer.
With high values of kern.hz it will overflow in a few days (with kern.hz=4000 it will happen every 12 days or so). Once the counter wraps, the total restarts from zero while the uptime keeps growing, so the reported rate suddenly drops and then slowly climbs again.
If that is the case, use systat -vmstat 1 to get an accurate interrupt rate.

This is just FYI: I was confused by this once and it scared me a bit, and I started changing counters until I noticed it.

P.S. Please forgive my poor English.
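
The arithmetic behind the reply can be sketched in a few lines. This is only an illustration, not what vmstat itself runs: the function names are hypothetical, the counter is assumed to be an unsigned 32-bit value that wraps to zero, and the interrupt rate is assumed to be constant. The 2000 interrupts per second figure is the one from the quoted message.

```python
# Sketch: when does a 32-bit interrupt counter wrap, and what average rate
# would "count / uptime" report afterwards?
# Assumptions: unsigned 32-bit counter, constant interrupt rate.

COUNTER_MODULUS = 2 ** 32
SECONDS_PER_DAY = 86400


def days_until_wrap(interrupts_per_second):
    """Days of uptime before the counter overflows at the given rate."""
    return COUNTER_MODULUS / interrupts_per_second / SECONDS_PER_DAY


def apparent_rate(interrupts_per_second, uptime_seconds):
    """Rate computed from the wrapped counter divided by the uptime."""
    true_count = interrupts_per_second * uptime_seconds
    wrapped_count = true_count % COUNTER_MODULUS
    return wrapped_count // uptime_seconds


if __name__ == "__main__":
    print(round(days_until_wrap(4000), 1))          # 12.4
    print(apparent_rate(2000, 20 * SECONDS_PER_DAY))  # 2000 (before the wrap)
    print(apparent_rate(2000, 25 * SECONDS_PER_DAY))  # 11 (just after the wrap)
```

Under these assumptions a counter fed at 2000 interrupts per second wraps after a little under 25 days, and the computed rate then falls to nearly zero and creeps upward, which would be consistent with a value that stays near 2000 for about a month, drops to 0, and then rises to 47 over a few hours.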