From: Daniel Braniss
To: John Nielsen
Cc: Adrian Chadd, freebsd-stable@freebsd.org, Ronald Klop
Subject: Re: time issues and ZFS
Date: Thu, 24 Jan 2013 08:50:02 +0200
In-reply-to: <575CDBE9-0FF3-4F93-A223-9F8FAF3FE936@jnielsen.net>
Comments: In-reply-to John Nielsen message dated "Wed, 23 Jan 2013 16:11:57 -0700."

> On Jan 22, 2013, at 2:40 AM, Adrian Chadd wrote:
> > On Jan 21, 2013, at 4:33 AM, Daniel Braniss wrote:
> >
> >> host: DELL PowerEdge R710, 16GB,
>
> I administer a Dell PowerEdge R710 and I've been seeing the exact same thing. It's currently running FreeBSD 9.0-STABLE #0 r236355. It has a ZFS pool which sees moderate load most of the time but can be very high at times (when certain scripts run, etc.).
> I hadn't previously correlated the issue with ZFS load but that is very possible.
>
> I set a cron job to restart ntpd when it dies (because the time difference exceeds the sanity check). The cron job runs "every 20 minutes", but that varies greatly when the system stops counting. The time offset from ntpdate (which the script runs before restarting ntpd) varies a lot, but always in increments of 300 seconds. I've seen everything from 1200 to 23100. (Yes, that's 23 thousand seconds aka 6 hours 25 minutes that the system wasn't keeping time for.)
>
> Sysctl kern.timecounter.hardware defaults to HPET. I experimented with setting it to ACPI-fast but the issue persisted so I put it back.
> kern.timecounter.choice: TSC-low(-100) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
>
> I first installed the box with an older 9.0-STABLE and this issue was not present. I have been tracking -STABLE on it (albeit irregularly) so I'm not sure when the issue came up.
>
> Have you run tests with the machdep.idle value changed, and fiddling
> kern.eventtimer.periodic / kern.eventtimer.idletick ?
>
> I would love to resolve this and am able to do some experimenting. I've _usually_ been seeing the issue 2-3 times every 1-2 days, but I did just make some changes:
> disabling ZFS compression and deduplication on all pools
> updated to 9.1-STABLE from yesterday (r245821)
>
> If the issue persists I will try changing some of the sysctls above and follow up with the result. If it goes away, I'll try to remember to report that too.
>
> JN
>

set kern.eventtimer.timer=LAPIC
this solved it for me.

danny
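A quick Python check of the numbers in John's report (a sketch added for illustration, not part of the thread): it confirms that the quoted offsets, 1200 s and 23100 s, are whole multiples of 300 s and converts them to hours and minutes. As an aside, it also computes how often a 32-bit counter ticking at 14.31818 MHz would wrap. That frequency is a common HPET rate but is an assumption here; nobody in the thread states the R710's HPET frequency, and the helper name `describe` is hypothetical.

```python
# Offsets reported in the thread (seconds); only the endpoints of the range are given.
OFFSETS_S = [1200, 23100]
STEP_S = 300  # every observed jump was reported as a whole multiple of this


def describe(offset_s: int) -> str:
    """Hypothetical helper: express an offset as 'N x 300 s = Hh MMm'."""
    steps, rem = divmod(offset_s, STEP_S)
    if rem:
        raise ValueError(f"{offset_s} s is not a multiple of {STEP_S} s")
    hours, minutes = divmod(offset_s // 60, 60)
    return f"{steps} x {STEP_S} s = {hours}h {minutes:02d}m"


# Assumption: a 32-bit counter at 14.31818 MHz, a common HPET rate
# (not stated anywhere in the thread for this machine).
HPET_HZ = 14_318_180
WRAP_S = 2**32 / HPET_HZ  # seconds until the 32-bit count rolls over

if __name__ == "__main__":
    for off in OFFSETS_S:
        print(f"{off:>6} s: {describe(off)}")
    print(f"32-bit wrap at {HPET_HZ} Hz: {WRAP_S:.2f} s")
```

Running it gives 4 x 300 s = 0h 20m for 1200 s and 77 x 300 s = 6h 25m for 23100 s, matching the "6 hours 25 minutes" in the report. Under the stated frequency assumption the wrap period comes out at about 299.97 s, close to the 300-second step; the thread itself does not establish whether that is the mechanism.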