From owner-freebsd-stable@FreeBSD.ORG  Wed Jan 23 23:42:03 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id EF744D5E;
 Wed, 23 Jan 2013 23:42:03 +0000 (UTC)
 (envelope-from lists@jnielsen.net)
Received: from ns1.jnielsen.net (secure.freebsdsolutions.net [69.55.234.48])
 by mx1.freebsd.org (Postfix) with ESMTP id 9CB9A8D4;
 Wed, 23 Jan 2013 23:42:03 +0000 (UTC)
Received: from [10.10.1.32] (office.betterlinux.com [199.58.199.60])
 (authenticated bits=0)
 by ns1.jnielsen.net (8.14.4/8.14.4) with ESMTP id r0NNBqhf085348
 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT);
 Wed, 23 Jan 2013 18:11:52 -0500 (EST)
 (envelope-from lists@jnielsen.net)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: time issues and ZFS
From: John Nielsen <lists@jnielsen.net>
In-Reply-To: <CAJ-VmomdQORjs55ooW55Rgg0i1M13PPtnmCPRrp__btEWQz=4g@mail.gmail.com>
Date: Wed, 23 Jan 2013 16:11:57 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <575CDBE9-0FF3-4F93-A223-9F8FAF3FE936@jnielsen.net>
References: <E1TxFcr-0006dx-MX@kabab.cs.huji.ac.il>
 <1358780588.32417.414.camel@revolution.hippie.lan>
 <E1TxJP2-000DS8-DJ@kabab.cs.huji.ac.il>
 <1358783667.32417.434.camel@revolution.hippie.lan>
 <CAJ-Vmo=2Dmf4Lb-uoUQDrybyRSS=_bnV5KcNYGg5MnMxfhhu7w@mail.gmail.com>
 <E1TxYHa-0002yo-4Y@kabab.cs.huji.ac.il>
 <CAJ-VmomdQORjs55ooW55Rgg0i1M13PPtnmCPRrp__btEWQz=4g@mail.gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
X-Mailer: Apple Mail (2.1499)
X-DCC-sonic.net-Metrics: ns1.jnielsen.net 1117; Body=4 Fuz1=4 Fuz2=4
X-Virus-Scanned: clamav-milter 0.97.5 at ns1.jnielsen.net
X-Virus-Status: Clean
Cc: freebsd-stable@freebsd.org, Ronald Klop <ronald-freebsd8@klop.yi.org>
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Jan 2013 23:42:04 -0000

On Jan 22, 2013, at 2:40 AM, Adrian Chadd <adrian@freebsd.org> wrote:

> On Jan 21, 2013, at 4:33 AM, Daniel Braniss <danny@cs.huji.ac.il> wrote:
>
>> host: DELL PowerEdge R710, 16GB,

I administer a Dell PowerEdge R710 and I've been seeing the exact same
thing. It's currently running FreeBSD 9.0-STABLE #0 r236355. It has a
ZFS pool whose load is moderate most of the time but can get very high
(when certain scripts run, etc.). I hadn't previously correlated the
issue with ZFS load, but a connection is quite possible.

I set up a cron job to restart ntpd when it dies (which happens when the
time difference exceeds ntpd's sanity check). The cron job nominally
runs every 20 minutes, but the actual interval varies greatly when the
system stops counting. The time offset reported by ntpdate (which the
script runs before restarting ntpd) varies a lot, but is always a
multiple of 300 seconds. I've seen everything from 1200 to 23100. (Yes,
that's 23 thousand seconds, i.e. 6 hours 25 minutes during which the
system wasn't keeping time.)
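
For anyone who wants to reproduce the workaround, here is a minimal
sketch of such a watchdog, not the exact script in use here. The
function names and the NTP_SERVER placeholder are assumptions made for
illustration; the two small helpers just check the "multiple of 300"
observation and convert an offset in whole seconds to hours and minutes.

```sh
#!/bin/sh
# Hypothetical helpers; all names here are illustrative only.

# Succeeds when the offset (whole seconds) is an exact multiple of 300.
is_step_multiple() {
    [ $(( $1 % 300 )) -eq 0 ]
}

# Prints a whole-second offset as hours, minutes and seconds.
format_offset() {
    printf '%dh %dm %ds\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# If ntpd is not running, step the clock with ntpdate and start ntpd again.
# NTP_SERVER is an assumed placeholder; set it to the server you really use.
# Not called here; a cron entry would invoke it.
restart_ntpd_if_dead() {
    if ! pgrep -x ntpd >/dev/null 2>&1; then
        ntpdate "${NTP_SERVER:-pool.ntp.org}" && service ntpd start
    fi
}

format_offset 23100   # prints "6h 25m 0s"
format_offset 1200    # prints "0h 20m 0s"
```

Logging the ntpdate offset on each run makes it easy to confirm whether
every jump really is a multiple of 300 seconds.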

Sysctl kern.timecounter.hardware defaults to HPET. I experimented with
setting it to ACPI-fast, but the issue persisted, so I put it back.
kern.timecounter.choice: TSC-low(-100) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
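
The number in parentheses after each name is that timecounter's quality,
and the kernel defaults to the highest-quality one, which is why HPET is
selected on this box. As a rough sketch (the function name
best_timecounter is made up for illustration), the choice string above
can be parsed like this:

```sh
#!/bin/sh
# Hypothetical helper: print the name with the highest quality value from a
# kern.timecounter.choice string such as "HPET(950) i8254(0)".
best_timecounter() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^\(.*\)(\(-\{0,1\}[0-9]*\))$/\2 \1/p' |
        sort -n | tail -n 1 | cut -d' ' -f2
}

choice='TSC-low(-100) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)'
best_timecounter "$choice"   # prints "HPET"

# On a live system the string could come from:
#   best_timecounter "$(sysctl -n kern.timecounter.choice)"
# and the active counter can be switched (as root) with, for example:
#   sysctl kern.timecounter.hardware=ACPI-fast
```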

I first installed the box with an older 9.0-STABLE, and this issue was
not present then. I have been tracking -STABLE on it (albeit
irregularly), so I'm not sure when the issue first appeared.

> Have you run tests with the machdep.idle value changed, and fiddling
> kern.eventtimer.periodic / kern.eventtimer.idletick ?

I would love to resolve this and am able to do some experimenting. I've
_usually_ been seeing the issue 2-3 times every 1-2 days, but I did just
make some changes:
	disabled ZFS compression and deduplication on all pools
	updated to 9.1-STABLE from yesterday (r245821)

If the issue persists I will try changing some of the sysctls above and
follow up with the results. If it goes away, I'll try to remember to
report that too.
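
Before touching any of those sysctls it seems worth recording their
current values so they can be put back afterwards. A minimal sketch,
assuming only that sysctl -n prints the bare value; the function name
snapshot_sysctls and the SYSCTL override variable are made up for
illustration:

```sh
#!/bin/sh
# Hypothetical helper: print "name=value" for each sysctl named as an argument.
# SYSCTL may point at a different command; it defaults to sysctl.
snapshot_sysctls() {
    for name in "$@"; do
        value=$(${SYSCTL:-sysctl} -n "$name" 2>/dev/null) || value='<unavailable>'
        printf '%s=%s\n' "$name" "$value"
    done
}

# The three knobs mentioned in the quoted question above.
snapshot_sysctls machdep.idle kern.eventtimer.periodic kern.eventtimer.idletick
```

Names that do not exist on a given kernel are reported as
<unavailable> rather than aborting the run.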

JN