From owner-freebsd-stable@FreeBSD.ORG  Thu Jan 17 11:32:23 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 16F7BC26
 for <freebsd-stable@freebsd.org>; Thu, 17 Jan 2013 11:32:23 +0000 (UTC)
 (envelope-from danny@cs.huji.ac.il)
Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.16.84])
 by mx1.freebsd.org (Postfix) with ESMTP id 953FD69E
 for <freebsd-stable@freebsd.org>; Thu, 17 Jan 2013 11:32:22 +0000 (UTC)
Received: from pampa.cs.huji.ac.il ([132.65.80.32])
 by kabab.cs.huji.ac.il with esmtp
 id 1Tvnhf-00064g-8K; Thu, 17 Jan 2013 13:32:15 +0200
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.3
To: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
Subject: Re: time issues and some more
In-reply-to: <op.wq1vjkyv8527sy@ronaldradial.versatec.local>
References: <E1TvPZ7-000NC7-5C@kabab.cs.huji.ac.il> 
 <op.wq0mrtuy8527sy@212-182-167-131.ip.telfort.nl>
 <E1TvlIV-00013s-Rz@kabab.cs.huji.ac.il>
 <op.wq1vjkyv8527sy@ronaldradial.versatec.local>
Comments: In-reply-to "Ronald Klop" <ronald-freebsd8@klop.yi.org>
 message dated "Thu, 17 Jan 2013 11:03:10 +0100."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 17 Jan 2013 13:32:15 +0200
From: Daniel Braniss <danny@cs.huji.ac.il>
Message-ID: <E1Tvnhf-00064g-8K@kabab.cs.huji.ac.il>
Cc: freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Jan 2013 11:32:23 -0000

> On Thu, 17 Jan 2013 09:58:07 +0100, Daniel Braniss <danny@cs.huji.ac.il>  
> wrote:
> 
> >> On Wed, 16 Jan 2013 10:45:49 +0100, Daniel Braniss <danny@cs.huji.ac.il>
> >> wrote:
> >>
> >> > I recently upgraded a Dell PowerEdge R710 to 9.1-stable. We mainly
> >> > use it as a backup for several zfs servers (doing send|receive),
> >> > without major issues until the upgrade; it was running 8.2-stable.
> >> >
> >> > Now we see that the time sometimes drifts, and today I noticed that
> >> > it was hung. Once I got onto the ipmi console, this is what I got:
> >> > [SOL Session operational.  Use ~? for help]
> >> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 3864, size: 12288
> >> >
> >> > and things started moving again,
> >> >
> >> > in /var/log/messages:
> >> > Jan 16 03:27:35 store-02 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 3864, size: 12288
> >> >
> >> > but the REAL time is 7 hours ahead! So time stood still?
> >> > and now, of course we get:
> >> > Jan 16 03:54:19 store-02 ntpd[38163]: time correction of 25216 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
> >> >
> >> > I will now reboot, try a newer kernel and check, but any insight
> >> > will be very helpful,
> >> >
> >> > thanks,
> >> > 	danny
> >>
> >> Does BSD 9 choose another timer source than BSD 8?
> >> Use sysctl to check these values on your system.
> >> kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)
> >> kern.eventtimer.timer: LAPIC
> >>
> >> Or these ones. I always confuse them.
> >> kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) i8254(0) dummy(-1000000)
> >> kern.timecounter.hardware: TSC-low
> >>
> >
> > under 8.3 it's kern.timecounter, so this is what I get:
> >
> >> sysctl kern.timecounter
> > kern.timecounter.tick: 1
> > kern.timecounter.choice: TSC(-100) HPET(900) ACPI-fast(1000) i8254(0) dummy(-1000000)
> > kern.timecounter.hardware: ACPI-fast
> > kern.timecounter.stepwarnings: 0
> > kern.timecounter.tc.i8254.mask: 65535
> > kern.timecounter.tc.i8254.counter: 52515
> > kern.timecounter.tc.i8254.frequency: 1193182
> > kern.timecounter.tc.i8254.quality: 0
> > kern.timecounter.tc.ACPI-fast.mask: 16777215
> > kern.timecounter.tc.ACPI-fast.counter: 925448
> > kern.timecounter.tc.ACPI-fast.frequency: 3579545
> > kern.timecounter.tc.ACPI-fast.quality: 1000
> > kern.timecounter.tc.HPET.mask: 4294967295
> > kern.timecounter.tc.HPET.counter: 1472869277
> > kern.timecounter.tc.HPET.frequency: 14318180
> > kern.timecounter.tc.HPET.quality: 900
> > kern.timecounter.tc.TSC.mask: 4294967295
> > kern.timecounter.tc.TSC.counter: 4125922088
> > kern.timecounter.tc.TSC.frequency: 2329838875
> > kern.timecounter.tc.TSC.quality: -100
> > kern.timecounter.smp_tsc: 0
> > kern.timecounter.invariant_tsc: 1
> >
> > so I assume the choice is HPET, under 9.1:
> > kern.eventtimer.timer: HPET
> 
> Your server uses:
> > kern.timecounter.hardware: ACPI-fast
> 
> Please check that value on 9.1 and 8.3.
> 
they both choose the same, ACPI-fast
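
As a cross-check on the sysctl output quoted above: as far as I know, when
kern.timecounter.hardware has not been set by hand, the kernel picks the
counter with the highest quality value in kern.timecounter.choice. A minimal
sketch in Python that parses the two choice strings from this thread; the
function name pick_timecounter is made up for illustration, and the sketch
ignores tie-breaking and manual overrides:

```python
import re

def pick_timecounter(choice_line):
    """Return the (name, quality) pair with the highest quality.

    choice_line is the value of kern.timecounter.choice, for example
    "TSC(-100) HPET(900) ACPI-fast(1000) i8254(0) dummy(-1000000)".
    This is an illustrative helper, not part of any FreeBSD tool.
    """
    # Each entry looks like NAME(QUALITY); names may contain '-'.
    entries = re.findall(r"([\w-]+)\((-?\d+)\)", choice_line)
    if not entries:
        raise ValueError("no timecounter entries found")
    parsed = [(name, int(quality)) for name, quality in entries]
    return max(parsed, key=lambda item: item[1])

# Choice strings copied from the messages in this thread.
danny_choice = "TSC(-100) HPET(900) ACPI-fast(1000) i8254(0) dummy(-1000000)"
ronald_choice = "TSC-low(1000) ACPI-fast(900) i8254(0) dummy(-1000000)"

print(pick_timecounter(danny_choice))
print(pick_timecounter(ronald_choice))
```

For the first string this selects ACPI-fast, which matches the
kern.timecounter.hardware value reported on both 8.3 and 9.1.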

> 
> >
> > so it seems to be the same.
> >
> > btw, this morning I see that it's behind more than 1 hour, and no
> > signs of ntpd!
> >
> > the logs show:
> > ...
> > Jan 17 00:40:52 store-02 kernel: usb_dev_suspend_peer: Setting device remote wakeup failed
> > Jan 17 01:05:46 store-02 ntpd[1845]: time correction of 7854 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
> > ...
> >
> > it seems to me that the 7854 seconds is exactly the time diff:
> > date on this host says:
> > Thu Jan 17 08:46:18 IST 2013
> >
> >
> > adding the 7854 sec gives the current (almost) real date:
> > Thu Jan 17 10:57:13 IST 2013
> >
> > something is very fishy here.
> 
> 
> Are you doing suspend/resume stuff on your machine? Or does
> usb_dev_suspend_peer mean suspend in another way?
Not that I know of, but the previous time it complained about something else:
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 3864, size: 12288

Since I have other such boxes without the problem, my bet is on mfdi/zfs.
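
To put the two ntpd offsets quoted above in perspective, here is a small
Python sketch of the arithmetic. The offsets and the host clock reading are
taken from the log lines and date output in this thread; the helper name
split_offset is made up for illustration, and the timezone is ignored since
only the difference matters:

```python
from datetime import datetime, timedelta

def split_offset(seconds):
    """Break an offset in seconds into (hours, minutes, seconds)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs

# Offsets reported by ntpd on Jan 16 and Jan 17.
first_offset = 25216
second_offset = 7854

# Reading of `date` on the host (Thu Jan 17 08:46:18 IST 2013).
host_clock = datetime(2013, 1, 17, 8, 46, 18)
corrected = host_clock + timedelta(seconds=second_offset)

print(split_offset(first_offset))    # the "7 hours" drift from the first report
print(split_offset(second_offset))   # a bit over two hours in the second report
print(corrected.strftime("%H:%M:%S"))
```

The corrected time comes out within a second or so of the real time quoted
above, so the ntpd figure does look like the whole clock offset.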

danny