From owner-freebsd-stable@FreeBSD.ORG  Tue Jan 22 07:28:36 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id E345653D;
 Tue, 22 Jan 2013 07:28:36 +0000 (UTC)
 (envelope-from danny@cs.huji.ac.il)
Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.16.84])
 by mx1.freebsd.org (Postfix) with ESMTP id 6BBFA9A4;
 Tue, 22 Jan 2013 07:28:36 +0000 (UTC)
Received: from pampa.cs.huji.ac.il ([132.65.80.32])
 by kabab.cs.huji.ac.il with esmtp
 id 1TxYHa-0002yo-4Y; Tue, 22 Jan 2013 09:28:34 +0200
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.3
To: Adrian Chadd <adrian@freebsd.org>
Subject: Re: time issues and ZFS
In-reply-to: <CAJ-Vmo=2Dmf4Lb-uoUQDrybyRSS=_bnV5KcNYGg5MnMxfhhu7w@mail.gmail.com>
References: <E1TxFcr-0006dx-MX@kabab.cs.huji.ac.il> 
 <1358780588.32417.414.camel@revolution.hippie.lan> 
 <E1TxJP2-000DS8-DJ@kabab.cs.huji.ac.il>
 <1358783667.32417.434.camel@revolution.hippie.lan>
 <CAJ-Vmo=2Dmf4Lb-uoUQDrybyRSS=_bnV5KcNYGg5MnMxfhhu7w@mail.gmail.com>
Comments: In-reply-to Adrian Chadd <adrian@freebsd.org>
 message dated "Mon, 21 Jan 2013 12:09:21 -0800."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 22 Jan 2013 09:28:34 +0200
From: Daniel Braniss <danny@cs.huji.ac.il>
Message-ID: <E1TxYHa-0002yo-4Y@kabab.cs.huji.ac.il>
Cc: freebsd-stable@freebsd.org, Ian Lepore <ian@freebsd.org>,
 Ronald Klop <ronald-freebsd8@klop.yi.org>
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 22 Jan 2013 07:28:37 -0000

> I still firmly believe the ACPI event timer code is racy, and what we
> may be seeing here is the fallout from that.
> 
> It's very possible that we're missing interrupts here - the new
> eventtimer code that made it into 9.x puts the halt behind a critical
> section, with interrupts disabled. The only platforms that correctly
> implement enable-interrupts-and-halt atomically is the HLT (well, and
> the don't-sleep-at-all) idle loops on i386/amd64. The default method
> is to use the ACPI sleep method, which doesn't do atomic interrupt
> enable / halt.
> 
> I'm still seeing odd stuff on some of my ACPI-using netbooks when
> doing net80211/ath development and it all goes away whenever I fondle
> with the above settings.
> 
> So, play with kern.eventtimer.periodic, kern.eventtimer.idletick and
> machdep.idle (try setting machdep.idle to hlt, or something else
> listed in machdep.idle_available) - please report back what the
> results are.
> 
> 
> Adrian
>

Adrian,
you say the ACPI event timer code is racy; which event timer do you mean?

How is the quality value chosen?

At the moment, switching kern.eventtimer.timer to LAPIC seems to have done the
trick. I'll have to wait another 24 hours to make sure.

In the meantime, here is the kern.eventtimer.choice output from three machines:
Intel(R) Xeon(R) CPU E5645: running with no problems
  LAPIC(600) HPET(450) HPET1(440) HPET2(440) HPET3(440) i8254(100) RTC(0)

Intel(R) Xeon(R) CPU X5550: this is the problematic one, at least for the moment
  HPET(450) HPET1(440) HPET2(440) HPET3(440) LAPIC(400) i8254(100) RTC(0)

Dual-Core AMD Opteron(tm) Processor 2218: running with no problems
  LAPIC(400) RTC(0)
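
To make the comparison easier to read, here is a rough Python sketch that
parses the three lists above and prints the top-ranked timer for each machine.
It assumes the default timer is simply the entry with the highest quality
number; that is my reading of the lists, not something checked against the
kernel source. The names parse_choice and best_timer are made up for this
example.

```python
import re

# Output of `sysctl kern.eventtimer.choice`, copied from the lists above.
CHOICES = {
    "E5645": "LAPIC(600) HPET(450) HPET1(440) HPET2(440) HPET3(440) i8254(100) RTC(0)",
    "X5550": "HPET(450) HPET1(440) HPET2(440) HPET3(440) LAPIC(400) i8254(100) RTC(0)",
    "Opteron 2218": "LAPIC(400) RTC(0)",
}


def parse_choice(line):
    """Return [(name, quality), ...] from a kern.eventtimer.choice string."""
    return [(name, int(q)) for name, q in re.findall(r"(\S+?)\((-?\d+)\)", line)]


def best_timer(line):
    """ASSUMPTION: the default is the entry with the highest quality."""
    return max(parse_choice(line), key=lambda entry: entry[1])


if __name__ == "__main__":
    for cpu, line in CHOICES.items():
        name, quality = best_timer(line)
        print(f"{cpu}: {name} ({quality})")
```

Under that assumption, the X5550 is the only one of the three where HPET
outranks LAPIC, which would match it being the only box that defaulted to HPET.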

So if someone is running 9.1 on either of the following CPUs, it would be
nice to see the output of sysctl kern.eventtimer.choice:

Intel(R) Xeon(R) CPU E5410
Intel(R) Xeon(R) CPU E5507

BTW, all of the above are on server motherboards.

thanks,
	danny


> On 21 January 2013 07:54, Ian Lepore <ian@freebsd.org> wrote:
> > On Mon, 2013-01-21 at 17:35 +0200, Daniel Braniss wrote:
> >> ...
> >> >
> >> > What's the output of sysctl kern.eventtimer?
> >>
> >> kern.eventtimer.periodic is 0
> >>
> >> >                                              Does the bad behavior
> >> > change if you set kern.eventtimer.periodic=1?
> >> >
> >>
> >> setting kern.eventtimer.timer=LAPIC
> >> instead of the default HPET made the missing cpu timers appear:
> >> # vmstat -i
> >> interrupt                          total       rate
> >> irq3: uart1                         1695          0
> >> irq4: uart0                            5          0
> >> irq19: ehci0                        3875          0
> >> irq20: hpet0 uhci3               5495755       1135
> >> irq21: uhci2 ehci1                    29          0
> >> irq23: atapci0                        48          0
> >> cpu0:timer                          7063          1
> >> irq256: bce0                      117073         24
> >> irq260: mfi0                       51083         10
> >> irq261: mfi1                        3088          0
> >> cpu1:timer                           484          0
> >> cpu14:timer                           36          0
> >> cpu6:timer                           486          0
> >> cpu8:timer                            38          0
> >> cpu5:timer                            38          0
> >> cpu15:timer                           38          0
> >> cpu7:timer                            32          0
> >> cpu12:timer                           38          0
> >> cpu3:timer                            40          0
> >> cpu9:timer                            36          0
> >> cpu10:timer                           34          0
> >> cpu11:timer                           37          0
> >> cpu2:timer                            33          0
> >> cpu13:timer                           40          0
> >> cpu4:timer                            36          0
> >> Total                            5681160       1173
> >>
> >> is this relevant?
> >
> > I'll have to let someone who knows modern x86 hardware better comment on
> > the relative merits of hpet vs. lapic timers.  If it was using hpet in
> > one-shot mode, and changing it to hpet in periodic mode makes the
> > problem go away, that might be a clue that there's something wrong in
> > the hpet eventtimer start or interrupt routines.
> >
> > I wonder if a single missed interrupt in one-shot mode would bring an
> > eventtimer to a halt like that?  And if so, then what is it about
> > manually asking for the date that kicks it into running again?
> >
> > -- Ian
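
For anyone who wants to repeat the check on their own box, the vmstat -i
output quoted above can be sanity-checked with a small Python sketch like the
one below. It only parses the text as pasted; the function names are made up
for this example, and it assumes every data row ends in two integer columns
(total and rate).

```python
VMSTAT = """\
interrupt                          total       rate
irq3: uart1                         1695          0
irq4: uart0                            5          0
irq19: ehci0                        3875          0
irq20: hpet0 uhci3               5495755       1135
irq21: uhci2 ehci1                    29          0
irq23: atapci0                        48          0
cpu0:timer                          7063          1
irq256: bce0                      117073         24
irq260: mfi0                       51083         10
irq261: mfi1                        3088          0
cpu1:timer                           484          0
cpu14:timer                           36          0
cpu6:timer                           486          0
cpu8:timer                            38          0
cpu5:timer                            38          0
cpu15:timer                           38          0
cpu7:timer                            32          0
cpu12:timer                           38          0
cpu3:timer                            40          0
cpu9:timer                            36          0
cpu10:timer                           34          0
cpu11:timer                           37          0
cpu2:timer                            33          0
cpu13:timer                           40          0
cpu4:timer                            36          0
Total                            5681160       1173
"""


def parse_vmstat(text):
    """Return {source: (total, rate)}, skipping the header line."""
    rows = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # the name itself may contain spaces
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # header or blank line
        rows[parts[0]] = (int(parts[1]), int(parts[2]))
    return rows


def cpu_timers(rows):
    """Return only the per-CPU timer entries (cpuN:timer)."""
    return {k: v for k, v in rows.items()
            if k.startswith("cpu") and k.endswith(":timer")}


if __name__ == "__main__":
    rows = parse_vmstat(VMSTAT)
    timers = cpu_timers(rows)
    print("CPUs with a timer entry:", len(timers))
    print("per-CPU timer interrupts:", sum(t for t, _ in timers.values()))
    print("hpet0 share of all interrupts: %.1f%%"
          % (100.0 * rows["irq20: hpet0 uhci3"][0] / rows["Total"][0]))
```

The point of counting the cpuN:timer rows is just to confirm that every CPU
shows up after the switch to LAPIC; it says nothing about why they were
missing under HPET.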