From: Daniel Braniss
To: Adrian Chadd
Cc: freebsd-stable@freebsd.org, Ian Lepore, Ronald Klop
Subject: Re: time issues and ZFS
Comments: In-reply-to Adrian Chadd message dated "Mon, 21 Jan 2013 12:09:21 -0800."
Date: Tue, 22 Jan 2013 09:28:34 +0200

> I still firmly believe the ACPI event timer code is racy, and what we
> may be seeing here is the fallout from that.
>
> It's very possible that we're missing interrupts here - the new
> eventtimer code that made it into 9.x puts the halt behind a critical
> section, with interrupts disabled. The only platforms that correctly
> implement enable-interrupts-and-halt atomically is the HLT (well, and
> the don't-sleep-at-all) idle loops on i386/amd64. The default method
> is to use the ACPI sleep method, which doesn't do atomic interrupt
> enable / halt.
>
> I'm still seeing odd stuff on some of my ACPI-using netbooks when
> doing net80211/ath development and it all goes away whenever I fondle
> with the above settings.
>
> So, play with kern.eventtimer.periodic, kern.eventtimer.idletick and
> machdep.idle (try setting machdep.idle to hlt, or something else
> listed in machdep.idle_available) - please report back what the
> results are.
>
>
> Adrian
>

Adrian, you mention that ACPI is racy; which event timer are you
talking about? How is the quality chosen?

At the moment, switching kern.eventtimer.timer to LAPIC seems to have
done the trick. I'll have to wait another 24 hours to make sure.

In the meantime, here is some info:

Intel(R) Xeon(R) CPU E5645: running with no problems
    LAPIC(600) HPET(450) HPET1(440) HPET2(440) HPET3(440) i8254(100) RTC(0)
Intel(R) Xeon(R) CPU X5550: this is the problematic one, at least for the moment
    HPET(450) HPET1(440) HPET2(440) HPET3(440) LAPIC(400) i8254(100) RTC(0)
Dual-Core AMD Opteron(tm) Processor 2218: running with no problems
    LAPIC(400) RTC(0)

So if someone is running 9.1 on any of the following, it would be nice
if they could provide the output of
    sysctl kern.eventtimer.choice

    Intel(R) Xeon(R) CPU E5410
    Intel(R) Xeon(R) CPU E5507

BTW, all of the above are on server motherboards.

thanks,
	danny

> On 21 January 2013 07:54, Ian Lepore wrote:
> > On Mon, 2013-01-21 at 17:35 +0200, Daniel Braniss wrote:
> >> ...
> >> >
> >> > What's the output of sysctl kern.eventtimer?
> >>
> >> kern.eventtimer.periodic is 0
> >>
> >> > Does the bad behavior
> >> > change if you set kern.eventtimer.periodic=1?
> >> >
> >>
> >> setting kern.eventtimer.timer=LAPIC
> >> instead of the default HPET made the missing cpu timers to appear:
> >> # vmstat -i
> >> interrupt                          total       rate
> >> irq3: uart1                         1695          0
> >> irq4: uart0                            5          0
> >> irq19: ehci0                        3875          0
> >> irq20: hpet0 uhci3               5495755       1135
> >> irq21: uhci2 ehci1                    29          0
> >> irq23: atapci0                        48          0
> >> cpu0:timer                          7063          1
> >> irq256: bce0                      117073         24
> >> irq260: mfi0                       51083         10
> >> irq261: mfi1                        3088          0
> >> cpu1:timer                           484          0
> >> cpu14:timer                           36          0
> >> cpu6:timer                           486          0
> >> cpu8:timer                            38          0
> >> cpu5:timer                            38          0
> >> cpu15:timer                           38          0
> >> cpu7:timer                            32          0
> >> cpu12:timer                           38          0
> >> cpu3:timer                            40          0
> >> cpu9:timer                            36          0
> >> cpu10:timer                           34          0
> >> cpu11:timer                           37          0
> >> cpu2:timer                            33          0
> >> cpu13:timer                           40          0
> >> cpu4:timer                            36          0
> >> Total                            5681160       1173
> >>
> >> is this relevant?
> >
> > I'll have to let someone who knows modern x86 hardware better comment on
> > the relative merits of hpet vs. lapic timers. If it was using hpet in
> > one-shot mode, and changing it to hpet in periodic mode makes the
> > problem go away, that might be a clue that there's something wrong in
> > the hpet eventtimer start or interrupt routines.
> >
> > I wonder if a single missed interrupt in one-shot mode would bring an
> > eventtimer to a halt like that? And if so, then what is it about
> > manually asking for the date that kicks it into running again?
> >
> > -- Ian
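
For anyone collecting kern.eventtimer.choice output from several machines, the lists reported earlier in this message can be compared with a small Python sketch like the one below. It assumes the default timer is simply the highest-quality entry in the list; that matches the HPET default reported for the X5550, but nothing in this thread confirms it as the kernel's actual selection rule. The helper names parse_choice and best_timer are hypothetical, chosen only for this illustration.

```python
import re

# Strings copied from the kern.eventtimer.choice output quoted in this thread.
CHOICES = {
    "Xeon E5645": "LAPIC(600) HPET(450) HPET1(440) HPET2(440) HPET3(440) i8254(100) RTC(0)",
    "Xeon X5550": "HPET(450) HPET1(440) HPET2(440) HPET3(440) LAPIC(400) i8254(100) RTC(0)",
    "Opteron 2218": "LAPIC(400) RTC(0)",
}


def parse_choice(text):
    """Return a list of (name, quality) pairs from a choice string."""
    return [(name, int(q)) for name, q in re.findall(r"(\w+)\((-?\d+)\)", text)]


def best_timer(text):
    """Return the timer name with the highest quality.

    Assumption (not confirmed in the thread): the default timer is the
    highest-quality entry. Ties keep the first entry listed.
    """
    timers = parse_choice(text)
    return max(timers, key=lambda t: t[1])[0]


if __name__ == "__main__":
    for cpu, text in CHOICES.items():
        print(f"{cpu}: {best_timer(text)}")
```

Under that assumption, the X5550 is the only one of the three machines whose top-ranked timer is HPET rather than LAPIC, which lines up with it being the only one showing the problem.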