From owner-freebsd-hackers@freebsd.org  Sun Oct 18 15:06:39 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 53325A18F73
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun, 18 Oct 2015 15:06:39 +0000 (UTC)
 (envelope-from jmaloney@pcbsd.org)
Received: from barracuda.ixsystems.com (mail.ixsystems.com [69.198.165.135])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "*.ixsystems.com",
 Issuer "Go Daddy Secure Certificate Authority - G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 3133EE57
 for <freebsd-hackers@freebsd.org>; Sun, 18 Oct 2015 15:06:38 +0000 (UTC)
 (envelope-from jmaloney@pcbsd.org)
X-ASG-Debug-ID: 1445180796-08ca040e8500c30002-P5m3U7
Received: from [10.0.1.52] (ip72-209-160-49.ks.ks.cox.net [72.209.160.49]) by
 barracuda.ixsystems.com with ESMTP id F2olhriWYHpLoehM (version=TLSv1
 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NO);
 Sun, 18 Oct 2015 08:06:36 -0700 (PDT)
X-Barracuda-Envelope-From: jmaloney@pcbsd.org
X-Barracuda-AUTH-User: jmaloney@pcbsd.org
X-Barracuda-Apparent-Source-IP: 72.209.160.49
Mime-Version: 1.0 (Mac OS X Mail 9.0 \(3094\))
Subject: Re: rc(8) parallel tasks
From: Joe Maloney <jmaloney@pcbsd.org>
X-ASG-Orig-Subj: Re: rc(8) parallel tasks
In-Reply-To: <562143E7.3030104@jet9.net>
Date: Sun, 18 Oct 2015 10:06:35 -0500
Cc: Mark Felder <feld@FreeBSD.org>, freebsd-hackers@freebsd.org,
 freebsd-rc@freebsd.org, marck@rinet.ru
Message-Id: <6D32B114-453C-4D25-8FF4-C8777D78C50C@pcbsd.org>
References: <560EAC05.6050308@jet9.net>
 <1445011561.1233840.412174489.688C5822@webmail.messagingengine.com>
 <562143E7.3030104@jet9.net>
To: Cyril Vechera <cv@jet9.net>
X-Mailer: Apple Mail (2.3094)
X-Barracuda-Connect: ip72-209-160-49.ks.ks.cox.net[72.209.160.49]
X-Barracuda-Start-Time: 1445180796
X-Barracuda-Encrypted: ECDHE-RSA-AES256-SHA
X-Barracuda-URL: https://10.2.0.41:443/cgi-mod/mark.cgi
X-Virus-Scanned: by bsmtpd at ixsystems.com
X-Barracuda-BRTS-Status: 1
X-Barracuda-Spam-Score: 0.00
X-Barracuda-Spam-Status: No, SCORE=0.00 using global scores of TAG_LEVEL=1000.0
 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=8.0 tests=HTML_MESSAGE
X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.23602
 Rule breakdown below
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 0.00 HTML_MESSAGE           BODY: HTML included in message
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 18 Oct 2015 15:06:39 -0000

Thanks for sharing this.  It worked great for me on FreeBSD as well with
minimal services, and does indeed make a difference.  It blew up on
PCBSD, which has many more services running out of the box.  I also
noticed /tmp/init has to be removed manually in some cases when boot
hangs.  Very impressive otherwise.

Joe Maloney

> On Oct 16, 2015, at 1:37 PM, Cyril Vechera <cv@jet9.net> wrote:
>
> On 10/16/2015 07:06 PM, Mark Felder wrote:
>> On Fri, Oct 2, 2015, at 11:08, Cyril Vechera wrote:
>>> Hi there.
>>>
>>> We've got a small launcher script (~250 loc) for parallel services
>>> start/stop etc. It is used on our embedded systems and our users'
>>> containers. And I've done a proof of concept for integrating it into
>>> FreeBSD's standard /etc/rc to execute startup scripts in parallel.
>>> It gave me a boot time reduction of the rc part from 27 to 7 seconds,
>>> mostly by eliminating stalls while waiting for the network or other
>>> long-latency resources.
>>>
>>> The launcher is written in pure POSIX shell and uses FIFOs (named
>>> pipes) as mutexes for synchronization. So it is embedded into /etc/rc
>>> and /etc/rc.d, preserving rc.subr preloading. As a primary
>>> requirement, it guarantees the topological order (strict partial
>>> order) defined by dependencies. It requires only a POSIX shell, a
>>> FreeBSD or Linux kernel, mkfifo and a writable file system. Due to
>>> the last requirement, it can be run only at a late stage or must be
>>> given some kind of writable fs, e.g. tmpfs. The FreeBSD-integrated
>>> version uses the standard rcorder annotations (REQUIRE, BEFORE and
>>> PROVIDE) and there's no need to change rc.d scripts.
>>>
>>> It's not a full init replacement or a kind of service supervision
>>> tool. It only starts or invokes a group of scripts in parallel,
>>> resolving dependencies and ensuring execution in dependency order.
>>>
>>> Please take a look at the script and patch set for FreeBSD:
>>>
>>> https://github.com/cvss/jet9-multitask-init/blob/master/jet9-multitask-init
>>> https://github.com/cvss/jet9-multitask-init/tree/master/examples/freebsd
>>>
>>>
>> Your first link is a 404, but this looks really nice!
>
> In the last commit (v1.3.0) I've renamed 'jet9-multitask-init' to
> 'jet9-multitask-flow' to avoid confusion with naming, because it's not,
> as the name suggests, an init replacement, but just a parallel task
> launcher. The actual repository URL is now
> https://github.com/cvss/jet9-multitask-flow
>
> In this commit I've completed the FreeBSD compatibility. Now a script
> or dependency name can include minuses `-` and dots `.` (the first stone
> I stumbled over was ftp-proxy).  I've also cleaned up the main script
> code https://github.com/cvss/jet9-multitask-flow/jet9-multitask-flow and
> split it into more functions that can be redefined. So it's now easier
> to rewrite dependency extraction for FreeBSD rc-scripts: the current
> implementation is too rough and takes three 'awk' runs for each
> rc-script. This is not only a loss of time but, as DMarck mentioned
> before, using awk restricts parallel rc to run only after the
> FILESYSTEMS stage is done. Maybe it would be better to add a new option
> to rcorder(8) to dump the gathered dependencies in a tsort-compatible
> listing and insert them directly into the flow execution plan.
>
> I've also added an rc.conf variable `rc_parallel` to turn parallel
> execution on and off. There's a risk of discovering incomplete
> dependency annotations in some rc-scripts that were previously masked by
> serial execution. I've done some checks by enabling as many rc.conf
> variables as possible to start more rc-scripts, and didn't find any
> errors. But it looks too good to be true and I'm afraid that it's just
> too-poor testing. I think if some ordering conflict is found, it could
> be worked around by introducing a white-list of script names that must
> be run only sequentially.
>
> So what remains first is to check whether it really works under
> different conditions.
>
> --
> Cyril Vechera
>
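
To illustrate the ordering guarantee described in the quoted message (parallel start, but never before a script's dependencies have finished), here is a minimal sketch in Python. It is not the jet9-multitask-flow code: the real launcher is POSIX shell and synchronizes through FIFOs, whereas this sketch uses `threading.Event` objects as a stand-in. The script names and their annotation contents below are hypothetical examples, and only the REQUIRE, BEFORE and PROVIDE keywords are handled.

```python
import threading

ANNOTATIONS = ("PROVIDE", "REQUIRE", "BEFORE")

def parse_annotations(text):
    """Collect PROVIDE/REQUIRE/BEFORE words from rcorder-style comment lines."""
    found = {key: [] for key in ANNOTATIONS}
    for line in text.splitlines():
        parts = line.split()
        # An annotation line looks like "# REQUIRE: NETWORKING syslogd".
        if len(parts) >= 2 and parts[0] == "#" and parts[1].endswith(":"):
            key = parts[1][:-1]
            if key in found:
                found[key].extend(parts[2:])
    return found

def build_prereqs(scripts):
    """Map each script name to the set of scripts that must finish first."""
    parsed = {name: parse_annotations(text) for name, text in scripts.items()}
    providers = {}
    for name, notes in parsed.items():
        for word in notes["PROVIDE"]:
            providers.setdefault(word, set()).add(name)
    prereqs = {name: set() for name in scripts}
    for name, notes in parsed.items():
        for word in notes["REQUIRE"]:
            prereqs[name] |= providers.get(word, set())
        for word in notes["BEFORE"]:
            for later in providers.get(word, set()):
                prereqs[later].add(name)
    for name in prereqs:
        prereqs[name].discard(name)  # a script never waits on itself
    return prereqs

def run_parallel(prereqs, action):
    """Start one thread per script; each waits for its prerequisites."""
    done = {name: threading.Event() for name in prereqs}

    def worker(name):
        for dep in prereqs[name]:
            done[dep].wait()
        try:
            action(name)
        finally:
            done[name].set()  # release dependents even if the action fails

    threads = [threading.Thread(target=worker, args=(n,)) for n in prereqs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

# Hypothetical annotation blocks, not copied from real rc.d scripts.
SCRIPTS = {
    "netif": "# PROVIDE: netif\n",
    "syslogd": "# PROVIDE: syslogd\n# REQUIRE: netif\n",
    "ftp-proxy": "# PROVIDE: ftp-proxy\n# REQUIRE: syslogd\n",
    "early": "# PROVIDE: early\n# BEFORE: netif\n",
}

order = []
run_parallel(build_prereqs(SCRIPTS), order.append)
print(order)
```

A dependency cycle would make this sketch wait forever; it does no cycle detection.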


From owner-freebsd-hackers@freebsd.org  Mon Oct 19 09:19:25 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id ECFCCA18646
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon, 19 Oct 2015 09:19:25 +0000 (UTC)
 (envelope-from andrew@fubar.geek.nz)
Received: from kif.fubar.geek.nz (kif.fubar.geek.nz [178.62.119.249])
 by mx1.freebsd.org (Postfix) with ESMTP id BCA861DE9
 for <freebsd-hackers@freebsd.org>; Mon, 19 Oct 2015 09:19:25 +0000 (UTC)
 (envelope-from andrew@fubar.geek.nz)
Received: from bender.Home (bcdccf38.skybroadband.com [188.220.207.56])
 by kif.fubar.geek.nz (Postfix) with ESMTPSA id 98ADBD7900;
 Mon, 19 Oct 2015 09:18:53 +0000 (UTC)
Date: Mon, 19 Oct 2015 10:18:51 +0100
From: Andrew Turner <andrew@fubar.geek.nz>
To: Eric McCorkle <eric@metricspace.net>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
Message-ID: <20151019101851.28895dfa@bender.Home>
In-Reply-To: <56211825.3080403@metricspace.net>
References: <56211825.3080403@metricspace.net>
X-Mailer: Claws Mail 3.12.0 (GTK+ 2.24.28; amd64-portbld-freebsd10.1)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 19 Oct 2015 09:19:26 -0000

On Fri, 16 Oct 2015 11:30:45 -0400
Eric McCorkle <eric@metricspace.net> wrote:

> Hi,
> 
> I've received a few successful test reports for EFI/ZFS, on both ZFS
> and UFS systems.  I have personally been using it for some time in a
> GRUB + loader.efi setup.
> 
> I have a fairly minor test result for loader.efi.  I have confirmed
> that "nextboot -k" works fine.
> 
> In general, I need testing on ZFS setups with more complex vdevs
> (l2arc, intent logs, mirroring, striping, raidz, etc.)

Do you have an updated patch? When I looked at this recently the patch
didn't apply cleanly.

Andrew

From owner-freebsd-hackers@freebsd.org  Tue Oct 20 10:14:15 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 149A6A1835B
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue, 20 Oct 2015 10:14:15 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 8AF47A3;
 Tue, 20 Oct 2015 10:14:12 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA02295;
 Tue, 20 Oct 2015 13:14:04 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1ZoTvg-0004vO-5d; Tue, 20 Oct 2015 13:14:04 +0300
To: freebsd-hackers <freebsd-hackers@FreeBSD.org>
From: Andriy Gapon <avg@FreeBSD.org>
Subject: instability of timekeeping
X-Enigmail-Draft-Status: N1110
Message-ID: <56261398.60102@FreeBSD.org>
Date: Tue, 20 Oct 2015 13:12:40 +0300
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Oct 2015 10:14:15 -0000


I recently replaced a 2-core Athlon II X2 CPU with a same-family Phenom II X4
CPU and after that I started noticing problems with the timekeeping.  It seems
that from time to time the jitter becomes so high that ntpd goes nuts or stops
synchronizing or panics.

Here is how the current event timer and time counter configurations look (slightly
trimmed):
$ sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(800) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.TSC-low.quality: 800
kern.timecounter.tc.TSC-low.frequency: 1607357461
kern.timecounter.tc.TSC-low.counter: 2457319922
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.i8254.quality: 0
$ sysctl kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: HPET
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.HPET2.quality: 450
kern.eventtimer.et.HPET1.quality: 450
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.LAPIC.quality: 400

Please note that the TSC-low time counter is chosen administratively whereas the
event timer configuration is fully automatic.
The previous configuration was produced in the same fashion.
One notable difference is that the previous CPU was 2-core and so two HPET
timers were virtually combined into a single timer with per-CPU capability.  In
other words, two HPET timers were used to drive two cores.
The newer CPU has four cores, so there are not enough HPET timers to drive each
core independently and thus there is no virtual bundling.  Thus, one HPET timer
drives one core and that core forwards the interrupts to other cores via IPIs as
necessary.
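
For reference, the two `choice` lines in the sysctl output above can be read mechanically. The following is a minimal sketch in Python, assuming the `name(quality)` format shown there. It reports which time counter has the highest listed quality (as far as I understand, that is the one the kernel would pick without an administrative override) and counts the HPET event timers against the four cores mentioned in the text. The function and variable names are my own.

```python
import re

# Values copied from the sysctl output in this message.
TIMECOUNTER_CHOICE = "TSC-low(800) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)"
TIMECOUNTER_SELECTED = "TSC-low"
EVENTTIMER_CHOICE = "HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0)"
CORES = 4  # the Phenom II X4

def parse_choice(text):
    """Return {name: quality} from a 'name(quality) name(quality) ...' string."""
    return {name: int(q) for name, q in re.findall(r"(\S+)\((-?\d+)\)", text)}

qualities = parse_choice(TIMECOUNTER_CHOICE)
best = max(qualities, key=qualities.get)
print("selected:", TIMECOUNTER_SELECTED, "quality", qualities[TIMECOUNTER_SELECTED])
print("highest quality:", best, "quality", qualities[best])

# Count the HPET event timers listed; fewer timers than cores is consistent
# with one timer driving one core and the rest being reached via IPIs.
hpet_timers = sum(1 for name in parse_choice(EVENTTIMER_CHOICE) if name.startswith("HPET"))
print("HPET event timers:", hpet_timers, "cores:", CORES)
```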

But I am far from sure that the stated difference is actually the source of the
instability.  There could be other hardware-related reasons, of course.

I wonder if there is a good way to analyze / debug this situation to see what
exactly is wrong.  For now I am thinking about trying different time counter and
event timer configurations, but I would prefer a more guided "scientific"
approach over a blind trial and error one.

I would appreciate any help, suggestions, hints.

The CPUs:
CPU: AMD Athlon(tm) II X2 250 Processor (3013.79-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f62  Family=0x10  Model=0x6  Stepping=2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  SVM: Features=0xf<NP,LbrVirt,SVML,NRIPS>
Revision=1, ASIDs=64
  TSC: P-state invariant

CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f43  Family=0x10  Model=0x4  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  SVM: Features=0xf<NP,LbrVirt,SVML,NRIPS>
Revision=1, ASIDs=64
  TSC: P-state invariant

-- 
Andriy Gapon

From owner-freebsd-hackers@freebsd.org  Tue Oct 20 11:06:57 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2A117A19685
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue, 20 Oct 2015 11:06:57 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 4A1F314AA;
 Tue, 20 Oct 2015 11:06:55 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA03152;
 Tue, 20 Oct 2015 14:06:54 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1ZoUko-0004zJ-5g; Tue, 20 Oct 2015 14:06:54 +0300
Subject: Re: instability of timekeeping
To: freebsd-hackers <freebsd-hackers@FreeBSD.org>
References: <56261398.60102@FreeBSD.org>
From: Andriy Gapon <avg@FreeBSD.org>
X-Enigmail-Draft-Status: N1110
Message-ID: <56261FE6.90302@FreeBSD.org>
Date: Tue, 20 Oct 2015 14:05:10 +0300
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <56261398.60102@FreeBSD.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Oct 2015 11:06:57 -0000


I performed a small observation.  With ntpd disabled I ran `ntpdate -d` at 10
second intervals in a loop (done via `sleep 10`).  It looks like for about 25
minutes the time offset between a reference server and my machine was quite
stable. But then it sort of jumped about 2.5 seconds between two consecutive
ntpdate invocations.

20 Oct 13:21:02 ntpdate[85157]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03326, dispersion 0.00014
transmitted 4, in filter 4
reference time:    d9d09422.65c26838  Tue, Oct 20 2015 13:21:22.397
originate timestamp: d9d0942f.a03ebee7  Tue, Oct 20 2015 13:21:35.625
transmit timestamp:  d9d09414.e9ed230c  Tue, Oct 20 2015 13:21:08.913
filter delay:  0.03372  0.03491  0.03378  0.03326
         0.00000  0.00000  0.00000  0.00000
filter offset: 26.70845 26.70896 26.70834 26.70832
         0.000000 0.000000 0.000000 0.000000
delay 0.03326, dispersion 0.00014
offset 26.708320

20 Oct 13:21:08 ntpdate[85157]: step time server 62.149.0.30 offset 26.708320 sec

[...]

20 Oct 13:45:20 ntpdate[87088]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03442, dispersion 0.00018
transmitted 4, in filter 4
reference time:    d9d099d3.67703742  Tue, Oct 20 2015 13:45:39.404
originate timestamp: d9d099e1.8a1b6c78  Tue, Oct 20 2015 13:45:53.539
transmit timestamp:  d9d099c6.d34c1e00  Tue, Oct 20 2015 13:45:26.825
filter delay:  0.03481  0.03442  0.03448  0.03458
         0.00000  0.00000  0.00000  0.00000
filter offset: 26.70957 26.70943 26.70913 26.70957
         0.000000 0.000000 0.000000 0.000000
delay 0.03442, dispersion 0.00018
offset 26.709437

20 Oct 13:45:26 ntpdate[87088]: step time server 62.149.0.30 offset 26.709437 sec

20 Oct 13:45:36 ntpdate[87094]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03349, dispersion 0.00012
transmitted 4, in filter 4
reference time:    d9d099e5.63ead89c  Tue, Oct 20 2015 13:45:57.390
originate timestamp: d9d099f4.6364c717  Tue, Oct 20 2015 13:46:12.388
transmit timestamp:  d9d099d7.00939943  Tue, Oct 20 2015 13:45:43.002
filter delay:  0.03349  0.03413  0.03419  0.03455
         0.00000  0.00000  0.00000  0.00000
filter offset: 29.38105 29.38106 29.38134 29.38149
         0.000000 0.000000 0.000000 0.000000
delay 0.03349, dispersion 0.00012
offset 29.381055

20 Oct 13:45:43 ntpdate[87094]: step time server 62.149.0.30 offset 29.381055 sec
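
The size of the jump can be read off the final `step time server` lines of the three runs shown above. The following is a minimal sketch in Python that extracts the offsets and flags any change between consecutive runs larger than a threshold. The 0.1 s threshold and the function names are arbitrary choices of mine. Note that `ntpdate -d` goes through all the steps without adjusting the clock, which would explain why the offset sits near 26.7 s across runs instead of returning to zero.

```python
import re

# Summary lines copied from the ntpdate output above.
LOG = """\
20 Oct 13:21:08 ntpdate[85157]: step time server 62.149.0.30 offset 26.708320 sec
20 Oct 13:45:26 ntpdate[87088]: step time server 62.149.0.30 offset 26.709437 sec
20 Oct 13:45:43 ntpdate[87094]: step time server 62.149.0.30 offset 29.381055 sec
"""

PATTERN = re.compile(
    r"^(\d+ \w+ [\d:]+) ntpdate\[\d+\]: step time server \S+ offset (-?[\d.]+) sec$",
    re.MULTILINE,
)

def offsets(log):
    """Return [(timestamp, offset_seconds)] for each ntpdate summary line."""
    return [(stamp, float(value)) for stamp, value in PATTERN.findall(log)]

def jumps(samples, threshold=0.1):
    """Return [(timestamp, delta)] where the offset changed by more than threshold."""
    found = []
    for (_, previous), (stamp, current) in zip(samples, samples[1:]):
        delta = current - previous
        if abs(delta) > threshold:
            found.append((stamp, round(delta, 6)))
    return found

samples = offsets(LOG)
for stamp, delta in jumps(samples):
    print(stamp, "offset changed by", delta, "s")
```

On these three samples the only flagged change is between 13:45:26 and 13:45:43, where the offset grows from 26.709437 to 29.381055 s, a step of roughly 2.67 s.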



-- 
Andriy Gapon

From owner-freebsd-hackers@freebsd.org  Tue Oct 20 11:10:39 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5C1FDA19784
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue, 20 Oct 2015 11:10:39 +0000 (UTC)
 (envelope-from phk@phk.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
 by mx1.freebsd.org (Postfix) with ESMTP id 2710A164E;
 Tue, 20 Oct 2015 11:10:38 +0000 (UTC)
 (envelope-from phk@phk.freebsd.dk)
Received: from critter.freebsd.dk (unknown [192.168.55.3])
 by phk.freebsd.dk (Postfix) with ESMTP id 14D794F860;
 Tue, 20 Oct 2015 11:10:37 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
 by critter.freebsd.dk (8.15.2/8.15.2) with ESMTP id t9KBAaJX097480;
 Tue, 20 Oct 2015 11:10:36 GMT (envelope-from phk@phk.freebsd.dk)
To: Andriy Gapon <avg@FreeBSD.org>
cc: freebsd-hackers <freebsd-hackers@FreeBSD.org>
Subject: Re: instability of timekeeping
In-reply-to: <56261FE6.90302@FreeBSD.org>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-ID: <97478.1445339436.1@critter.freebsd.dk>
Content-Transfer-Encoding: 8bit
Date: Tue, 20 Oct 2015 11:10:36 +0000
Message-ID: <97479.1445339436@critter.freebsd.dk>
X-Mailman-Approved-At: Tue, 20 Oct 2015 11:17:43 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Oct 2015 11:10:39 -0000

--------
In message <56261FE6.90302@FreeBSD.org>, Andriy Gapon writes:

>I performed a small observation.  With ntpd disabled I ran `ntpdate -d` at 10
>second intervals in a loop (done via `sleep 10`).  It looks like for about 25
>minutes the time offset between a reference server and my machine was quite
>stable. But then it sort of jumped about 2.5 seconds between two consecutive
>ntpdate invocations.

Pure guesswork:  Somebody may have börked the code to wind up timecounters.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-hackers@freebsd.org  Tue Oct 20 21:59:54 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D3727A1AF3A
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue, 20 Oct 2015 21:59:54 +0000 (UTC) (envelope-from ian@freebsd.org)
Received: from outbound1b.ore.mailhop.org (outbound1b.ore.mailhop.org
 [54.200.247.200])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9E55EB2E
 for <freebsd-hackers@freebsd.org>; Tue, 20 Oct 2015 21:59:54 +0000 (UTC)
 (envelope-from ian@freebsd.org)
Received: from ilsoft.org (unknown [73.34.117.227])
 by outbound1.ore.mailhop.org (Halon Mail Gateway) with ESMTPSA
 for <freebsd-hackers@FreeBSD.org>; Tue, 20 Oct 2015 22:00:00 +0000 (UTC)
Received: from rev (rev [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id t9KLxkhr013544
 for <freebsd-hackers@FreeBSD.org>; Tue, 20 Oct 2015 15:59:46 -0600 (MDT)
 (envelope-from ian@freebsd.org)
Message-ID: <1445378386.14127.2.camel@freebsd.org>
Subject: vmstat -m strangeness
From: Ian Lepore <ian@freebsd.org>
To: freebsd-hackers <freebsd-hackers@FreeBSD.org>
Date: Tue, 20 Oct 2015 15:59:46 -0600
Content-Type: text/plain; charset="us-ascii"
X-Mailer: Evolution 3.16.5 FreeBSD GNOME Team Port 
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Oct 2015 21:59:54 -0000

root@wand:~ # vmstat -m | egrep "busdma|bounce|devbuf|Type"
         Type InUse MemUse HighUse Requests  Size(s)
       devbuf   125    10K       -      166  16,32,64,256,512,1024
       busdma   922   116K       -      922  128
       bounce   385   775K       -      385  32,128

How do 385 allocations of 32 or 128 bytes add up to 775K?  The
answer... 768K of individual pages each allocated via contigmalloc()
don't show up in that output.  Why is that, and is it something that
should be fixed?
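
A rough check of the numbers above can be sketched as follows. This is
only an illustration: the 4 KiB page size is an assumption and is not
stated anywhere in the vmstat output.

```python
# Sanity check of the "bounce" line from the vmstat -m output above.
KIB = 1024

in_use = 385          # InUse column
mem_use_kib = 775     # MemUse column, in KiB
largest_size = 128    # largest bucket listed under Size(s)

# Even if every allocation used the largest listed bucket, the total
# would stay far below the reported MemUse.
upper_bound_kib = in_use * largest_size / KIB

contig_kib = 768      # memory attributed to contigmalloc() in the message
page_size = 4096      # ASSUMPTION: 4 KiB pages
contig_pages = contig_kib * KIB // page_size

print(f"upper bound from Size(s): {upper_bound_kib:.3f} KiB")
print(f"contigmalloc pages:       {contig_pages}")
print(f"left for malloc buckets:  {mem_use_kib - contig_kib} KiB")
```

The listed sizes can explain at most about 48K, so most of the 775K has
to come from allocations that the Size(s) column does not describe.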

-- Ian


From owner-freebsd-hackers@freebsd.org  Wed Oct 21 08:43:18 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id ABCF3A1AEDC
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 21 Oct 2015 08:43:18 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 9B8C6F9E;
 Wed, 21 Oct 2015 08:43:17 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA21745;
 Wed, 21 Oct 2015 11:43:13 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1ZoozI-0006YE-SL; Wed, 21 Oct 2015 11:43:13 +0300
Subject: Re: instability of timekeeping
To: freebsd-hackers <freebsd-hackers@FreeBSD.org>
References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org>
From: Andriy Gapon <avg@FreeBSD.org>
X-Enigmail-Draft-Status: N1110
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Jung-uk Kim <jkim@FreeBSD.org>
Message-ID: <56274FFC.2000608@FreeBSD.org>
Date: Wed, 21 Oct 2015 11:42:36 +0300
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <56261FE6.90302@FreeBSD.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-List-Received-Date: Wed, 21 Oct 2015 08:43:18 -0000

On 20/10/2015 14:05, Andriy Gapon wrote:
> I performed a small observation.  With ntpd disabled I ran `ntpdate -d` at 10
> second intervals in a loop (done via `sleep 10`).  It looks like for about 25
> minutes the time offset between a reference server and my machine was quite
> stable. But then it sort of jumped about 2.5 seconds between two consecutive
> ntpdate invocations.
[snip]
> On 20/10/2015 13:12, Andriy Gapon wrote:
[snip]
>> kern.timecounter.tc.TSC-low.frequency: 1607357461
>> kern.timecounter.tc.TSC-low.counter: 2457319922
>> kern.timecounter.tc.TSC-low.mask: 4294967295
[snip]

Another observation and a hypothesis.

I tried time counters other than TSC and I couldn't reproduce the issue with them.

Another thing that occurred to me is that TSC-low.mask / TSC-low.frequency ≈ 2.7
seconds.
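
The figure can be reproduced directly from the sysctl values quoted
above. This is a minimal sketch; the only inputs are the mask and
frequency already shown, and the variable names are illustrative.

```python
# Time for the TSC-low counter to wrap around once.
mask = 4294967295         # kern.timecounter.tc.TSC-low.mask
frequency = 1607357461    # kern.timecounter.tc.TSC-low.frequency, in Hz

# The counter takes mask + 1 ticks to return to the same value.
wrap_seconds = (mask + 1) / frequency

print(f"wrap period: {wrap_seconds:.3f} s")
```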

My hypothesis comes from these observations and from reading some code.
First, I assume that the visible values of TSCs on different cores are not
perfectly synchronized. Second, I think that there can be circumstances where
tc_ticktock() -> tc_windup() can get called on different cores sufficiently
close in time that the later call would see a TSC value which is "before"
(sequentially smaller than) the TSC value read earlier (on the other core).  In
that case the delta between the readings would be close to TSC-low.mask.
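
The wrap-around effect described in the hypothesis can be illustrated
with a small sketch. It assumes the delta is computed as a masked
unsigned subtraction of two counter readings; the function name and the
1000-tick skew are hypothetical and chosen only for illustration.

```python
MASK = 0xFFFFFFFF          # TSC-low.mask from the sysctl output
FREQ = 1607357461          # TSC-low.frequency, in Hz

def masked_delta(now, last, mask=MASK):
    """Difference between two counter readings, modulo the counter width."""
    return (now - last) & mask

last = 2457319922          # earlier reading (value from the sysctl output)
skew = 1000                # HYPOTHETICAL: other core's TSC is 1000 ticks behind

forward = masked_delta(last + skew, last)    # later reading is really later
backward = masked_delta(last - skew, last)   # later reading appears "before"

print(f"forward delta:  {forward} ticks")
print(f"backward delta: {backward} ticks = {backward / FREQ:.3f} s")
```

A reading that is only slightly behind the previous one turns into a
delta of almost a full counter period, which is in the same range as the
jump seen between two ntpdate invocations.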

Right now I do not have any proof that what the hypothesis says is what actually
happens.  On the other hand, I do not see anything that would prevent the
hypothesized situation from occurring.

To add more weight to the hypothesis:
cpu1:invltlb                      674384          4
cpu1:invlrng                      453020          3
cpu1:invlpg                    108772578        652
cpu1:preempt                    37435000        224
cpu1:ast                           36757          0
cpu1:rendezvous                     9473          0
cpu1:hardclock                  22267434        133
As you can see I am currently running workloads that result in a very
significant number of IPIs, especially the page invalidation IPIs.  Given that
all (x86) IPIs have the same priority (based on their vector numbers) I think
it's plausible that the hardclock IPI could get arbitrarily delayed.

I guess I could add a tracepoint to record deltas that are close to the current
timecounter's mask.

Assuming the hypothesis is correct I see three possible ways to work around the
problem:

1. Increase tsc_shift, so that the cross-CPU TSC differences are smaller (at the
cost of lower resolution).  That should reduce the risk of seeing "backwards"
TSC values.  Judging from numbers that tools/tools/tscdrift produces I can set
tsc_shift to 7.  The resulting resolution should not be worse than that of HPET
or ACPI-fast counters with the benefit of TSC being much faster to read.

2. Change the code, so that tc_windup() is always called on the same CPU.  E.g.
it could be the BSP or a CPU that receives the actual timer interrupts (as
opposed to the hardclock IPIs).  This should help with the timekeeping, but
won't help with the "jitter" in binuptime() and friends.

3. In tc_delta() somehow detect and filter out "slightly behind" timecounter
readings.  Not sure if this is possible at all.
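
The trade-off in option 1 can be sketched numerically. Several inputs
here are assumptions rather than facts from this thread: the current
tsc_shift of 1, the typical HPET rate, and the 100-cycle skew are all
illustrative values.

```python
TSC_LOW_FREQ = 1607357461   # Hz, from the sysctl output
CURRENT_SHIFT = 1           # ASSUMPTION: current tsc_shift is not stated
PROPOSED_SHIFT = 7

HPET_FREQ = 14_318_180      # ASSUMPTION: typical HPET rate, in Hz
ACPI_PM_FREQ = 3_579_545    # ACPI PM timer rate, in Hz

def tick_ns(freq_hz):
    """Length of one counter tick in nanoseconds."""
    return 1e9 / freq_hz

def skew_in_ticks(raw_cycles, shift):
    """Cross-CPU skew expressed in shifted counter ticks."""
    return raw_cycles >> shift

raw_tsc_freq = TSC_LOW_FREQ << CURRENT_SHIFT
shifted_freq = raw_tsc_freq >> PROPOSED_SHIFT

raw_skew = 100              # HYPOTHETICAL skew between cores, in TSC cycles

print(f"TSC, shift {PROPOSED_SHIFT}: {tick_ns(shifted_freq):6.1f} ns/tick")
print(f"HPET:         {tick_ns(HPET_FREQ):6.1f} ns/tick")
print(f"ACPI PM:      {tick_ns(ACPI_PM_FREQ):6.1f} ns/tick")
print(f"skew at shift {CURRENT_SHIFT}: {skew_in_ticks(raw_skew, CURRENT_SHIFT)} ticks")
print(f"skew at shift {PROPOSED_SHIFT}: {skew_in_ticks(raw_skew, PROPOSED_SHIFT)} ticks")
```

Under these assumptions a larger shift hides small cross-CPU differences
while the tick length stays below that of the other two counters.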

-- 
Andriy Gapon

From owner-freebsd-hackers@freebsd.org  Wed Oct 21 14:11:22 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3838FA1B57C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 21 Oct 2015 14:11:22 +0000 (UTC)
 (envelope-from ganael.laplanche@corp.ovh.com)
Received: from 7.mo174.mail-out.ovh.net (7.mo174.mail-out.ovh.net
 [46.105.47.152])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id F3334C85
 for <freebsd-hackers@freebsd.org>; Wed, 21 Oct 2015 14:11:21 +0000 (UTC)
 (envelope-from ganael.laplanche@corp.ovh.com)
Received: from ex2.OVH.local (corp.ovh.com [5.196.251.137])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
 (No client certificate requested)
 by mo174.mail-out.ovh.net (Postfix) with ESMTPS id 21616FF80A4;
 Wed, 21 Oct 2015 13:45:31 +0200 (CEST)
Received: from desk533202.ovh.net (5.196.2.34) by ex2.OVH.local (172.16.7.2)
 with Microsoft SMTP Server (TLS) id 15.1.225.42; Wed, 21 Oct 2015 13:45:31
 +0200
From: Ganael Laplanche <ganael.laplanche@corp.ovh.com>
Organization: OVH
To: Eric McCorkle <eric@metricspace.net>
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
Date: Wed, 21 Oct 2015 13:45:29 +0200
User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )
CC: <freebsd-hackers@freebsd.org>
References: <56211825.3080403@metricspace.net>
In-Reply-To: <56211825.3080403@metricspace.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Message-ID: <201510211345.29460.ganael.laplanche@corp.ovh.com>
X-Originating-IP: [5.196.2.34]
X-ClientProxiedBy: cas02.OVH.local (172.16.1.2) To ex2.OVH.local (172.16.7.2)
X-Ovh-Tracer-Id: 8852669496638618152
X-VR-SPAMSTATE: OK
X-VR-SPAMSCORE: -100
X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekhedrtddvucetufdoteggodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd
X-List-Received-Date: Wed, 21 Oct 2015 14:11:22 -0000

On Friday, October 16, 2015 05:30:45 PM Eric McCorkle wrote:

Hi Eric,

> In general, I need testing on ZFS setups with more complex vdevs (l2arc,
> intent logs, mirroring, striping, raidz, etc.)

I have successfully run the following tests, on a server with 3 SSDs and
following the method I explained in my previous post (i.e. booting from
patched loader.efi, *not* boot1.efi), see:
https://lists.freebsd.org/pipermail/freebsd-hackers/2015-August/048141.html

1) Single disk + SLOG + L2ARC

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          da0p3     ONLINE       0     0     0
        logs
          da2       ONLINE       0     0     0
        cache
          da1       ONLINE       0     0     0

errors: No known data errors

=> Boot OK, the pool is up and running with logs and cache online

2) Striping on 3 disks

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          da0p3     ONLINE       0     0     0
          da1p3     ONLINE       0     0     0
          da2p3     ONLINE       0     0     0

errors: No known data errors

=> Boot OK

3) Mirroring on 3 disks

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0
            da1p3   ONLINE       0     0     0
            da2p3   ONLINE       0     0     0

errors: No known data errors

=> Boot OK

4) Raidz on 3 disks

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0
            da1p3   ONLINE       0     0     0
            da2p3   ONLINE       0     0     0

errors: No known data errors

=> Boot OK

As Andrew asked, do you have an updated patch? Or is the code available in
some repository?

Thanks again for your great work,
Regards,

-- 
Ganaël LAPLANCHE <ganael.laplanche@corp.ovh.com>

From owner-freebsd-hackers@freebsd.org  Wed Oct 21 14:21:31 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0F764A1BB4B
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 21 Oct 2015 14:21:31 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id CD5C81F1D
 for <freebsd-hackers@freebsd.org>; Wed, 21 Oct 2015 14:21:30 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 732811C3DD2
 for <freebsd-hackers@freebsd.org>; Wed, 21 Oct 2015 09:21:28 -0500 (CDT)
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
To: freebsd-hackers@freebsd.org
References: <56211825.3080403@metricspace.net>
 <201510211345.29460.ganael.laplanche@corp.ovh.com>
From: Karl Denninger <karl@denninger.net>
Message-ID: <56279F66.5030303@denninger.net>
Date: Wed, 21 Oct 2015 09:21:26 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <201510211345.29460.ganael.laplanche@corp.ovh.com>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms010603070509040907090006"
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-List-Received-Date: Wed, 21 Oct 2015 14:21:31 -0000

This is a cryptographically signed message in MIME format.

--------------ms010603070509040907090006
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Will loader.rc take all the parameters (e.g. loading of geli, aesni,
geom, setting kernel parameters, etc) that loader.conf does?

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms010603070509040907090006--

From owner-freebsd-hackers@freebsd.org  Wed Oct 21 15:29:08 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8F93DA1BCCB
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 21 Oct 2015 15:29:08 +0000 (UTC)
 (envelope-from eric@metricspace.net)
Received: from mail.metricspace.net
 (207-172-209-83.c3-0.arl-ubr1.sbo-arl.ma.static.cable.rcn.com
 [207.172.209.83]) by mx1.freebsd.org (Postfix) with ESMTP id 6886CC11
 for <freebsd-hackers@freebsd.org>; Wed, 21 Oct 2015 15:29:07 +0000 (UTC)
 (envelope-from eric@metricspace.net)
Received: from [192.168.2.2] (unknown [166.197.121.170])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate) (Authenticated sender: eric)
 by mail.metricspace.net (Postfix) with ESMTPSA id 1CA30182B;
 Wed, 21 Oct 2015 15:28:59 +0000 (UTC)
User-Agent: K-9 Mail for blackphone
In-Reply-To: <201510211345.29460.ganael.laplanche@corp.ovh.com>
References: <56211825.3080403@metricspace.net>
 <201510211345.29460.ganael.laplanche@corp.ovh.com>
MIME-Version: 1.0
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
From: Eric McCorkle <eric@metricspace.net>
Date: Wed, 21 Oct 2015 11:28:52 -0400
To: Ganael Laplanche <ganael.laplanche@corp.ovh.com>
CC: freebsd-hackers@freebsd.org
Message-ID: <5ED4AE5E-8508-409B-B323-2CBFEBE77088@metricspace.net>
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-List-Received-Date: Wed, 21 Oct 2015 15:29:08 -0000

Outstanding. 

Based on these tests, the reports of successful testing of UFS loading, the fact that PCBSD and NextBSD have apparently picked it up, and the fact that I've been using it for months, I think it's time to move towards getting it committed. 

I'll post an updated patch by the end of the week. 

-- 
Sent from my Blackphone with K-9 Mail. Please excuse my brevity.

From owner-freebsd-hackers@freebsd.org  Wed Oct 21 17:34:37 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 81B34A1BAFF
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 21 Oct 2015 17:34:37 +0000 (UTC) (envelope-from phk@frebsd.org)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
 by mx1.freebsd.org (Postfix) with ESMTP id 4C13418DC;
 Wed, 21 Oct 2015 17:34:37 +0000 (UTC) (envelope-from phk@frebsd.org)
Received: from critter.freebsd.dk (unknown [192.168.55.3])
 by phk.freebsd.dk (Postfix) with ESMTP id 4CE474F860;
 Wed, 21 Oct 2015 17:34:29 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
 by critter.freebsd.dk (8.15.2/8.15.2) with ESMTP id t9LHYSRi035099;
 Wed, 21 Oct 2015 17:34:29 GMT (envelope-from phk@frebsd.org)
To: Andriy Gapon <avg@FreeBSD.org>
cc: freebsd-hackers <freebsd-hackers@FreeBSD.org>,
 Jung-uk Kim <jkim@FreeBSD.org>
Subject: Re: instability of timekeeping
In-reply-to: <56274FFC.2000608@FreeBSD.org>
From: "Poul-Henning Kamp" <phk@frebsd.org>
References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org>
 <56274FFC.2000608@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-ID: <35097.1445448868.1@critter.freebsd.dk>
Content-Transfer-Encoding: 8bit
Date: Wed, 21 Oct 2015 17:34:28 +0000
Message-ID: <35098.1445448868@critter.freebsd.dk>
X-Mailman-Approved-At: Wed, 21 Oct 2015 17:39:51 +0000
X-List-Received-Date: Wed, 21 Oct 2015 17:34:37 -0000

--------

> Another thing that occurred to me is that TSC-low.mask /
> TSC-low.frequency ≈ 2.7 seconds.

That's why I suspect timecounters aren't being wound up as
they should be.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-hackers@freebsd.org  Wed Oct 21 18:48:56 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF388A1AD2C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 21 Oct 2015 18:48:56 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7C8591D3;
 Wed, 21 Oct 2015 18:48:56 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id t9LImorT015089
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Wed, 21 Oct 2015 21:48:51 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua t9LImorT015089
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id t9LImol2015088;
 Wed, 21 Oct 2015 21:48:50 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 21 Oct 2015 21:48:50 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: freebsd-hackers <freebsd-hackers@FreeBSD.org>,
 Poul-Henning Kamp <phk@phk.freebsd.dk>, Jung-uk Kim <jkim@FreeBSD.org>
Subject: Re: instability of timekeeping
Message-ID: <20151021184850.GX2257@kib.kiev.ua>
References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org>
 <56274FFC.2000608@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <56274FFC.2000608@FreeBSD.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-List-Received-Date: Wed, 21 Oct 2015 18:48:57 -0000

On Wed, Oct 21, 2015 at 11:42:36AM +0300, Andriy Gapon wrote:
> On 20/10/2015 14:05, Andriy Gapon wrote:
> > I performed a small observation.  With ntpd disabled I ran `ntpdate -d` at 10
> > second intervals in a loop (done via `sleep 10`).  It looks like for about 25
> > minutes the time offset between a reference server and my machine was quite
> > stable. But then it sort of jumped about 2.5 seconds between two consecutive
> > ntpdate invocations.
> [snip]
> > On 20/10/2015 13:12, Andriy Gapon wrote:
> [snip]
> >> kern.timecounter.tc.TSC-low.frequency: 1607357461
> >> kern.timecounter.tc.TSC-low.counter: 2457319922
> >> kern.timecounter.tc.TSC-low.mask: 4294967295
> [snip]
> 
> Another observation and a hypothesis.
> 
> I tried time counters other than TSC and I couldn't reproduce the issue with them.
> 
> Another thing that occurred to me is that TSC-low.mask / TSC-low.frequency ≈ 2.7
> seconds.
> 
> From these observations and reading some code comes my hypothesis.
> First, I assume that visible values of TSCs on different cores are not perfectly
> synchronized. Second, I think that there can be circumstances where
> tc_ticktock() -> tc_windup() can get called on different cores sufficiently
> close in time that the later call would see TSC value which is "before"
> (sequentially smaller) than the TSC value read earlier (on the other core).  In
> that case the delta between the readings would be close to TSC-low.mask.
> 
> Right now I do not have any proof that what the hypothesis says is what actually
> happens.  On the other hand, I do not see anything that would prevent the
> hypothesized situation from occurring.
> 
> To add more weight to the hypothesis:
> cpu1:invltlb                      674384          4
> cpu1:invlrng                      453020          3
> cpu1:invlpg                    108772578        652
> cpu1:preempt                    37435000        224
> cpu1:ast                           36757          0
> cpu1:rendezvous                     9473          0
> cpu1:hardclock                  22267434        133
> As you can see I am currently running workloads that result in a very
> significant number of IPIs, especially the page invalidation IPIs.  Given that
> all (x86) IPIs have the same priority (based on their vector numbers) I think
> it's plausible that the hardclock IPI could get arbitrarily delayed.
> 
> I guess I could add a tracepoint to record deltas that are close to a current
> timecounter's mask.
> 
> Assuming the hypothesis is correct I see three possible ways to work-around the
> problem:
> 
> 1. Increase tsc_shift, so that the cross-CPU TSC differences are smaller (at the
> cost of lower resolution).  That should reduce the risk of seeing "backwards"
> TSC values.  Judging from numbers that tools/tools/tscdrift produces I can set
> tsc_shift to 7.  The resulting resolution should not be worse than that of HPET
> or ACPI-fast counters with the benefit of TSC being much faster to read.
> 
> 2. Change the code, so that tc_windup() is always called on the same CPU.  E.g.
> it could be the BSP or a CPU that receives the actual timer interrupts (as
> opposed to the hardclock IPIs).  This should help with the timekeeping, but
> won't help with the "jitter" in binuptime() and friends.
> 
> 3. In tc_delta() somehow detect and filter out "slightly behind" timecounter
> readings.  Not sure if this is possible at all.

Am I right that the TSC synchronization test passes on your machine? If
so, you probably cannot read a 'slightly behind' timecounter value after an
IPI on another core. You might try changing the CPUID instruction in the
test to MFENCE and see whether the test still passes.

Does the symptom disappear if you switch the eventtimer to LAPIC?
What happens if you turn off usermode gettimeofday()
by setting kern.timecounter.fast_gettime to 0?
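
To make the arithmetic in the quoted hypothesis concrete, here is a minimal
sketch in C.  It is not the kernel's tc_delta(): counter_delta(),
delta_seconds() and tsc_low() are hypothetical helpers named only for this
illustration, and the only real inputs are the mask and frequency from the
sysctl output quoted above.  It assumes a 32-bit counter whose readings are
subtracted and masked, and it models tsc_shift as a plain right shift of the
raw TSC value.

```c
#include <stdint.h>

/* Values taken from the sysctl output quoted in this thread. */
#define TSC_LOW_MASK      4294967295u	/* kern.timecounter.tc.TSC-low.mask */
#define TSC_LOW_FREQUENCY 1607357461u	/* kern.timecounter.tc.TSC-low.frequency */

/*
 * Hypothetical helper: masked difference between two counter readings.
 * If "now" is slightly smaller than "last", the unsigned subtraction
 * wraps and the result lands just below the mask.
 */
static uint32_t
counter_delta(uint32_t now, uint32_t last)
{
	return ((now - last) & TSC_LOW_MASK);
}

/* Hypothetical helper: convert a tick delta to seconds. */
static double
delta_seconds(uint32_t delta)
{
	return ((double)delta / (double)TSC_LOW_FREQUENCY);
}

/*
 * Hypothetical helper: model tsc_shift by dropping the low "shift" bits
 * of a raw 64-bit TSC value (assumption: the low counter is a right shift).
 */
static uint32_t
tsc_low(uint64_t tsc, unsigned shift)
{
	return ((uint32_t)(tsc >> shift));
}
```

With last = 2457319922, counter_delta(last + 1000, last) is 1000 ticks, but
counter_delta(last - 1000, last) is 4294966296 ticks, which delta_seconds()
turns into roughly 2.67 seconds, the same order as the jump of about 2.5
seconds reported above.  With a shift of 7, the raw readings 128000120 and
128000020 both map to 1000000, so a reading that is 100 cycles behind yields
a delta of 0; readings that straddle a 128-cycle boundary would still wrap,
so the shift reduces the risk rather than removing it.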

From owner-freebsd-hackers@freebsd.org  Thu Oct 22 06:28:33 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5026BA1C58A
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu, 22 Oct 2015 06:28:33 +0000 (UTC)
 (envelope-from ganael.laplanche@corp.ovh.com)
Received: from 9.mo175.mail-out.ovh.net (9.mo175.mail-out.ovh.net
 [46.105.54.132])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0F952123C
 for <freebsd-hackers@freebsd.org>; Thu, 22 Oct 2015 06:28:32 +0000 (UTC)
 (envelope-from ganael.laplanche@corp.ovh.com)
Received: from ex2.OVH.local (corp.ovh.com [5.196.251.137])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
 (No client certificate requested)
 by mo175.mail-out.ovh.net (Postfix) with ESMTPS id 9C3B5FF8265;
 Thu, 22 Oct 2015 07:33:33 +0200 (CEST)
Received: from desk533202.ovh.net (5.196.2.34) by ex2.OVH.local (172.16.7.2)
 with Microsoft SMTP Server (TLS) id 15.1.225.42; Thu, 22 Oct 2015 07:33:33
 +0200
From: Ganael Laplanche <ganael.laplanche@corp.ovh.com>
Organization: OVH
To: Eric McCorkle <eric@metricspace.net>
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
Date: Thu, 22 Oct 2015 07:33:32 +0200
User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )
CC: <freebsd-hackers@freebsd.org>
References: <56211825.3080403@metricspace.net>
 <201510211345.29460.ganael.laplanche@corp.ovh.com>
 <5ED4AE5E-8508-409B-B323-2CBFEBE77088@metricspace.net>
In-Reply-To: <5ED4AE5E-8508-409B-B323-2CBFEBE77088@metricspace.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: quoted-printable
Message-ID: <201510220733.32997.ganael.laplanche@corp.ovh.com>
X-Originating-IP: [5.196.2.34]
X-ClientProxiedBy: cas02.OVH.local (172.16.1.2) To ex2.OVH.local (172.16.7.2)
X-Ovh-Tracer-Id: 8443404878081145384
X-VR-SPAMSTATE: OK
X-VR-SPAMSCORE: -100
X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekhedrtdefucetufdoteggodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Oct 2015 06:28:33 -0000

On Wednesday, October 21, 2015 05:28:52 PM Eric McCorkle wrote:

Hi Eric,

> Based on these tests, the reports of successful testing of UFS loading, the
> fact that PCBSD and NextBSD have apparently picked it up, and the fact
> that I've been using it for months, I think it's time to move towards
> getting it committed.
>
> I'll post an updated patch by the end of the week.

Good news :)

Best regards,

-- 
Ganaël LAPLANCHE <ganael.laplanche@corp.ovh.com>

From owner-freebsd-hackers@freebsd.org  Thu Oct 22 06:48:34 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2FDD0A1C905
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu, 22 Oct 2015 06:48:34 +0000 (UTC)
 (envelope-from ganael.laplanche@corp.ovh.com)
Received: from 6.mo175.mail-out.ovh.net (6.mo175.mail-out.ovh.net
 [46.105.47.107])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id EAACB1C6C
 for <freebsd-hackers@freebsd.org>; Thu, 22 Oct 2015 06:48:32 +0000 (UTC)
 (envelope-from ganael.laplanche@corp.ovh.com)
Received: from ex2.OVH.local (corp.ovh.com [5.196.251.137])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
 (No client certificate requested)
 by mo175.mail-out.ovh.net (Postfix) with ESMTPS id C5342FF8279;
 Thu, 22 Oct 2015 07:31:48 +0200 (CEST)
Received: from desk533202.ovh.net (5.196.2.34) by ex2.OVH.local (172.16.7.2)
 with Microsoft SMTP Server (TLS) id 15.1.225.42; Thu, 22 Oct 2015 07:31:48
 +0200
From: Ganael Laplanche <ganael.laplanche@corp.ovh.com>
Organization: OVH
To: Karl Denninger <karl@denninger.net>
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
Date: Thu, 22 Oct 2015 07:31:48 +0200
User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )
CC: <freebsd-hackers@freebsd.org>
References: <56211825.3080403@metricspace.net>
 <201510211345.29460.ganael.laplanche@corp.ovh.com>
 <56279F66.5030303@denninger.net>
In-Reply-To: <56279F66.5030303@denninger.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: quoted-printable
Message-ID: <201510220731.48157.ganael.laplanche@corp.ovh.com>
X-Originating-IP: [5.196.2.34]
X-ClientProxiedBy: cas02.OVH.local (172.16.1.2) To ex2.OVH.local (172.16.7.2)
X-Ovh-Tracer-Id: 8413850006895114841
X-VR-SPAMSTATE: OK
X-VR-SPAMSCORE: -100
X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekhedrtdefucetufdoteggodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Oct 2015 06:48:34 -0000

On Wednesday, October 21, 2015 04:21:26 PM Karl Denninger wrote:

Hi Karl,

> Will loader.rc take all the parameters (e.g. loading of geli, aesni,
> geom, setting kernel parameters, etc) that loader.conf does?

For that purpose, you'll have to include appropriate .4th files from within
loader.rc. See sys/boot/README (within sources), loader.conf(5) and your
/boot/loader.rc for more details.

Best regards,

-- 
Ganaël LAPLANCHE <ganael.laplanche@corp.ovh.com>

From owner-freebsd-hackers@freebsd.org  Thu Oct 22 18:14:00 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8C780A1C4DA;
 Thu, 22 Oct 2015 18:14:00 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6A6751ACA;
 Thu, 22 Oct 2015 18:14:00 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net
 [73.231.226.104])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 27748B9BB;
 Thu, 22 Oct 2015 14:13:58 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hardware@freebsd.org
Cc: Dieter BSD <dieterbsd@gmail.com>, freebsd-hackers@freebsd.org
Subject: Re: ECC support
Date: Thu, 22 Oct 2015 11:09:50 -0700
Message-ID: <1492434.22kxSKhHEJ@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; )
In-Reply-To: <CAA3ZYrDjTNM7AShdpFOjT-3wZnEV2u-2X6MnLksON61bw7=XiQ@mail.gmail.com>
References: <CAA3ZYrDjTNM7AShdpFOjT-3wZnEV2u-2X6MnLksON61bw7=XiQ@mail.gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Thu, 22 Oct 2015 14:13:59 -0400 (EDT)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Oct 2015 18:14:00 -0000

On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote:
> Chris:
> > MCA: Bank 1, Status 0x9400000000000151
> > MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
> > MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2
> >
> > MCA: Address 0x81cc0e9f0
> >
> > Kind of freaky. I've never had this error on this board before.
> > On others tho.
> >
> > Try a search for MCA instead.
> 
> Is there a decoder ring for those messages?  I don't recall seeing
> messages like that, although I wasn't looking for them, and they
> don't leap out at you screaming ERROR! ERROR!  Digital Unix had its
> problems, but at least the error messages were fairly clear.
> Something like "single bit memory error at address 0x12345..."
> A simple edit to sys/x86/x86/mca.c
>    s/printf("UNCOR ");/printf("Uncorrectable ");/
>    s/printf("COR ");/printf("Correctable ");/
> would make the messages at least slightly more meaningful to a viewer
> who isn't intimently(sp) familiar with the mca.  Which most people aren't.

The problem is that there are other fields to decode and you can only fit so
much in one line.  Also, there is not a CPU-independent way to know the
address of an ECC error.  On Intel Core i3/5/7 (anything with QPI) you can
identify the individual DIMM at least, but the label that the motherboard
manufacturer uses varies by manufacturer.  (You can maybe scrape that text
from the SMBIOS tables, but only if they aren't wrong which they sometimes
are, and good luck knowing if they are wrong or right.)  Digital UNIX had the
luxury of running on hardware built by the same company, not on a random
assortment of boards built by various vendors.  FreeBSD does not.
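
As a rough illustration of how little of such a record is portable, here is
a minimal sketch in C that decodes only the architectural flag bits of the
Status value from the log quoted above.  The struct and decode_mc_status()
are hypothetical names chosen for this sketch, not code from
sys/x86/x86/mca.c, and the model-specific bits, which are where the
per-CPU details live, are deliberately left undecoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural flag bits of a machine-check bank status value. */
#define MC_STATUS_VAL    (1ULL << 63)	/* record is valid */
#define MC_STATUS_OVER   (1ULL << 62)	/* an earlier record was overwritten */
#define MC_STATUS_UC     (1ULL << 61)	/* error was not corrected */
#define MC_STATUS_EN     (1ULL << 60)	/* reporting was enabled */
#define MC_STATUS_MISCV  (1ULL << 59)	/* MISC register holds valid data */
#define MC_STATUS_ADDRV  (1ULL << 58)	/* ADDR register holds a valid address */
#define MC_STATUS_PCC    (1ULL << 57)	/* processor context may be corrupt */
#define MC_STATUS_ERRCODE 0xffffULL	/* low 16 bits: MCA error code */

/* Hypothetical container for the decoded flags. */
struct mc_flags {
	bool		valid;
	bool		overflow;
	bool		uncorrected;
	bool		enabled;
	bool		misc_valid;
	bool		addr_valid;
	bool		context_corrupt;
	uint16_t	error_code;
};

/* Hypothetical helper: split a status value into its architectural flags. */
static struct mc_flags
decode_mc_status(uint64_t status)
{
	struct mc_flags f;

	f.valid = (status & MC_STATUS_VAL) != 0;
	f.overflow = (status & MC_STATUS_OVER) != 0;
	f.uncorrected = (status & MC_STATUS_UC) != 0;
	f.enabled = (status & MC_STATUS_EN) != 0;
	f.misc_valid = (status & MC_STATUS_MISCV) != 0;
	f.addr_valid = (status & MC_STATUS_ADDRV) != 0;
	f.context_corrupt = (status & MC_STATUS_PCC) != 0;
	f.error_code = (uint16_t)(status & MC_STATUS_ERRCODE);
	return (f);
}
```

For the quoted Status 0x9400000000000151 this reports a valid, enabled record
that was corrected (the UC bit is clear), with a valid address and an error
code of 0x0151.  Turning that error code and the remaining bits into "which
DIMM" is the CPU-specific and board-specific part.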

sysutils/mcelog does some more verbose decoding of MCA records, but I find
it to be equally gibberish for anyone not intimately familiar with a specific
CPU.

I wrote a tool for a previous employer that was able to do some simple parsing
of MCA errors for Supermicro X7-X10 boards (Intel CPUs) and give a short
summary that was used in a nagios check.  However, it only handles a narrow
set of systems.

https://github.com/freebsd/freebsd/compare/master...bsdjhb:ecc

-- 
John Baldwin

From owner-freebsd-hackers@freebsd.org  Thu Oct 22 18:57:41 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 95046A1CED2;
 Thu, 22 Oct 2015 18:57:41 +0000 (UTC) (envelope-from rb@gid.co.uk)
Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3ED9411E3;
 Thu, 22 Oct 2015 18:57:41 +0000 (UTC) (envelope-from rb@gid.co.uk)
Received: from [194.32.164.24] ([194.32.164.24])
 by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id t9MInDTL087303;
 Thu, 22 Oct 2015 19:49:13 +0100 (BST) (envelope-from rb@gid.co.uk)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
Subject: Re: ECC support
From: Bob Bishop <rb@gid.co.uk>
In-Reply-To: <1492434.22kxSKhHEJ@ralph.baldwin.cx>
Date: Thu, 22 Oct 2015 19:49:13 +0100
Cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org,
 Dieter BSD <dieterbsd@gmail.com>
Content-Transfer-Encoding: quoted-printable
Message-Id: <74705089-408A-4FD3-899E-CA677390F855@gid.co.uk>
References: <CAA3ZYrDjTNM7AShdpFOjT-3wZnEV2u-2X6MnLksON61bw7=XiQ@mail.gmail.com>
 <1492434.22kxSKhHEJ@ralph.baldwin.cx>
To: John Baldwin <jhb@freebsd.org>
X-Mailer: Apple Mail (2.2104)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Oct 2015 18:57:41 -0000

Hi,

> On 22 Oct 2015, at 19:09, John Baldwin <jhb@freebsd.org> wrote:
>
> On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote:
>> Chris:
>>> MCA: Bank 1, Status 0x9400000000000151
>>> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
>>> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2
>>>
>>> MCA: Address 0x81cc0e9f0
>>>
>>> Kind of freaky. I've never had this error on this board before.
>>> On others tho.
>>>
>>> Try a search for MCA instead.
>>
>> Is there a decoder ring for those messages?  I don't recall seeing
>> messages like that, although I wasn't looking for them, and they
>> don't leap out at you screaming ERROR! ERROR!  Digital Unix had its
>> problems, but at least the error messages were fairly clear.
>> Something like "single bit memory error at address 0x12345..."
>> A simple edit to sys/x86/x86/mca.c
>>   s/printf("UNCOR ");/printf("Uncorrectable ");/
>>   s/printf("COR ");/printf("Correctable ");/
>> would make the messages at least slightly more meaningful to a viewer
>> who isn't intimently(sp) familiar with the mca.  Which most people aren't.
>
> The problem is that there are other fields to decode and you can only fit so
> much in one line.  Also, there is not a CPU-independent way to know the
> address of an ECC error. [etc]

On server-class hardware, the platform management (BMC or whatever) is
probably decoding this stuff for event logs and can be interrogated via
IPMI (or whatever).

--
Bob Bishop
rb@gid.co.uk


From owner-freebsd-hackers@freebsd.org  Thu Oct 22 21:17:24 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2136FA1CB25;
 Thu, 22 Oct 2015 21:17:24 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id F20E4804;
 Thu, 22 Oct 2015 21:17:23 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net
 [73.231.226.104])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 5362FB94F;
 Thu, 22 Oct 2015 17:17:22 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Bob Bishop <rb@gid.co.uk>
Cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org,
 Dieter BSD <dieterbsd@gmail.com>
Subject: Re: ECC support
Date: Thu, 22 Oct 2015 14:17:07 -0700
Message-ID: <1483396.WZc3qgD2yz@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; )
In-Reply-To: <74705089-408A-4FD3-899E-CA677390F855@gid.co.uk>
References: <CAA3ZYrDjTNM7AShdpFOjT-3wZnEV2u-2X6MnLksON61bw7=XiQ@mail.gmail.com>
 <1492434.22kxSKhHEJ@ralph.baldwin.cx>
 <74705089-408A-4FD3-899E-CA677390F855@gid.co.uk>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Thu, 22 Oct 2015 17:17:22 -0400 (EDT)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Oct 2015 21:17:24 -0000

On Thursday, October 22, 2015 07:49:13 PM Bob Bishop wrote:
> HI,
> 
> > On 22 Oct 2015, at 19:09, John Baldwin <jhb@freebsd.org> wrote:
> > 
> > On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote:
> >> Chris:
> >>> MCA: Bank 1, Status 0x9400000000000151
> >>> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
> >>> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2
> >>> 
> >>> MCA: Address 0x81cc0e9f0
> >>> 
> >>> Kind of freaky. I've never had this error on this board before.
> >>> On others tho.
> >>> 
> >>> Try a search for MCA instead.
> >> 
> >> Is there a decoder ring for those messages?  I don't recall seeing
> >> messages like that, although I wasn't looking for them, and they
> >> don't leap out at you screaming ERROR! ERROR!  Digital Unix had its
> >> problems, but at least the error messages were fairly clear.
> >> Something like "single bit memory error at address 0x12345..."
> >> A simple edit to sys/x86/x86/mca.c
> >>   s/printf("UNCOR ");/printf("Uncorrectable ");/
> >>   s/printf("COR ");/printf("Correctable ");/
> >> would make the messages at least slightly more meaningful to a viewer
> >> who isn't intimently(sp) familiar with the mca.  Which most people aren't.
> > 
> > The problem is that there are other fields to decode and you can only fit so
> > much in one line.  Also, there is not a CPU-independent way to know the
> > address of an ECC error. [etc]
> 
> On server-class hardware, the platform management (BMC or whatever) is probably decoding this stuff for event logs and can be interrogated via IPMI (or whatever).

Not always well and not always with side effects you want.  On Core 2 and
Nehalem i7 class hardware I measured that it took on the order of 400
milliseconds (not micro) in SMM (system management mode, so your entire
OS is halted) to write out each log entry to NVRAM.  At least one place I
worked at turned the BIOS ECC logging off because that delay was too costly.

Also, even though your BMC may log it, the format for doing so isn't
standard.  The details such as the affected DIMM are in the OEM bits of
the log record, so not something you can easily extract from, say,
ipmitool sel elist.  You'd have to log into the BIOS itself (or the BMC's
web UI) to see which DIMM is affected.  Neither of those are really great
for automated reporting.

-- 
John Baldwin

From owner-freebsd-hackers@freebsd.org  Fri Oct 23 11:19:07 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 34966A1C2FC
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Fri, 23 Oct 2015 11:19:07 +0000 (UTC)
 (envelope-from eric@metricspace.net)
Received: from mail.metricspace.net
 (207-172-209-83.c3-0.arl-ubr1.sbo-arl.ma.static.cable.rcn.com
 [207.172.209.83])
 by mx1.freebsd.org (Postfix) with ESMTP id 06CA818F1
 for <freebsd-hackers@freebsd.org>; Fri, 23 Oct 2015 11:19:06 +0000 (UTC)
 (envelope-from eric@metricspace.net)
Received: from [IPv6:2001:470:1f11:617:ea2a:eaff:fe21:e067] (unknown
 [IPv6:2001:470:1f11:617:ea2a:eaff:fe21:e067])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate) (Authenticated sender: eric)
 by mail.metricspace.net (Postfix) with ESMTPSA id E02091C92
 for <freebsd-hackers@freebsd.org>; Fri, 23 Oct 2015 11:18:59 +0000 (UTC)
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
To: freebsd-hackers@freebsd.org
References: <56211825.3080403@metricspace.net>
 <201510211345.29460.ganael.laplanche@corp.ovh.com>
 <5ED4AE5E-8508-409B-B323-2CBFEBE77088@metricspace.net>
 <201510220733.32997.ganael.laplanche@corp.ovh.com>
From: Eric McCorkle <eric@metricspace.net>
Message-ID: <562A17A3.6090803@metricspace.net>
Date: Fri, 23 Oct 2015 07:18:59 -0400
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <201510220733.32997.ganael.laplanche@corp.ovh.com>
Content-Type: multipart/mixed; boundary="------------070902030409020404010705"
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 23 Oct 2015 11:19:07 -0000

This is a multi-part message in MIME format.
--------------070902030409020404010705
Content-Type: text/plain; charset=iso-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

This is a patch pulled fresh from my /usr/src after an svn update.
Therefore, it should represent a patch against the current head.

On 10/22/15 01:33, Ganael Laplanche wrote:
> On Wednesday, October 21, 2015 05:28:52 PM Eric McCorkle wrote:
>
> Hi Eric,
>
>> Based on these tests, the reports of successful testing of UFS loading, the
>> fact that PCBSD and NextBSD have apparently picked it up, and the fact
>> that I've been using it for months, I think it's time to move towards
>> getting it committed.
>>
>> I'll post an updated patch by the end of the week.
>
> Good news :)
>
> Best regards,
>

--------------070902030409020404010705
Content-Type: text/x-patch;
 name="zfs_efi_curr.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="zfs_efi_curr.diff"

Index: sys/boot/efi/boot1/Makefile
===================================================================
--- sys/boot/efi/boot1/Makefile	(revision 289821)
+++ sys/boot/efi/boot1/Makefile	(working copy)
@@ -13,7 +13,7 @@
 INTERNALPROG=
 
 # architecture-specific loader code
-SRCS=	boot1.c self_reloc.c start.S
+SRCS=	boot1.c self_reloc.c start.S ufs_module.c zfs_module.c
 
 CFLAGS+=	-I.
 CFLAGS+=	-I${.CURDIR}/../include
@@ -20,6 +20,8 @@
 CFLAGS+=	-I${.CURDIR}/../include/${MACHINE}
 CFLAGS+=	-I${.CURDIR}/../../../contrib/dev/acpica/include
 CFLAGS+=	-I${.CURDIR}/../../..
+CFLAGS+=	-I${.CURDIR}/../../zfs/
+CFLAGS+=	-I${.CURDIR}/../../../cddl/boot/zfs/
 
 # Always add MI sources and REGULAR efi loader bits
 .PATH:		${.CURDIR}/../loader/arch/${MACHINE}
Index: sys/boot/efi/boot1/boot1.c
===================================================================
--- sys/boot/efi/boot1/boot1.c	(revision 289821)
+++ sys/boot/efi/boot1/boot1.c	(working copy)
@@ -5,6 +5,8 @@
  * All rights reserved.
  * Copyright (c) 2014 Nathan Whitehorn
  * All rights reserved.
+ * Copyright (c) 2014 Eric McCorkle
+ * All rights reserved.
  *
  * Redistribution and use in source and binary forms are freely
  * permitted provided that the above copyright notice and this
@@ -21,7 +23,6 @@
 __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>
-#include <sys/dirent.h>
 #include <machine/elf.h>
 #include <machine/stdarg.h>
 
@@ -28,6 +29,8 @@
 #include <efi.h>
 #include <eficonsctl.h>
 
+#include "boot_module.h"
+
 #define _PATH_LOADER	"/boot/loader.efi"
 #define _PATH_KERNEL	"/boot/kernel/kernel"
 
@@ -41,14 +44,20 @@
 	u_int	sp_size;
 };
 
+static const boot_module_t* const boot_modules[] =
+{
+#ifdef ZFS_EFI_BOOT
+        &zfs_module,
+#endif
+#ifdef UFS_EFI_BOOT
+        &ufs_module
+#endif
+};
+
+#define NUM_BOOT_MODULES (sizeof(boot_modules) / sizeof(boot_module_t*))
+
 static const char digits[] = "0123456789abcdef";
 
-static void panic(const char *fmt, ...) __dead2;
-static int printf(const char *fmt, ...);
-static int putchar(char c, void *arg);
-static int vprintf(const char *fmt, va_list ap);
-static int vsnprintf(char *str, size_t sz, const char *fmt, va_list ap);
-
 static int __printf(const char *fmt, putc_func_t *putc, void *arg, va_list ap);
 static int __putc(char c, void *arg);
 static int __puts(const char *s, putc_func_t *putc, void *arg);
@@ -62,9 +71,80 @@
 static EFI_SYSTEM_TABLE *systab;
 static EFI_HANDLE *image;
 
-static void
-bcopy(const void *src, void *dst, size_t len)
+
+void* Malloc(size_t len, const char* file, int line)
 {
+        void* out;
+        if (systab->BootServices->AllocatePool(EfiLoaderData,
+                                               len, &out) !=
+            EFI_SUCCESS) {
+                printf("Can't allocate memory pool\n");
+                return NULL;
+        }
+        return out;
+}
+
+char* strcpy(char* dst, const char* src) {
+        char* d = dst;
+        /* Copy every byte, including the terminating '\0'. */
+        while ((*d++ = *src++) != '\0');
+        return dst;
+}
+
+char* strchr(const char* s, int c) {
+        for(int i = 0; s[i]; i++)
+                if (s[i] == c)
+                        return (char*)(s + i);
+
+        return NULL;
+}
+
+int strncmp(const char *a, const char *b, size_t len)
+{
+        for (size_t i = 0; i < len; i++)
+                if(a[i] == '\0' && b[i] == '\0') {
+                        return 0;
+                } else if((u_char)a[i] < (u_char)b[i]) { /* compare as unsigned */
+                        return -1;
+                } else if((u_char)a[i] > (u_char)b[i]) {
+                        return 1;
+                }
+
+        return 0;
+}
+
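+char* strdup(const char* s) {
+        int len;
+        /* Count the bytes up to, but not including, the terminator. */
+        for(len = 0; s[len]; len++);
+        char* out = malloc(len + 1);
+        if (out == NULL)
+                return NULL;
+        for(int i = 0; i <= len; i++)	/* <= also copies the '\0' */
+                out[i] = s[i];
+
+        return out;
+}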
+
+int bcmp(const void *a, const void *b, size_t len)
+{
+        const char *sa = a;
+        const char *sb = b;
+
+        for (int i = 0; i < len; i++)
+                if(sa[i] != sb[i])
+                        return 1;
+
+        return 0;
+}
+
+int memcmp(const void *a, const void *b, size_t len)
+{
+        return bcmp(a, b, len);
+}
+
+void bcopy(const void *src, void *dst, size_t len)
+{
 	const char *s = src;
 	char *d = dst;
 
@@ -72,23 +152,24 @@
 		*d++ = *s++;
 }
 
-static void
-memcpy(void *dst, const void *src, size_t len)
+void* memcpy(void *dst, const void *src, size_t len)
 {
 	bcopy(src, dst, len);
+        return dst;
 }
 
-static void
-bzero(void *b, size_t len)
+
+void* memset(void *b, int val, size_t len)
 {
 	char *p = b;
 
 	while (len-- != 0)
-		*p++ = 0;
+		*p++ = val;
+
+        return b;
 }
 
-static int
-strcmp(const char *s1, const char *s2)
+int strcmp(const char *s1, const char *s2)
 {
 	for (; *s1 == *s2 && *s1; s1++, s2++)
 		;
@@ -95,30 +176,99 @@
 	return ((u_char)*s1 - (u_char)*s2);
 }
 
+int putchr(char c, void *arg)
+{
+	CHAR16 buf[2];
+
+	if (c == '\n') {
+		buf[0] = '\r';
+		buf[1] = 0;
+		systab->ConOut->OutputString(systab->ConOut, buf);
+	}
+	buf[0] = c;
+	buf[1] = 0;
+        systab->ConOut->OutputString(systab->ConOut, buf);
+	return (1);
+}
+
 static EFI_GUID BlockIoProtocolGUID = BLOCK_IO_PROTOCOL;
 static EFI_GUID DevicePathGUID = DEVICE_PATH_PROTOCOL;
+static EFI_GUID ConsoleControlGUID = EFI_CONSOLE_CONTROL_PROTOCOL_GUID;
 static EFI_GUID LoadedImageGUID = LOADED_IMAGE_PROTOCOL;
-static EFI_GUID ConsoleControlGUID = EFI_CONSOLE_CONTROL_PROTOCOL_GUID;
 
-static EFI_BLOCK_IO *bootdev;
-static EFI_DEVICE_PATH *bootdevpath;
-static EFI_HANDLE *bootdevhandle;
+#define MAX_DEVS 128
 
-EFI_STATUS efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE* Xsystab)
+void try_load(const boot_module_t* const mod,
+              const dev_info_t devs[],
+              size_t ndevs)
 {
-	EFI_HANDLE handles[128];
+        int idx;
+        size_t bufsize;
+        void* const buffer = mod->load(devs, ndevs, _PATH_LOADER, &idx, &bufsize);
+        EFI_HANDLE loaderhandle;
+        EFI_LOADED_IMAGE *loaded_image;
+
+        if (NULL == buffer) {
+                printf("Could not load file\n");
+                return;
+        }
+        //printf("Loaded file %s, image at %p\n"
+        //       "Attempting to load as bootable image...",
+        //       _PATH_LOADER, image);
+        if (systab->BootServices->LoadImage(TRUE, image, devs[idx].devpath,
+                                            buffer, bufsize, &loaderhandle) !=
+            EFI_SUCCESS) {
+                //printf("failed\n");
+                return;
+        }
+        //printf("success\n"
+        //       "Preparing to execute image...");
+
+        if (systab->BootServices->HandleProtocol(loaderhandle,
+                                                 &LoadedImageGUID,
+                                                 (VOID**)&loaded_image) !=
+            EFI_SUCCESS) {
+                //printf("failed\n");
+                return;
+        }
+
+        //printf("success\n");
+
+        loaded_image->DeviceHandle =  devs[idx].devhandle;
+
+	//printf("Image prepared, attempting to execute\n");
+        // XXX Set up command args first
+        if (systab->BootServices->StartImage(loaderhandle, NULL, NULL) !=
+            EFI_SUCCESS) {
+                //printf("Failed to execute loader\n");
+                return;
+        }
+        //printf("Shouldn't be here!\n");
+}
+
+void efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE* Xsystab)
+{
+	EFI_HANDLE handles[MAX_DEVS];
+        dev_info_t module_devs[NUM_BOOT_MODULES][MAX_DEVS];
+        size_t dev_offsets[NUM_BOOT_MODULES];
 	EFI_BLOCK_IO *blkio;
-	UINTN i, nparts = sizeof(handles), cols, rows, max_dim, best_mode;
+	UINTN nparts = sizeof(handles);
 	EFI_STATUS status;
 	EFI_DEVICE_PATH *devpath;
 	EFI_BOOT_SERVICES *BS;
 	EFI_CONSOLE_CONTROL_PROTOCOL *ConsoleControl = NULL;
 	SIMPLE_TEXT_OUTPUT_INTERFACE *conout = NULL;
-	char *path = _PATH_LOADER;
 
+        // Basic initialization
 	systab = Xsystab;
 	image = Ximage;
 
+        for(int i = 0; i < NUM_BOOT_MODULES; i++)
+        {
+                dev_offsets[i] = 0;
+        }
+
+        // Set up the console, so printf works.
 	BS = systab->BootServices;
 	status = BS->LocateProtocol(&ConsoleControlGUID, NULL,
 	    (VOID **)&ConsoleControl);
@@ -128,10 +278,14 @@
 	/*
 	 * Reset the console and find the best text mode.
 	 */
+        UINTN max_dim;
+        UINTN best_mode;
+        UINTN cols;
+        UINTN rows;
 	conout = systab->ConOut;
 	conout->Reset(conout, TRUE);
 	max_dim = best_mode = 0;
-	for (i = 0; ; i++) {
+	for (int i = 0; ; i++) {
 		status = conout->QueryMode(conout, i,
 		    &cols, &rows);
 		if (EFI_ERROR(status))
@@ -141,6 +295,7 @@
 			best_mode = i;
 		}
 	}
+
 	if (max_dim > 0)
 		conout->SetMode(conout, best_mode);
 	conout->EnableCursor(conout, TRUE);
@@ -147,206 +302,94 @@
 	conout->ClearScreen(conout);
 
 	printf("\n"
-	       ">> FreeBSD EFI boot block\n");
-	printf("   Loader path: %s\n", path);
+	       ">> FreeBSD ZFS-enabled EFI boot block\n");
+	printf("   Loader path: %s\n\n", _PATH_LOADER);
 
+	printf("   Initializing modules:");
+        for(int i = 0; i < NUM_BOOT_MODULES; i++)
+        {
+                if (NULL != boot_modules[i])
+                {
+                        printf(" %s", boot_modules[i]->name);
+                        boot_modules[i]->init(image, systab, BS);
+                }
+        }
+        putchr('\n', NULL);
+
+        // Get all the device handles
 	status = systab->BootServices->LocateHandle(ByProtocol,
 	    &BlockIoProtocolGUID, NULL, &nparts, handles);
 	nparts /= sizeof(handles[0]);
+	//printf("   Scanning %lu device handles\n", nparts);
 
-	for (i = 0; i < nparts; i++) {
+        // Scan all partitions, probing with all modules.
+	for (int i = 0; i < nparts; i++) {
+                dev_info_t devinfo;
+
+                // Figure out if we're dealing with an actual partition
 		status = systab->BootServices->HandleProtocol(handles[i],
 		    &DevicePathGUID, (void **)&devpath);
-		if (EFI_ERROR(status))
+		if (EFI_ERROR(status)) {
+                        //printf("        Not a device path protocol\n");
 			continue;
+                }
 
-		while (!IsDevicePathEnd(NextDevicePathNode(devpath)))
+		while (!IsDevicePathEnd(NextDevicePathNode(devpath))) {
+                        //printf("        Advancing to next device\n");
 			devpath = NextDevicePathNode(devpath);
+                }
 
 		status = systab->BootServices->HandleProtocol(handles[i],
 		    &BlockIoProtocolGUID, (void **)&blkio);
-		if (EFI_ERROR(status))
+		if (EFI_ERROR(status)) {
+                        //printf("        Not a block device\n");
 			continue;
+                }
 
-		if (!blkio->Media->LogicalPartition)
+		if (!blkio->Media->LogicalPartition) {
+                        //printf("        Not a logical partition\n");
 			continue;
+                }
 
-		if (domount(devpath, blkio, 1) >= 0)
-			break;
-	}
+                // Setup devinfo
+                devinfo.dev = blkio;
+                devinfo.devpath = devpath;
+                devinfo.devhandle = handles[i];
+                devinfo.devdata = NULL;
 
-	if (i == nparts)
-		panic("No bootable partition found");
-
-	bootdevhandle = handles[i];
-	load(path);
-
-	panic("Load failed");
-
-	return EFI_SUCCESS;
-}
-
-static int
-dskread(void *buf, u_int64_t lba, int nblk)
-{
-	EFI_STATUS status;
-	int size;
-
-	lba = lba / (bootdev->Media->BlockSize / DEV_BSIZE);
-	size = nblk * DEV_BSIZE;
-	status = bootdev->ReadBlocks(bootdev, bootdev->Media->MediaId, lba,
-	    size, buf);
-
-	if (EFI_ERROR(status))
-		return (-1);
-
-	return (0);
-}
-
-#include "ufsread.c"
-
-static ssize_t
-fsstat(ufs_ino_t inode)
-{
-#ifndef UFS2_ONLY
-	static struct ufs1_dinode dp1;
-	ufs1_daddr_t addr1;
-#endif
-#ifndef UFS1_ONLY
-	static struct ufs2_dinode dp2;
-#endif
-	static struct fs fs;
-	static ufs_ino_t inomap;
-	char *blkbuf;
-	void *indbuf;
-	size_t n, nb, size, off, vboff;
-	ufs_lbn_t lbn;
-	ufs2_daddr_t addr2, vbaddr;
-	static ufs2_daddr_t blkmap, indmap;
-	u_int u;
-
-	blkbuf = dmadat->blkbuf;
-	indbuf = dmadat->indbuf;
-	if (!dsk_meta) {
-		inomap = 0;
-		for (n = 0; sblock_try[n] != -1; n++) {
-			if (dskread(dmadat->sbbuf, sblock_try[n] / DEV_BSIZE,
-			    SBLOCKSIZE / DEV_BSIZE))
-				return -1;
-			memcpy(&fs, dmadat->sbbuf, sizeof(struct fs));
-			if ((
-#if defined(UFS1_ONLY)
-			    fs.fs_magic == FS_UFS1_MAGIC
-#elif defined(UFS2_ONLY)
-			    (fs.fs_magic == FS_UFS2_MAGIC &&
-			    fs.fs_sblockloc == sblock_try[n])
-#else
-			    fs.fs_magic == FS_UFS1_MAGIC ||
-			    (fs.fs_magic == FS_UFS2_MAGIC &&
-			    fs.fs_sblockloc == sblock_try[n])
-#endif
-			    ) &&
-			    fs.fs_bsize <= MAXBSIZE &&
-			    fs.fs_bsize >= sizeof(struct fs))
-				break;
-		}
-		if (sblock_try[n] == -1) {
-			printf("Not ufs\n");
-			return -1;
-		}
-		dsk_meta++;
-	} else
-		memcpy(&fs, dmadat->sbbuf, sizeof(struct fs));
-	if (!inode)
-		return 0;
-	if (inomap != inode) {
-		n = IPERVBLK(&fs);
-		if (dskread(blkbuf, INO_TO_VBA(&fs, n, inode), DBPERVBLK))
-			return -1;
-		n = INO_TO_VBO(n, inode);
-#if defined(UFS1_ONLY)
-		memcpy(&dp1, (struct ufs1_dinode *)blkbuf + n,
-		    sizeof(struct ufs1_dinode));
-#elif defined(UFS2_ONLY)
-		memcpy(&dp2, (struct ufs2_dinode *)blkbuf + n,
-		    sizeof(struct ufs2_dinode));
-#else
-		if (fs.fs_magic == FS_UFS1_MAGIC)
-			memcpy(&dp1, (struct ufs1_dinode *)blkbuf + n,
-			    sizeof(struct ufs1_dinode));
-		else
-			memcpy(&dp2, (struct ufs2_dinode *)blkbuf + n,
-			    sizeof(struct ufs2_dinode));
-#endif
-		inomap = inode;
-		fs_off = 0;
-		blkmap = indmap = 0;
+                // Run through each module, see if it can load this partition
+                for (int j = 0; j < NUM_BOOT_MODULES; j++ )
+                {
+                        if (NULL != boot_modules[j] &&
+                            boot_modules[j]->probe(&devinfo))
+                        {
+                                // If it can, save it to the device list for
+                                // that module
+                                module_devs[j][dev_offsets[j]++] = devinfo;
+                        }
+                }
 	}
-	size = DIP(di_size);
-	n = size - fs_off;
-	return (n);
-}
 
-static struct dmadat __dmadat;
+        // Select a partition to boot.  We do this by trying each
+        // module in order.
+        for (int i = 0; i < NUM_BOOT_MODULES; i++)
+        {
+                if (NULL != boot_modules[i])
+                {
+                        //printf("   Trying to load from %lu %s partitions\n",
+                        //       dev_offsets[i], boot_modules[i]->name);
+                        try_load(boot_modules[i], module_devs[i],
+                                 dev_offsets[i]);
+                        //printf("   Failed\n");
+                }
+        }
 
-static int
-domount(EFI_DEVICE_PATH *device, EFI_BLOCK_IO *blkio, int quiet)
-{
-
-	dmadat = &__dmadat;
-	bootdev = blkio;
-	bootdevpath = device;
-	if (fsread(0, NULL, 0)) {
-		if (!quiet)
-			printf("domount: can't read superblock\n");
-		return (-1);
-	}
-	if (!quiet)
-		printf("Succesfully mounted UFS filesystem\n");
-	return (0);
+        // If we get here, we're out of luck...
+        panic("No bootable partitions found!");
 }
 
-static void
-load(const char *fname)
+void panic(const char *fmt, ...)
 {
-	ufs_ino_t ino;
-	EFI_STATUS status;
-	EFI_HANDLE loaderhandle;
-	EFI_LOADED_IMAGE *loaded_image;
-	void *buffer;
-	size_t bufsize;
-
-	if ((ino = lookup(fname)) == 0) {
-		printf("File %s not found\n", fname);
-		return;
-	}
-
-	bufsize = fsstat(ino);
-	status = systab->BootServices->AllocatePool(EfiLoaderData,
-	    bufsize, &buffer);
-	fsread(ino, buffer, bufsize);
-
-	/* XXX: For secure boot, we need our own loader here */
-	status = systab->BootServices->LoadImage(TRUE, image, bootdevpath,
-	    buffer, bufsize, &loaderhandle);
-	if (EFI_ERROR(status))
-		printf("LoadImage failed with error %lx\n", status);
-
-	status = systab->BootServices->HandleProtocol(loaderhandle,
-	    &LoadedImageGUID, (VOID**)&loaded_image);
-	if (EFI_ERROR(status))
-		printf("HandleProtocol failed with error %lx\n", status);
-
-	loaded_image->DeviceHandle = bootdevhandle;
-
-	status = systab->BootServices->StartImage(loaderhandle, NULL, NULL);
-	if (EFI_ERROR(status))
-		printf("StartImage failed with error %lx\n", status);
-}
-
-static void
-panic(const char *fmt, ...)
-{
 	char buf[128];
 	va_list ap;
 
@@ -358,50 +401,25 @@
 	while (1) {}
 }
 
-static int
-printf(const char *fmt, ...)
+int printf(const char *fmt, ...)
 {
 	va_list ap;
 	int ret;
 
-	/* Don't annoy the user as we probe for partitions */
-	if (strcmp(fmt,"Not ufs\n") == 0)
-		return 0;
 
 	va_start(ap, fmt);
-	ret = vprintf(fmt, ap);
+	ret = __printf(fmt, putchr, 0, ap);
 	va_end(ap);
 	return (ret);
 }
 
-static int
-putchar(char c, void *arg)
+void vprintf(const char *fmt, va_list ap)
 {
-	CHAR16 buf[2];
-
-	if (c == '\n') {
-		buf[0] = '\r';
-		buf[1] = 0;
-		systab->ConOut->OutputString(systab->ConOut, buf);
-	}
-	buf[0] = c;
-	buf[1] = 0;
-	systab->ConOut->OutputString(systab->ConOut, buf);
-	return (1);
+	__printf(fmt, putchr, 0, ap);
 }
 
-static int
-vprintf(const char *fmt, va_list ap)
+int vsnprintf(char *str, size_t sz, const char *fmt, va_list ap)
 {
-	int ret;
-
-	ret = __printf(fmt, putchar, 0, ap);
-	return (ret);
-}
-
-static int
-vsnprintf(char *str, size_t sz, const char *fmt, va_list ap)
-{
 	struct sp_data sp;
 	int ret;
 
Index: sys/boot/efi/include/efilib.h
===================================================================
--- sys/boot/efi/include/efilib.h	(revision 289821)
+++ sys/boot/efi/include/efilib.h	(working copy)
@@ -43,7 +43,8 @@
 
 int efi_register_handles(struct devsw *, EFI_HANDLE *, EFI_HANDLE *, int);
 EFI_HANDLE efi_find_handle(struct devsw *, int);
-int efi_handle_lookup(EFI_HANDLE, struct devsw **, int *);
+void efi_handle_update_dev(EFI_HANDLE, struct devsw *, int, uint64_t);
+int efi_handle_lookup(EFI_HANDLE, struct devsw **, int *,  uint64_t *);
 
 int efi_status_to_errno(EFI_STATUS);
 time_t efi_time(EFI_TIME *);
Index: sys/boot/efi/libefi/handles.c
===================================================================
--- sys/boot/efi/libefi/handles.c	(revision 289821)
+++ sys/boot/efi/libefi/handles.c	(working copy)
@@ -35,6 +35,7 @@
 	EFI_HANDLE alias;
 	struct devsw *dev;
 	int unit;
+        uint64_t extra;
 };
 
 struct entry *entry;
@@ -78,8 +79,28 @@
 	return (NULL);
 }
 
+void efi_handle_update_dev(const EFI_HANDLE handle,
+                           struct devsw * const dev,
+                           int unit,
+                           uint64_t guid)
+{
+	int idx;
+
+	for (idx = 0; idx < nentries; idx++) {
+		if (entry[idx].handle != handle)
+			continue;
+		entry[idx].dev = dev;
+                entry[idx].unit = unit;
+                entry[idx].alias = NULL;
+                entry[idx].extra = guid;
+	}
+}
+
 int
-efi_handle_lookup(EFI_HANDLE h, struct devsw **dev, int *unit)
+efi_handle_lookup(EFI_HANDLE h,
+                  struct devsw **dev,
+                  int *unit,
+                  uint64_t *extra)
 {
 	int idx;
 
@@ -90,6 +111,8 @@
 			*dev = entry[idx].dev;
 		if (unit != NULL)
 			*unit = entry[idx].unit;
+                if (extra != NULL)
+                        *extra = entry[idx].extra;
 		return (0);
 	}
 	return (ENOENT);
Index: sys/boot/efi/loader/Makefile
===================================================================
--- sys/boot/efi/loader/Makefile	(revision 289821)
+++ sys/boot/efi/loader/Makefile	(working copy)
@@ -21,7 +21,8 @@
 	main.c \
 	self_reloc.c \
 	smbios.c \
-	vers.c
+	vers.c \
+	${.CURDIR}/zfs.c
 
 .PATH: ${.CURDIR}/arch/${MACHINE}
 # For smbios.c
@@ -35,6 +36,8 @@
 CFLAGS+=	-I${.CURDIR}/../../../contrib/dev/acpica/include
 CFLAGS+=	-I${.CURDIR}/../../..
 CFLAGS+=	-I${.CURDIR}/../../i386/libi386
+CFLAGS+=	-I${.CURDIR}/../../zfs
+CFLAGS+=	-I${.CURDIR}/../../../cddl/boot/zfs
 CFLAGS+=	-DNO_PCI -DEFI
 
 # make buildenv doesn't set DESTDIR, this means LIBSTAND
@@ -67,7 +70,7 @@
 CFLAGS+=	-DEFI_STAGING_SIZE=${EFI_STAGING_SIZE}
 .endif
 
-# Always add MI sources 
+# Always add MI sources
 .PATH:		${.CURDIR}/../../common
 .include	"${.CURDIR}/../../common/Makefile.inc"
 CFLAGS+=	-I${.CURDIR}/../../common
@@ -78,7 +81,7 @@
 LDSCRIPT=	${.CURDIR}/arch/${MACHINE}/ldscript.${MACHINE}
 LDFLAGS+=	-Wl,-T${LDSCRIPT} -Wl,-Bsymbolic -shared
 
-CLEANFILES+=	vers.c loader.efi
+CLEANFILES+=	zfs.c vers.c loader.efi
 
 NEWVERSWHAT=	"EFI loader" ${MACHINE}
 
@@ -85,6 +88,9 @@
 vers.c:	${.CURDIR}/../../common/newvers.sh ${.CURDIR}/../../efi/loader/version
 	sh ${.CURDIR}/../../common/newvers.sh ${.CURDIR}/version ${NEWVERSWHAT}
 
+zfs.c:
+	cp ${.CURDIR}/../../zfs/zfs.c ${.CURDIR}
+
 OBJCOPY?=	objcopy
 OBJDUMP?=	objdump
 
@@ -108,9 +114,9 @@
 
 LIBEFI=		${.OBJDIR}/../libefi/libefi.a
 
-DPADD=		${LIBFICL} ${LIBEFI} ${LIBFDT} ${LIBEFI_FDT} ${LIBSTAND} \
+DPADD=		${LIBFICL} ${LIBEFI} ${LIBFDT} ${LIBEFI_FDT} ../../../../lib/libstand/libstand.a \
 		${LDSCRIPT}
-LDADD=		${LIBFICL} ${LIBEFI} ${LIBFDT} ${LIBEFI_FDT} ${LIBSTAND}
+LDADD=		${LIBFICL} ${LIBEFI} ${LIBFDT} ${LIBEFI_FDT} ../../../../lib/libstand/libstand.a
 
 .endif # ${COMPILER_TYPE} != "gcc"
 
Index: sys/boot/efi/loader/conf.c
===================================================================
--- sys/boot/efi/loader/conf.c	(revision 289821)
+++ sys/boot/efi/loader/conf.c	(working copy)
@@ -31,14 +31,17 @@
 #include <bootstrap.h>
 #include <efi.h>
 #include <efilib.h>
+#include "../zfs/libzfs.h"
 
 struct devsw *devsw[] = {
 	&efipart_dev,
 	&efinet_dev,
+        &zfs_dev,
 	NULL
 };
 
 struct fs_ops *file_system[] = {
+        &zfs_fsops,
 	&dosfs_fsops,
 	&ufs_fsops,
 	&cd9660_fsops,
Index: sys/boot/efi/loader/devicename.c
===================================================================
--- sys/boot/efi/loader/devicename.c	(revision 289821)
+++ sys/boot/efi/loader/devicename.c	(working copy)
@@ -32,6 +32,7 @@
 #include <string.h>
 #include <sys/disklabel.h>
 #include "bootstrap.h"
+#include "libzfs.h"
 
 #include <efi.h>
 #include <efilib.h>
@@ -38,7 +39,7 @@
 
 static int efi_parsedev(struct devdesc **, const char *, const char **);
 
-/* 
+/*
  * Point (dev) at an allocated device specifier for the device matching the
  * path in (devspec). If it contains an explicit device specification,
  * use that.  If not, use the default device.
@@ -48,7 +49,6 @@
 {
 	struct devdesc **dev = (struct devdesc **)vdev;
 	int rv;
-
 	/*
 	 * If it looks like this is just a path and no device, then
 	 * use the current device instead.
@@ -61,7 +61,8 @@
 	}
 
 	/* Parse the device name off the beginning of the devspec. */
-	return (efi_parsedev(dev, devspec, path));
+	const int out = efi_parsedev(dev, devspec, path);
+        return out;
 }
 
 /*
@@ -87,8 +88,9 @@
 	int i, err;
 
 	/* minimum length check */
-	if (strlen(devspec) < 2)
+	if (strlen(devspec) < 2) {
 		return (EINVAL);
+        }
 
 	/* look for a device that matches */
 	for (i = 0; devsw[i] != NULL; i++) {
@@ -96,27 +98,39 @@
 		if (!strncmp(devspec, dv->dv_name, strlen(dv->dv_name)))
 			break;
 	}
-	if (devsw[i] == NULL)
+	if (devsw[i] == NULL) {
 		return (ENOENT);
+        }
+	np = devspec + strlen(dv->dv_name);
 
-	idev = malloc(sizeof(struct devdesc));
-	if (idev == NULL)
-		return (ENOMEM);
+        if (DEVT_ZFS == dv->dv_type) {
+                idev = malloc(sizeof(struct zfs_devdesc));
+                int out = zfs_parsedev((struct zfs_devdesc*)idev, np, path);
+                if (0 == out) {
+                        *dev = idev;
+                        cp = strchr(np + 1, ':');
+                } else {
+                        free(idev);
+                        return out;
+                }
+        } else {
+                idev = malloc(sizeof(struct devdesc));
+                if (idev == NULL)
+                        return (ENOMEM);
 
-	idev->d_dev = dv;
-	idev->d_type = dv->dv_type;
-	idev->d_unit = -1;
-
+                idev->d_dev = dv;
+                idev->d_type = dv->dv_type;
+                idev->d_unit = -1;
+                if (*np != '\0' && *np != ':') {
+                        idev->d_unit = strtol(np, &cp, 0);
+                        if (cp == np) {
+                                idev->d_unit = -1;
+                                free(idev);
+                                return (EUNIT);
+                        }
+                }
+        }
 	err = 0;
-	np = devspec + strlen(dv->dv_name);
-	if (*np != '\0' && *np != ':') {
-		idev->d_unit = strtol(np, &cp, 0);
-		if (cp == np) {
-			idev->d_unit = -1;
-			free(idev);
-			return (EUNIT);
-		}
-	}
 	if (*cp != '\0' && *cp != ':') {
 		free(idev);
 		return (EINVAL);
@@ -138,10 +152,11 @@
 	static char buf[32];	/* XXX device length constant? */
 
 	switch(dev->d_type) {
+        case DEVT_ZFS:
+                return zfs_fmtdev(dev);
 	case DEVT_NONE:
 		strcpy(buf, "(no device)");
 		break;
-
 	default:
 		sprintf(buf, "%s%d:", dev->d_dev->dv_name, dev->d_unit);
 		break;
Index: sys/boot/efi/loader/main.c
===================================================================
--- sys/boot/efi/loader/main.c	(revision 289821)
+++ sys/boot/efi/loader/main.c	(working copy)
@@ -39,6 +39,7 @@
 #include <smbios.h>
 
 #include "loader_efi.h"
+#include "libzfs.h"
 
 extern char bootprog_name[];
 extern char bootprog_rev[];
@@ -45,8 +46,9 @@
 extern char bootprog_date[];
 extern char bootprog_maker[];
 
-struct devdesc currdev;		/* our current device */
-struct arch_switch archsw;	/* MI/MD interface boundary */
+/* our current device */
+/* MI/MD interface boundary */
+struct arch_switch archsw;
 
 EFI_GUID acpi = ACPI_TABLE_GUID;
 EFI_GUID acpi20 = ACPI_20_TABLE_GUID;
@@ -61,6 +63,70 @@
 EFI_GUID debugimg = DEBUG_IMAGE_INFO_TABLE_GUID;
 EFI_GUID fdtdtb = FDT_TABLE_GUID;
 
+static void efi_zfs_probe(void);
+
+static void
+print_str16(const CHAR16* const str)
+{
+        for(int i = 0; str[i]; i++)
+        {
+                printf("%c", str[i]);
+        }
+}
+
+/*
+static int
+str16cmp(const CHAR16 const *a,
+         const char* const b)
+{
+        for(int i = 0; a[i] || b[i]; i++)
+        {
+                const CHAR16 achr = a[i];
+                const CHAR16 bchr = b[i];
+                if (achr < bchr)
+                {
+                        return -1;
+                } else if (achr > bchr)
+                {
+                        return 1;
+                }
+        }
+        return 0;
+}
+
+// Split an arg of the form "argname=argval", replacing the '=' with a \0
+static CHAR16*
+split_arg(CHAR16 *const str)
+{
+        for (int i = 0; str[i]; i++)
+        {
+                if ('=' == str[i])
+                {
+                        str[i] = 0;
+                        return str + i + 1;
+                }
+        }
+        return NULL;
+}
+
+static void
+handle_arg(CHAR16 *const arg)
+{
+        const CHAR16* const argval = split_arg(arg);
+        const CHAR16* const argname = arg;
+
+        if (NULL != argval)
+        {
+                printf("Unrecognized argument \"");
+                print_str16(argname);
+                printf("\"\n");
+        } else {
+                printf("Unrecognized argument \"");
+                print_str16(argname);
+                printf("\"\n");
+        }
+}
+*/
 EFI_STATUS
 main(int argc, CHAR16 *argv[])
 {
@@ -69,7 +135,15 @@
 	EFI_GUID *guid;
 	int i;
 
-	/*
+	archsw.arch_autoload = efi_autoload;
+	archsw.arch_getdev = efi_getdev;
+	archsw.arch_copyin = efi_copyin;
+	archsw.arch_copyout = efi_copyout;
+	archsw.arch_readin = efi_readin;
+        // Note this needs to be set before ZFS init
+        archsw.arch_zfs_probe = efi_zfs_probe;
+
+        /*
 	 * XXX Chicken-and-egg problem; we want to have console output
 	 * early, but some console attributes may depend on reading from
 	 * eg. the boot device, which we can't do yet.  We can use
@@ -85,13 +159,22 @@
 	/*
 	 * March through the device switch probing for things.
 	 */
-	for (i = 0; devsw[i] != NULL; i++)
-		if (devsw[i]->dv_init != NULL)
+	for (i = 0; devsw[i] != NULL; i++) {
+                if (devsw[i]->dv_init != NULL) {
+                        printf("Initializing %s\n", devsw[i]->dv_name);
 			(devsw[i]->dv_init)();
-
+                }
+        }
 	/* Get our loaded image protocol interface structure. */
 	BS->HandleProtocol(IH, &imgid, (VOID**)&img);
 
+	printf("Command line arguments:");
+        for(i = 0; i < argc; i++) {
+                printf(" ");
+                print_str16(argv[i]);
+        }
+	printf("\n");
+
 	printf("Image base: 0x%lx\n", (u_long)img->ImageBase);
 	printf("EFI version: %d.%02d\n", ST->Hdr.Revision >> 16,
 	    ST->Hdr.Revision & 0xffff);
@@ -105,8 +188,13 @@
 	printf("%s, Revision %s\n", bootprog_name, bootprog_rev);
 	printf("(%s, %s)\n", bootprog_maker, bootprog_date);
 
-	efi_handle_lookup(img->DeviceHandle, &currdev.d_dev, &currdev.d_unit);
-	currdev.d_type = currdev.d_dev->dv_type;
+        // Handle command-line arguments
+        /*
+        for(i = 1; i < argc; i++)
+        {
+                handle_arg(argv[i]);
+        }
+        */
 
 	/*
 	 * Disable the watchdog timer. By default the boot manager sets
@@ -119,19 +207,39 @@
 	 */
 	BS->SetWatchdogTimer(0, 0, 0, NULL);
 
-	env_setenv("currdev", EV_VOLATILE, efi_fmtdev(&currdev),
-	    efi_setcurrdev, env_nounset);
-	env_setenv("loaddev", EV_VOLATILE, efi_fmtdev(&currdev), env_noset,
-	    env_nounset);
+        struct devsw *dev;
+        int unit;
+        uint64_t pool_guid;
+        efi_handle_lookup(img->DeviceHandle, &dev, &unit, &pool_guid);
+        switch (dev->dv_type) {
+        case DEVT_ZFS: {
+                struct zfs_devdesc currdev;
+                currdev.d_dev = dev;
+                currdev.d_unit = unit;
+                currdev.d_type = currdev.d_dev->dv_type;
+                currdev.d_opendata = NULL;
+                currdev.pool_guid = pool_guid;
+                currdev.root_guid = 0;
+                env_setenv("currdev", EV_VOLATILE, efi_fmtdev(&currdev),
+                           efi_setcurrdev, env_nounset);
+                env_setenv("loaddev", EV_VOLATILE, efi_fmtdev(&currdev), env_noset,
+                           env_nounset);
+        } break;
+        default: {
+                struct devdesc currdev;
+                currdev.d_dev = dev;
+                currdev.d_unit = unit;
+                currdev.d_opendata = NULL;
+                currdev.d_type = currdev.d_dev->dv_type;
+                env_setenv("currdev", EV_VOLATILE, efi_fmtdev(&currdev),
+                           efi_setcurrdev, env_nounset);
+                env_setenv("loaddev", EV_VOLATILE, efi_fmtdev(&currdev), env_noset,
+                           env_nounset);
+        } break;
+        }
 
 	setenv("LINES", "24", 1);	/* optional */
 
-	archsw.arch_autoload = efi_autoload;
-	archsw.arch_getdev = efi_getdev;
-	archsw.arch_copyin = efi_copyin;
-	archsw.arch_copyout = efi_copyout;
-	archsw.arch_readin = efi_readin;
-
 	for (i = 0; i < ST->NumberOfTableEntries; i++) {
 		guid = &ST->ConfigurationTable[i].VendorGuid;
 		if (!memcmp(guid, &smbios, sizeof(EFI_GUID))) {
@@ -350,7 +458,6 @@
 	return (CMD_OK);
 }
 
-
 COMMAND_SET(nvram, "nvram", "get or set NVRAM variables", command_nvram);
 
 static int
@@ -402,6 +509,27 @@
 	return (CMD_OK);
 }
 
+COMMAND_SET(lszfs, "lszfs", "list child datasets of a zfs dataset",
+    command_lszfs);
+
+static int
+command_lszfs(int argc, char *argv[])
+{
+    int err;
+
+    if (argc != 2) {
+	command_errmsg = "wrong number of arguments";
+	return (CMD_ERROR);
+    }
+
+    err = zfs_list(argv[1]);
+    if (err != 0) {
+	command_errmsg = strerror(err);
+	return (CMD_ERROR);
+    }
+    return (CMD_OK);
+}
+
 #ifdef LOADER_FDT_SUPPORT
 extern int command_fdt_internal(int argc, char *argv[]);
 
@@ -420,3 +548,23 @@
 
 COMMAND_SET(fdt, "fdt", "flattened device tree handling", command_fdt);
 #endif
+
+static void
+efi_zfs_probe(void)
+{
+	EFI_BLOCK_IO *blkio;
+	EFI_HANDLE h;
+	int i;
+	u_int unit = 0;
+        char devname[32];
+        uint64_t pool_guid;
+
+	for (i = 0, h = efi_find_handle(&efipart_dev, 0);
+	    h != NULL; h = efi_find_handle(&efipart_dev, ++i)) {
+                snprintf(devname, sizeof devname, "%s%d:",
+                         efipart_dev.dv_name, i);
+                if(0 == zfs_probe_dev(devname, &pool_guid)) {
+                        efi_handle_update_dev(h, &zfs_dev, unit++, pool_guid);
+                }
+        }
+}
Index: sys/boot/zfs/zfs.c
===================================================================
--- sys/boot/zfs/zfs.c	(revision 289821)
+++ sys/boot/zfs/zfs.c	(working copy)
@@ -140,7 +140,7 @@
 	n = size;
 	if (fp->f_seekp + n > sb.st_size)
 		n = sb.st_size - fp->f_seekp;
-	
+
 	rc = dnode_read(spa, &fp->f_dnode, fp->f_seekp, start, n);
 	if (rc)
 		return (rc);
@@ -493,7 +493,7 @@
 		}
 	}
 	close(pa.fd);
-	return (0);
+	return (ret);
 }
 
 /*

--------------070902030409020404010705--

From owner-freebsd-hackers@freebsd.org  Fri Oct 23 11:37:34 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D5995A1C97E;
 Fri, 23 Oct 2015 11:37:34 +0000 (UTC) (envelope-from rb@gid.co.uk)
Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6869D7F0;
 Fri, 23 Oct 2015 11:37:33 +0000 (UTC) (envelope-from rb@gid.co.uk)
Received: from [194.32.164.28] ([194.32.164.28])
 by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id t9NBbVjl080406;
 Fri, 23 Oct 2015 12:37:31 +0100 (BST) (envelope-from rb@gid.co.uk)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
Subject: Re: ECC support
From: Bob Bishop <rb@gid.co.uk>
In-Reply-To: <1483396.WZc3qgD2yz@ralph.baldwin.cx>
Date: Fri, 23 Oct 2015 12:37:31 +0100
Cc: freebsd-hackers@freebsd.org, Dieter BSD <dieterbsd@gmail.com>,
 freebsd-hardware@freebsd.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <97482413-D2AA-4C32-AEFF-EB65D5D8542B@gid.co.uk>
References: <CAA3ZYrDjTNM7AShdpFOjT-3wZnEV2u-2X6MnLksON61bw7=XiQ@mail.gmail.com>
 <1492434.22kxSKhHEJ@ralph.baldwin.cx>
 <74705089-408A-4FD3-899E-CA677390F855@gid.co.uk>
 <1483396.WZc3qgD2yz@ralph.baldwin.cx>
To: John Baldwin <jhb@freebsd.org>
X-Mailer: Apple Mail (2.2104)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 23 Oct 2015 11:37:34 -0000

Hi,

> On 22 Oct 2015, at 22:17, John Baldwin <jhb@freebsd.org> wrote:
>=20
> On Thursday, October 22, 2015 07:49:13 PM Bob Bishop wrote:
>> HI,
>>=20
>>> On 22 Oct 2015, at 19:09, John Baldwin <jhb@freebsd.org> wrote:
>>>=20
>>> On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote:
>>>> Chris:
>>>>> MCA: Bank 1, Status 0x9400000000000151
>>>>> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
>>>>> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2
>>>>>=20
>>>>> MCA: Address 0x81cc0e9f0
>>>>>=20
>>>>> Kind of freaky. I've never had this error on this board before.
>>>>> On others tho.
>>>>>=20
>>>>> Try a search for MCA instead.
>>>>=20
>>>> Is there a decoder ring for those messages?  I don't recall seeing
>>>> messages like that, although I wasn't looking for them, and they
>>>> don't leap out at you screaming ERROR! ERROR!  Digital Unix had its
>>>> problems, but at least the error messages were fairly clear.
>>>> Something like "single bit memory error at address 0x12345..."
>>>> A simple edit to sys/x86/x86/mca.c
>>>>  s/printf("UNCOR ");/printf("Uncorrectable ");/
>>>>  s/printf("COR ");/printf("Correctable ");/
>>>> would make the messages at least slightly more meaningful to a =
viewer
>>>> who isn't intimently(sp) familiar with the mca.  Which most people =
aren't.
>>>=20
>>> The problem is that there are other fields to decode and you can =
only fit so
>>> much in one line.  Also, there is not a CPU-independent way to know =
the
>>> address of an ECC error. [etc]
>>=20
>> On server-class hardware, the platform management (BMC or whatever) =
is probably decoding this stuff for event logs and can be interrogated =
via IPMI (or whatever).
>=20
> Not always well and not always with side effects you want.  On Core 2 and
> Nehalem i7 class hardware I measured that it took on the order of 400
> milliseconds (not micro) in SMM (system management mode, so your entire
> OS is halted) to write out each log entry to NVRAM.  At least one place I
> worked at turned the BIOS ECC logging off because that delay was too costly.
>
> Also, even though your BMC may log it, the format for doing so isn't
> standard.  The details such as the affected DIMM are in the OEM bits of
> the log record, so not something you can easily extract from, say,
> ipmitool sel elist.  You'd have to log into the BIOS itself (or the BMC's
> web UI) to see which DIMM is affected.  Neither of those is really great
> for automated reporting.

All agreed. I was just flagging up the existence of another possible channel to get at ECC logging.
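
On the "decoder ring" question quoted above: a minimal sketch of what the
status word in that report says, assuming only the architectural MCi_STATUS
flag bits (63 down to 57) and the 16-bit MCA error code in the low bits. The
function and table names here are hypothetical, chosen for illustration; this
is not the code in sys/x86/x86/mca.c, and it makes no attempt at the
vendor-specific fields or DIMM identification discussed in this thread.

```python
# Sketch only: decodes the architectural flag bits of an x86 MCi_STATUS value.
# Names (MC_STATUS_FLAGS, decode_mc_status, describe) are illustrative
# assumptions, not identifiers from the FreeBSD source tree.

MC_STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected by hardware
    (60, "EN"),     # reporting of this error was enabled
    (59, "MISCV"),  # companion MISC register is valid
    (58, "ADDRV"),  # companion ADDR register is valid
    (57, "PCC"),    # processor context may be corrupt
]


def decode_mc_status(status):
    """Split a 64-bit status word into flag names and error-code fields."""
    flags = [name for bit, name in MC_STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "valid": "VAL" in flags,
        "corrected": "VAL" in flags and "UC" not in flags,
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


def describe(status):
    """Return a one-line, spelled-out summary of a status word."""
    d = decode_mc_status(status)
    if not d["valid"]:
        return "No valid error logged"
    kind = "Correctable" if d["corrected"] else "Uncorrectable"
    return "{} error, MCA code 0x{:04x}, flags {}".format(
        kind, d["mca_error_code"], " ".join(d["flags"])
    )


if __name__ == "__main__":
    # Status value taken from the report quoted in this thread.
    print(describe(0x9400000000000151))
```

Run against the quoted value, this reports a correctable error with VAL, EN
and ADDRV set, which is consistent with the kernel also printing an
"MCA: Address" line. Interpreting the low 16 bits further, or mapping the
address to a DIMM, is where the CPU-specific knowledge comes in.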

> --
> John Baldwin

--
Bob Bishop
rb@gid.co.uk


From owner-freebsd-hackers@freebsd.org  Fri Oct 23 12:33:03 2015
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 609DCA1D081
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Fri, 23 Oct 2015 12:33:03 +0000 (UTC)
 (envelope-from ganael.laplanche@corp.ovh.com)
Received: from mo174.mail-out.ovh.net (mo174.mail-out.ovh.net [178.32.228.174])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 267CC314
 for <freebsd-hackers@freebsd.org>; Fri, 23 Oct 2015 12:33:02 +0000 (UTC)
 (envelope-from ganael.laplanche@corp.ovh.com)
Received: from ex2.OVH.local (corp.ovh.com [5.196.251.137])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
 (No client certificate requested)
 by mo174.mail-out.ovh.net (Postfix) with ESMTPS id ACD6FFF80A8;
 Fri, 23 Oct 2015 14:32:51 +0200 (CEST)
Received: from desk533202.ovh.net (5.196.2.34) by ex2.OVH.local (172.16.7.2)
 with Microsoft SMTP Server (TLS) id 15.1.225.42; Fri, 23 Oct 2015 14:32:51
 +0200
From: Ganael Laplanche <ganael.laplanche@corp.ovh.com>
Organization: OVH
To: Eric McCorkle <eric@metricspace.net>
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
Date: Fri, 23 Oct 2015 14:32:50 +0200
User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )
CC: <freebsd-hackers@freebsd.org>
References: <56211825.3080403@metricspace.net>
 <201510220733.32997.ganael.laplanche@corp.ovh.com>
 <562A17A3.6090803@metricspace.net>
In-Reply-To: <562A17A3.6090803@metricspace.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: quoted-printable
Message-ID: <201510231432.50315.ganael.laplanche@corp.ovh.com>
X-Originating-IP: [5.196.2.34]
X-ClientProxiedBy: cas02.OVH.local (172.16.1.2) To ex2.OVH.local (172.16.7.2)
X-Ovh-Tracer-Id: 2951265135050668584
X-VR-SPAMSTATE: OK
X-VR-SPAMSCORE: -100
X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekhedrtdeiucetufdoteggodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 23 Oct 2015 12:33:03 -0000

On Friday, October 23, 2015 01:18:59 PM Eric McCorkle wrote:

> This is a patch pulled fresh from my /usr/src after an svn update.
> Therefore, it should represent a patch against the current head

Thanks, Eric!

Is that patch just an update (to track the -CURRENT src tree) or are there
technical modifications too?

--
Ganaël LAPLANCHE <ganael.laplanche@corp.ovh.com>