From owner-freebsd-hackers@freebsd.org Sun Oct 18 15:06:39 2015
Subject: Re: rc(8) parallel tasks
From: Joe Maloney
In-Reply-To: <562143E7.3030104@jet9.net>
Date: Sun, 18 Oct 2015 10:06:35 -0500
Cc: Mark Felder, freebsd-hackers@freebsd.org, freebsd-rc@freebsd.org, marck@rinet.ru
Message-Id: <6D32B114-453C-4D25-8FF4-C8777D78C50C@pcbsd.org>
References: <560EAC05.6050308@jet9.net> <1445011561.1233840.412174489.688C5822@webmail.messagingengine.com> <562143E7.3030104@jet9.net>
To: Cyril Vechera
X-BeenThere: freebsd-hackers@freebsd.org
List-Id: Technical Discussions relating to FreeBSD

Thanks for sharing this.  It worked great for me on FreeBSD as well with
minimal services, and does indeed make a difference.  It blew up on
PCBSD, which has many more services running out of the box.  I also
noticed that /tmp/init has to be removed manually in some cases when
boot hangs.  Very impressive otherwise.

Joe Maloney

> On Oct 16, 2015, at 1:37 PM, Cyril Vechera wrote:
>
> On 10/16/2015 07:06 PM, Mark Felder wrote:
>> On Fri, Oct 2, 2015, at 11:08, Cyril Vechera wrote:
>>> Hi there.
>>>
>>> We've got a small launcher script (~250 loc) for parallel services
>>> start/stop etc. It is used on our embedded systems and our users'
>>> containers. And I've done a proof of concept for embedding it into
>>> FreeBSD's standard /etc/rc to execute the startup scripts in parallel.
>>> It gave me a boot time reduction of the rc part from 27 to 7 seconds,
>>> mostly by eliminating stalls while waiting for the network or other
>>> long-latency resources.
>>>
>>> The launcher is written in pure POSIX shell and uses FIFOs (named pipes)
>>> as mutexes for synchronization. So it is embedded into /etc/rc and
>>> /etc/rc.d, preserving rc.subr preloading. As a primary requirement, it
>>> guarantees the topological order (strict partial order) defined by
>>> dependencies. It requires only a POSIX shell, a FreeBSD or Linux kernel,
>>> mkfifo and a writable file system. Due to the last requirement, it can
>>> be run only at a late stage or must be supplied with some kind of
>>> writable fs, e.g. tmpfs. The FreeBSD-integrated version uses standard
>>> rcorder annotations (REQUIRE, BEFORE and PROVIDE) and there's no need
>>> to change rc.d scripts.
>>>
>>> It's not a full init replacement or a kind of service supervision tool.
>>> It only starts or invokes a group of scripts in parallel, resolving
>>> dependencies and ensuring execution in dependency order.
>>>
>>> Please take a look at the script and patch set for FreeBSD:
>>>
>>> https://github.com/cvss/jet9-multitask-init/blob/master/jet9-multitask-init
>>> https://github.com/cvss/jet9-multitask-init/tree/master/examples/freebsd
>>>
>>>
>> Your first link is a 404, but this looks really nice!
>
> In the last commit (v1.3.0) I've renamed 'jet9-multitask-init' to
> 'jet9-multitask-flow' to avoid confusion over naming, because it is
> not, as the name seemed to suggest, an init replacement, but just a
> parallel task launcher. The actual repository URL is now
> https://github.com/cvss/jet9-multitask-flow
>
> In this commit I've completed the FreeBSD compatibility. Now a script
> or dependency name can include minuses `-` and dots `.` (the first stone
> I stumbled over was ftp-proxy). And I've cleaned up the main script
> code https://github.com/cvss/jet9-multitask-flow/jet9-multitask-flow
> and have split it into more functions that can be redefined. So it's now
> easier to rewrite dependency extraction for FreeBSD rc-scripts: the
> current implementation is too rough and takes three 'awk' runs for each
> rc-script. This is not only a loss of time but, as DMarck mentioned
> before, using awk restricts parallel rc to running only after the
> FILESYSTEMS stage is done. Maybe it would be better to add a new option
> to rcorder(8) that dumps the gathered dependencies as a tsort-compatible
> listing, and to insert them directly into the flow execution plan.
>
> I've also added an rc.conf variable `rc_parallel` to turn parallel
> execution on and off. There's a risk of discovering incomplete
> dependency annotations in some rc-scripts that were previously masked by
> serial execution. I've done some checks by enabling as many rc.conf
> variables as possible to start more rc-scripts, and didn't find any
> errors. But it looks too good to be true and I'm afraid that my testing
> is just too poor. I think that if some ordering conflict is found, it
> could be worked around by introducing a whitelist of script names that
> must be run only sequentially.
>
> So what remains first is to check whether it really works under
> different conditions.
>
>
> --
> Cyril Vechera
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@freebsd.org Mon Oct 19 09:19:25 2015
Date: Mon, 19 Oct 2015 10:18:51 +0100
From: Andrew Turner
To: Eric McCorkle
Cc: "freebsd-hackers@freebsd.org"
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs
Message-ID: <20151019101851.28895dfa@bender.Home>
In-Reply-To: <56211825.3080403@metricspace.net>
References: <56211825.3080403@metricspace.net>

On Fri, 16 Oct 2015 11:30:45 -0400
Eric McCorkle wrote:

> Hi,
>
> I've received a few successful test reports for EFI/ZFS, on both ZFS
> and UFS systems.  I have personally been using it for some time in a
> GRUB + loader.efi setup.
>
> I have a fairly minor test result for loader.efi.  I have confirmed
> that "nextboot -k" works fine.
>
> In general, I need testing on ZFS setups with more complex vdevs
> (l2arc, intent logs, mirroring, striping, raidz, etc.)

Do you have an updated patch? When I looked at this recently the patch
didn't apply cleanly.

Andrew

From owner-freebsd-hackers@freebsd.org Tue Oct 20 10:14:15 2015
To: freebsd-hackers
From: Andriy Gapon
Subject: instability of timekeeping
Message-ID: <56261398.60102@FreeBSD.org>
Date: Tue, 20 Oct 2015 13:12:40 +0300

I recently replaced a 2-core Athlon II X2 CPU with a same-family Phenom II X4
CPU and after that I started noticing problems with the timekeeping.  It seems
that from time to time the jitter becomes so high that ntpd goes nuts or stops
synchronizing or panics.

Here is how the current event timer and time counter configurations look
(slightly trimmed):
$ sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(800) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.TSC-low.quality: 800
kern.timecounter.tc.TSC-low.frequency: 1607357461
kern.timecounter.tc.TSC-low.counter: 2457319922
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.i8254.quality: 0
$ sysctl kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: HPET
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.HPET2.quality: 450
kern.eventtimer.et.HPET1.quality: 450
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.LAPIC.quality: 400

Please note that the TSC-low time counter is chosen administratively whereas
the event timer configuration is fully automatic.
The previous configuration was produced in the same fashion.
One notable difference is that the previous CPU was 2-core and so two HPET
timers were virtually combined into a single timer with per-CPU capability.  In
other words, two HPET timers were used to drive two cores.
The newer CPU has four cores, so there are not enough HPET timers to drive each
core independently and thus there is no virtual bundling.  Thus, one HPET timer
drives one core and that core forwards the interrupts to other cores via IPIs as
necessary.

But I am far from sure that the stated difference is actually the source of the
instability.  There could be other hardware-related reasons, of course.

I wonder if there is a good way to analyze / debug this situation to see what
exactly is wrong.  For now I am thinking about trying different time counter and
event timer configurations, but I would prefer a more guided "scientific"
approach over a blind trial and error one.

I would appreciate any help, suggestions, hints.

The CPUs:
CPU: AMD Athlon(tm) II X2 250 Processor (3013.79-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f62  Family=0x10  Model=0x6  Stepping=2
  Features=0x178bfbff
  Features2=0x802009
  AMD Features=0xee500800
  AMD Features2=0x37ff
  SVM: Features=0xf Revision=1, ASIDs=64
  TSC: P-state invariant

CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f43  Family=0x10  Model=0x4  Stepping=3
  Features=0x178bfbff
  Features2=0x802009
  AMD Features=0xee500800
  AMD Features2=0x37ff
  SVM: Features=0xf Revision=1, ASIDs=64
  TSC: P-state invariant

--
Andriy Gapon

From owner-freebsd-hackers@freebsd.org Tue Oct 20 11:06:57 2015
Subject: Re:
 instability of timekeeping
To: freebsd-hackers
References: <56261398.60102@FreeBSD.org>
From: Andriy Gapon
Message-ID: <56261FE6.90302@FreeBSD.org>
Date: Tue, 20 Oct 2015 14:05:10 +0300
In-Reply-To: <56261398.60102@FreeBSD.org>

I performed a small observation.  With ntpd disabled I ran `ntpdate -d` at 10
second intervals in a loop (done via `sleep 10`).  It looks like for about 25
minutes the time offset between a reference server and my machine was quite
stable.  But then it sort of jumped about 2.5 seconds between two consecutive
ntpdate invocations.

20 Oct 13:21:02 ntpdate[85157]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03326, dispersion 0.00014
transmitted 4, in filter 4
reference time:      d9d09422.65c26838  Tue, Oct 20 2015 13:21:22.397
originate timestamp: d9d0942f.a03ebee7  Tue, Oct 20 2015 13:21:35.625
transmit timestamp:  d9d09414.e9ed230c  Tue, Oct 20 2015 13:21:08.913
filter delay:  0.03372  0.03491  0.03378  0.03326
               0.00000  0.00000  0.00000  0.00000
filter offset: 26.70845 26.70896 26.70834 26.70832
               0.000000 0.000000 0.000000 0.000000
delay 0.03326, dispersion 0.00014
offset 26.708320

20 Oct 13:21:08 ntpdate[85157]: step time server 62.149.0.30 offset 26.708320 sec

[...]

20 Oct 13:45:20 ntpdate[87088]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03442, dispersion 0.00018
transmitted 4, in filter 4
reference time:      d9d099d3.67703742  Tue, Oct 20 2015 13:45:39.404
originate timestamp: d9d099e1.8a1b6c78  Tue, Oct 20 2015 13:45:53.539
transmit timestamp:  d9d099c6.d34c1e00  Tue, Oct 20 2015 13:45:26.825
filter delay:  0.03481  0.03442  0.03448  0.03458
               0.00000  0.00000  0.00000  0.00000
filter offset: 26.70957 26.70943 26.70913 26.70957
               0.000000 0.000000 0.000000 0.000000
delay 0.03442, dispersion 0.00018
offset 26.709437

20 Oct 13:45:26 ntpdate[87088]: step time server 62.149.0.30 offset 26.709437 sec

20 Oct 13:45:36 ntpdate[87094]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03349, dispersion 0.00012
transmitted 4, in filter 4
reference time:      d9d099e5.63ead89c  Tue, Oct 20 2015 13:45:57.390
originate timestamp: d9d099f4.6364c717  Tue, Oct 20 2015 13:46:12.388
transmit timestamp:  d9d099d7.00939943  Tue, Oct 20 2015 13:45:43.002
filter delay:  0.03349  0.03413  0.03419  0.03455
               0.00000  0.00000  0.00000  0.00000
filter offset: 29.38105 29.38106 29.38134 29.38149
               0.000000 0.000000 0.000000 0.000000
delay 0.03349, dispersion 0.00012
offset 29.381055

20 Oct 13:45:43 ntpdate[87094]: step time server 62.149.0.30 offset 29.381055 sec

On 20/10/2015 13:12, Andriy Gapon wrote:
>
> I recently replaced a 2-core Athlon II X2 CPU with a same-family Phenom II X4
> CPU and after that I started noticing problems with the timekeeping.  It seems
> that from time to time the jitter becomes so high that ntpd goes nuts or stops
> synchronizing or panics.
>
> Here is how the current event timer and time counter configurations look
> (slightly trimmed):
> $ sysctl kern.timecounter
> kern.timecounter.tsc_shift: 1
> kern.timecounter.smp_tsc_adjust: 0
> kern.timecounter.smp_tsc: 1
> kern.timecounter.invariant_tsc: 1
> kern.timecounter.fast_gettime: 1
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(800) ACPI-fast(900) HPET(950) i8254(0)
> dummy(-1000000)
> kern.timecounter.hardware: TSC-low
> kern.timecounter.alloweddeviation: 5
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.TSC-low.quality: 800
> kern.timecounter.tc.TSC-low.frequency: 1607357461
> kern.timecounter.tc.TSC-low.counter: 2457319922
> kern.timecounter.tc.TSC-low.mask: 4294967295
> kern.timecounter.tc.ACPI-fast.quality: 900
> kern.timecounter.tc.HPET.quality: 950
> kern.timecounter.tc.i8254.quality: 0
> $ sysctl kern.eventtimer
> kern.eventtimer.periodic: 0
> kern.eventtimer.timer: HPET
> kern.eventtimer.idletick: 0
> kern.eventtimer.singlemul: 2
> kern.eventtimer.choice: HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0)
> kern.eventtimer.et.RTC.quality: 0
> kern.eventtimer.et.HPET2.quality: 450
> kern.eventtimer.et.HPET1.quality: 450
> kern.eventtimer.et.HPET.quality: 450
> kern.eventtimer.et.HPET.frequency: 14318180
> kern.eventtimer.et.HPET.flags: 3
> kern.eventtimer.et.i8254.quality: 100
> kern.eventtimer.et.LAPIC.quality: 400
>
> Please note that the TSC-low time counter is chosen administratively whereas
> the event timer configuration is fully automatic.
> The previous configuration was produced in the same fashion.
> One notable difference is that the previous CPU was 2-core and so two HPET
> timers were virtually combined into a single timer with per-CPU capability.  In
> other words, two HPET timers were used to drive two cores.
> The newer CPU has four cores, so there are not enough HPET timers to drive each
> core independently and thus there is no virtual bundling.  Thus, one HPET timer
> drives one core and that core forwards the interrupts to other cores via IPIs as
> necessary.
>
> But I am far from sure that the stated difference is actually the source of the
> instability.  There could be other hardware-related reasons, of course.
>
> I wonder if there is a good way to analyze / debug this situation to see what
> exactly is wrong.  For now I am thinking about trying different time counter and
> event timer configurations, but I would prefer a more guided "scientific"
> approach over a blind trial and error one.
>
> I would appreciate any help, suggestions, hints.
>
> The CPUs:
> CPU: AMD Athlon(tm) II X2 250 Processor (3013.79-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x100f62  Family=0x10  Model=0x6  Stepping=2
>   Features=0x178bfbff
>   Features2=0x802009
>   AMD Features=0xee500800
>   AMD Features2=0x37ff
>   SVM: Features=0xf Revision=1, ASIDs=64
>   TSC: P-state invariant
>
> CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x100f43  Family=0x10  Model=0x4  Stepping=3
>   Features=0x178bfbff
>   Features2=0x802009
>   AMD Features=0xee500800
>   AMD Features2=0x37ff
>   SVM: Features=0xf Revision=1, ASIDs=64
>   TSC: P-state invariant
>

--
Andriy Gapon

From owner-freebsd-hackers@freebsd.org Tue Oct 20 11:10:39 2015
To: Andriy Gapon
cc: freebsd-hackers
Subject: Re: instability of timekeeping
In-reply-to: <56261FE6.90302@FreeBSD.org>
From: "Poul-Henning Kamp"
References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org>
Date: Tue, 20 Oct 2015 11:10:36 +0000
Message-ID: <97479.1445339436@critter.freebsd.dk>

--------
In message <56261FE6.90302@FreeBSD.org>, Andriy Gapon writes:

>I performed a small observation.  With ntpd disabled I ran `ntpdate -d` at 10
>second intervals in a loop (done via `sleep 10`).  It looks like for about 25
>minutes the time offset between a reference server and my machine was quite
>stable.  But then it sort of jumped about 2.5 seconds between two consecutive
>ntpdate invocations.

Pure guesswork:  Somebody may have börked the code to wind up timecounters.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

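[Archive note] For readers following the numbers: the size of the jump can be worked out from the two consecutive ntpdate offsets posted earlier in this thread, and set beside the time the 32-bit TSC-low counter needs to wrap at the frequency shown in the sysctl output. The sketch below is only arithmetic on values already posted. The helper `tc_delta` is a hypothetical, simplified model of a masked counter difference written for illustration; it is not the kernel's code, and the thread has not established at this point that a wrap is the cause.

```python
# Values copied from the sysctl and ntpdate output posted in this thread.
TSC_LOW_MASK = 4294967295      # kern.timecounter.tc.TSC-low.mask
TSC_LOW_FREQ = 1607357461      # kern.timecounter.tc.TSC-low.frequency (Hz)
OFFSET_BEFORE = 26.709437      # ntpdate offset reported at 13:45:26
OFFSET_AFTER = 29.381055       # ntpdate offset reported at 13:45:43

# Size of the step between the two consecutive ntpdate runs.
jump = OFFSET_AFTER - OFFSET_BEFORE

# Time for the masked counter to wrap around once.
wrap_period = (TSC_LOW_MASK + 1) / TSC_LOW_FREQ


def tc_delta(now, last, mask=TSC_LOW_MASK):
    """Hypothetical model: masked difference between two counter readings."""
    return (now - last) & mask


# Assumed scenario: a reading 1000 counts *behind* the previous one.
# After masking, it is indistinguishable from almost a full wrap forward.
bogus = tc_delta(5_000, 6_000)

print(f"jump        = {jump:.6f} s")
print(f"wrap period = {wrap_period:.6f} s")
print(f"bogus delta = {bogus} counts = {bogus / TSC_LOW_FREQ:.6f} s")
```

The two printed durations can be compared by eye; the last line shows how a slightly "backwards" reading would turn into a delta of nearly one wrap period in this simplified model.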
From owner-freebsd-hackers@freebsd.org Tue Oct 20 21:59:54 2015
Message-ID: <1445378386.14127.2.camel@freebsd.org>
Subject: vmstat -m strangeness
From: Ian Lepore
To: freebsd-hackers
Date: Tue, 20 Oct 2015 15:59:46 -0600

root@wand:~ # vmstat -m | egrep "busdma|bounce|devbuf|Type"
         Type InUse MemUse HighUse Requests  Size(s)
       devbuf   125    10K       -      166  16,32,64,256,512,1024
       busdma   922   116K       -      922  128
       bounce   385   775K       -      385  32,128

How do 385 allocations of 32 or 128 bytes add up to 775K?

The answer... 768K of individual pages each allocated via
contigmalloc() don't show up in that output.

Why is that, and is it something that should be fixed?
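
[Archive note] The mismatch in the `bounce` line can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in the message; the 4 KiB page size is an assumption about the machine and is not stated in the post.

```python
# Figures from the "bounce" line of the vmstat -m output above.
IN_USE = 385              # InUse column
MEM_USE_KIB = 775         # MemUse column
LARGEST_BUCKET = 128      # largest entry in Size(s), in bytes
CONTIG_KIB = 768          # contigmalloc()'d pages, as stated in the message
PAGE_SIZE = 4096          # ASSUMPTION: 4 KiB pages on this machine

# Most memory the listed buckets could account for, if every
# allocation used the largest listed size.
upper_bound_kib = IN_USE * LARGEST_BUCKET / 1024

# Number of pages that 768K corresponds to under the assumed page size.
pages = CONTIG_KIB * 1024 // PAGE_SIZE

# What is left of MemUse once the page allocations are subtracted.
remainder_kib = MEM_USE_KIB - CONTIG_KIB

print(f"bucket upper bound: {upper_bound_kib} KiB")
print(f"contigmalloc pages: {pages}")
print(f"remainder:          {remainder_kib} KiB")
```

Even in the best case the listed bucket sizes explain only a small fraction of the reported MemUse, which is the discrepancy the question is about.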

-- Ian

From owner-freebsd-hackers@freebsd.org Wed Oct 21 08:43:18 2015
Subject: Re: instability of timekeeping
To: freebsd-hackers
References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org>
From: Andriy Gapon
Cc: Poul-Henning Kamp, Jung-uk Kim
Message-ID: <56274FFC.2000608@FreeBSD.org>
Date: Wed, 21 Oct 2015 11:42:36 +0300
In-Reply-To: <56261FE6.90302@FreeBSD.org>

On 20/10/2015 14:05, Andriy Gapon wrote:
> I performed a small observation.  With ntpd disabled I ran `ntpdate -d` at 10
> second intervals in a loop (done via `sleep 10`).  It looks like for about 25
> minutes the time offset between a reference server and my machine was quite
> stable.  But then it sort of jumped about 2.5 seconds between two consecutive
> ntpdate invocations.
[snip]
> On 20/10/2015 13:12, Andriy Gapon wrote:
[snip]
>> kern.timecounter.tc.TSC-low.frequency: 1607357461
>> kern.timecounter.tc.TSC-low.counter: 2457319922
>> kern.timecounter.tc.TSC-low.mask: 4294967295
[snip]

Another observation and a hypothesis.
I tried time counters other than TSC and I couldn't reproduce the issue with them.
Another thing that occurred to me is that TSC-low.mask / TSC-low.frequency ≈ 2.7
seconds.

From these observations and reading some code comes my hypothesis.
First, I assume that visible values of TSCs on different cores are not perfectly
synchronized.  Second, I think that there can be circumstances where
tc_ticktock() -> tc_windup() can get called on different cores sufficiently
close in time that the later call would see a TSC value which is "before"
(sequentially smaller than) the TSC value read earlier (on the other core).  In
that case the delta between the readings would be close to TSC-low.mask.

Right now I do not have any proof that what the hypothesis says is what actually
happens.  On the other hand, I do not see anything that would prevent the
hypothesized situation from occurring.

To add more weight to the hypothesis:
cpu1:invltlb                 674384          4
cpu1:invlrng                 453020          3
cpu1:invlpg               108772578        652
cpu1:preempt               37435000        224
cpu1:ast                      36757          0
cpu1:rendezvous                9473          0
cpu1:hardclock             22267434        133

As you can see I am currently running workloads that result in a very
significant number of IPIs, especially the page invalidation IPIs.
Given that all (x86) IPIs have the same priority (based on their vector numbers)
I think it's plausible that the hardclock IPI could get arbitrarily delayed.

I guess I could add a tracepoint to record deltas that are close to a current
timecounter's mask.

Assuming the hypothesis is correct I see three possible ways to work around the
problem:

1. Increase tsc_shift, so that the cross-CPU TSC differences are smaller (at the
cost of lower resolution).  That should reduce the risk of seeing "backwards" TSC
values.  Judging from the numbers that tools/tools/tscdrift produces I can set
tsc_shift to 7.  The resulting resolution should not be worse than that of the
HPET or ACPI-fast counters, with the benefit of TSC being much faster to read.

2. Change the code, so that tc_windup() is always called on the same CPU.  E.g.
it could be the BSP or a CPU that receives the actual timer interrupts (as
opposed to the hardclock IPIs).  This should help with the timekeeping, but
won't help with the "jitter" in binuptime() and friends.

3. In tc_delta() somehow detect and filter out "slightly behind" timecounter
readings.  Not sure if this is possible at all.

--
Andriy Gapon

From owner-freebsd-hackers@freebsd.org Wed Oct 21 14:11:22 2015
From: Ganael Laplanche
Organization: OVH
To: Eric McCorkle
Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs Date: Wed, 21 Oct 2015 13:45:29 +0200 User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; ) CC: References: <56211825.3080403@metricspace.net> In-Reply-To: <56211825.3080403@metricspace.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Message-ID: <201510211345.29460.ganael.laplanche@corp.ovh.com> X-Originating-IP: [5.196.2.34] X-ClientProxiedBy: cas02.OVH.local (172.16.1.2) To ex2.OVH.local (172.16.7.2) X-Ovh-Tracer-Id: 8852669496638618152 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: -100 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekhedrtddvucetufdoteggodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Oct 2015 14:11:22 -0000

On Friday, October 16, 2015 05:30:45 PM Eric McCorkle wrote:

Hi Eric,

> In general, I need testing on ZFS setups with more complex vdevs (l2arc,
> intent logs, mirroring, striping, raidz, etc.)

I have successfully run the following tests, on a server with 3 SSDs and
following the method I explained in my previous post (i.e. booting from
patched loader.efi, *not* boot1.efi), see:
https://lists.freebsd.org/pipermail/freebsd-hackers/2015-August/048141.html

1) Single disk + SLOG + L2ARC

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          da0p3     ONLINE       0     0     0
        logs
          da2       ONLINE       0     0     0
        cache
          da1       ONLINE       0     0     0

errors: No known data errors

=> Boot OK, the pool is up and running with logs and cache online

2) Striping on 3 disks

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          da0p3     ONLINE       0     0     0
          da1p3     ONLINE       0     0     0
          da2p3     ONLINE       0     0     0

errors: No known data errors

=> Boot OK

3) Mirroring on 3 disks

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0
            da1p3   ONLINE       0     0     0
            da2p3   ONLINE       0     0     0

errors: No known data errors

=> Boot OK

4) Raidz on 3 disks

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0
            da1p3   ONLINE       0     0     0
            da2p3   ONLINE       0     0     0

errors: No known data errors

=> Boot OK

As Andrew asked, do you have an updated patch ? Or is the code available on
some repository ?

Thanks again for your great work,
Regards,

-- 
Ganaël LAPLANCHE

From owner-freebsd-hackers@freebsd.org Wed Oct 21 14:21:31 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0F764A1BB4B for ; Wed, 21 Oct 2015 14:21:31 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CD5C81F1D for ; Wed, 21 Oct 2015 14:21:30 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 732811C3DD2 for ; Wed, 21 Oct 2015 09:21:28 -0500 (CDT) Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs To: freebsd-hackers@freebsd.org References: <56211825.3080403@metricspace.net> <201510211345.29460.ganael.laplanche@corp.ovh.com> From: Karl Denninger Message-ID: <56279F66.5030303@denninger.net> Date: Wed, 21 Oct 2015 09:21:26 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <201510211345.29460.ganael.laplanche@corp.ovh.com> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms010603070509040907090006" X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Oct 2015 14:21:31 -0000 This is a cryptographically signed
message in MIME format. --------------ms010603070509040907090006 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Will loader.rc take all the parameters (e.g. loading of geli, aesni, geom, setting kernel parameters, etc) that loader.conf does? On 10/21/2015 06:45, Ganael Laplanche wrote: > On Friday, October 16, 2015 05:30:45 PM Eric McCorkle wrote: > > Hi Eric, > >> In general, I need testing on ZFS setups with more complex vdevs (l2ar= c,=20 > intent logs, mirroring, striping, raidz, etc.) > > I have successfully run the following tests, on a server with 3 SSDs an= d=20 > following the method I explained in my previous post (i.e. booting from= =20 > patched loader.efi, *not* boot1.efi), see:=20 > https://lists.freebsd.org/pipermail/freebsd-hackers/2015-August/048141.= html > > 1) Single disk + SLOG + L2ARC > > # zpool status > pool: zroot > state: ONLINE > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > logs > da2 ONLINE 0 0 0 > cache > da1 ONLINE 0 0 0 > > errors: No known data errors > > =3D> Boot OK, the pool is up and running with logs and cache online > > 2) Striping on 3 disks > > # zpool status > pool: zroot > state: ONLINE > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > > errors: No known data errors > > =3D> Boot OK > > 3) Mirroring on 3 disks > > # zpool status > pool: zroot > state: ONLINE > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > > errors: No known data errors > > =3D> Boot OK > > 4) Raidz on 3 disks > > # zpool status > pool: zroot > state: ONLINE > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > raidz1-0 ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > 
> errors: No known data errors > > =3D> Boot OK > > As Andrew asked, do you have an updated patch ? Or is the code availabl= e on=20 > some repository ? > > Thanks again for your great work, > Regards, > --=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms010603070509040907090006-- From owner-freebsd-hackers@freebsd.org Wed Oct 21 15:29:08 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8F93DA1BCCB for ; Wed, 21 Oct 2015 15:29:08 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from mail.metricspace.net (207-172-209-83.c3-0.arl-ubr1.sbo-arl.ma.static.cable.rcn.com [207.172.209.83]) by mx1.freebsd.org (Postfix) with ESMTP id 6886CC11 for ; Wed, 21 Oct 2015 15:29:07 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from [192.168.2.2] (unknown [166.197.121.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) (Authenticated sender: eric) by mail.metricspace.net (Postfix) with ESMTPSA id 1CA30182B; Wed, 21 Oct 2015 15:28:59 +0000 (UTC) User-Agent: K-9 Mail for blackphone In-Reply-To: <201510211345.29460.ganael.laplanche@corp.ovh.com> References: <56211825.3080403@metricspace.net> <201510211345.29460.ganael.laplanche@corp.ovh.com> MIME-Version: 1.0 Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs From: Eric McCorkle Date: Wed, 21 Oct 2015 11:28:52 -0400 To: Ganael Laplanche
CC: freebsd-hackers@freebsd.org Message-ID: <5ED4AE5E-8508-409B-B323-2CBFEBE77088@metricspace.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Oct 2015 15:29:08 -0000 Outstanding. Based on these tests, the reports of successful testing of UFS loading, the fact that PCBSD and NextBSD have apparently picked it up, and the fact that I've been using it for months, I think it's time to move towards getting it committed. I'll post an updated patch by the end of the week. On October 21, 2015 7:45:29 AM EDT, Ganael Laplanche wrote: >On Friday, October 16, 2015 05:30:45 PM Eric McCorkle wrote: > >Hi Eric, > >> In general, I need testing on ZFS setups with more complex vdevs >(l2arc, >intent logs, mirroring, striping, raidz, etc.) > >I have successfully run the following tests, on a server with 3 SSDs >and >following the method I explained in my previous post (i.e. 
booting from > >patched loader.efi, *not* boot1.efi), see: >https://lists.freebsd.org/pipermail/freebsd-hackers/2015-August/048141.html > >1) Single disk + SLOG + L2ARC > ># zpool status > pool: zroot > state: ONLINE > scan: none requested >config: > > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > logs > da2 ONLINE 0 0 0 > cache > da1 ONLINE 0 0 0 > >errors: No known data errors > >=> Boot OK, the pool is up and running with logs and cache online > >2) Striping on 3 disks > ># zpool status > pool: zroot > state: ONLINE > scan: none requested >config: > > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > >errors: No known data errors > >=> Boot OK > >3) Mirroring on 3 disks > ># zpool status > pool: zroot > state: ONLINE > scan: none requested >config: > > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > >errors: No known data errors > >=> Boot OK > >4) Raidz on 3 disks > ># zpool status > pool: zroot > state: ONLINE > scan: none requested >config: > > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > raidz1-0 ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > >errors: No known data errors > >=> Boot OK > >As Andrew asked, do you have an updated patch ? Or is the code >available on >some repository ? > >Thanks again for your great work, >Regards, > >-- >Ganaël LAPLANCHE -- Sent from my Blackphone with K-9 Mail. Please excuse my brevity. 
From owner-freebsd-hackers@freebsd.org Wed Oct 21 17:34:37 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 81B34A1BAFF for ; Wed, 21 Oct 2015 17:34:37 +0000 (UTC) (envelope-from phk@frebsd.org) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 4C13418DC; Wed, 21 Oct 2015 17:34:37 +0000 (UTC) (envelope-from phk@frebsd.org) Received: from critter.freebsd.dk (unknown [192.168.55.3]) by phk.freebsd.dk (Postfix) with ESMTP id 4CE474F860; Wed, 21 Oct 2015 17:34:29 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.15.2/8.15.2) with ESMTP id t9LHYSRi035099; Wed, 21 Oct 2015 17:34:29 GMT (envelope-from phk@frebsd.org) To: Andriy Gapon cc: freebsd-hackers , Jung-uk Kim Subject: Re: instability of timekeeping In-reply-to: <56274FFC.2000608@FreeBSD.org> From: "Poul-Henning Kamp" References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org> <56274FFC.2000608@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-ID: <35097.1445448868.1@critter.freebsd.dk> Content-Transfer-Encoding: 8bit Date: Wed, 21 Oct 2015 17:34:28 +0000 Message-ID: <35098.1445448868@critter.freebsd.dk> X-Mailman-Approved-At: Wed, 21 Oct 2015 17:39:51 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Oct 2015 17:34:37 -0000 -------- > Another thing that occurred to me is that TSC-low.mask / > TSC-low.frequency ≈ 2.7 seconds. That's why I suspect timecounters aren't being wound up as they should be. 
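The 2.7-second figure can be checked from the sysctl values quoted earlier in the thread. The Python sketch below is not kernel code: the helper names `wrap_period` and `tc_delta` are hypothetical, the 1000-count skew is an assumed value, and the masked subtraction is only an illustration of the kind of delta computation the hypothesis is about. It shows how long the 32-bit TSC-low counter takes to wrap, and what delta results when a reading lands slightly behind the previous one.

```python
# Values copied from the sysctl output quoted in this thread.
TSC_LOW_FREQUENCY = 1607357461   # kern.timecounter.tc.TSC-low.frequency (Hz)
TSC_LOW_MASK = 4294967295        # kern.timecounter.tc.TSC-low.mask (2**32 - 1)


def wrap_period(mask, frequency):
    """Seconds a counter with this mask needs to wrap at this frequency."""
    return (mask + 1) / frequency


def tc_delta(now, previous, mask):
    """Masked difference between two counter readings (illustrative only)."""
    return (now - previous) & mask


if __name__ == "__main__":
    print("wrap period: %.3f s" % wrap_period(TSC_LOW_MASK, TSC_LOW_FREQUENCY))

    previous = 2457319922        # the TSC-low.counter value quoted in the thread
    behind = previous - 1000     # assumption: the other core reads 1000 counts "earlier"
    delta = tc_delta(behind, previous, TSC_LOW_MASK)
    print("delta: %d counts = %.3f s" % (delta, delta / TSC_LOW_FREQUENCY))
```

With a masked subtraction, a reading that is only slightly behind the previous one is indistinguishable from almost a full wrap of the counter, which is where a step of roughly 2.7 seconds would come from if the hypothesis holds.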
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-hackers@freebsd.org Wed Oct 21 18:48:56 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF388A1AD2C for ; Wed, 21 Oct 2015 18:48:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7C8591D3; Wed, 21 Oct 2015 18:48:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id t9LImorT015089 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 21 Oct 2015 21:48:51 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua t9LImorT015089 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id t9LImol2015088; Wed, 21 Oct 2015 21:48:50 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 21 Oct 2015 21:48:50 +0300 From: Konstantin Belousov To: Andriy Gapon Cc: freebsd-hackers , Poul-Henning Kamp , Jung-uk Kim Subject: Re: instability of timekeeping Message-ID: <20151021184850.GX2257@kib.kiev.ua> References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org> <56274FFC.2000608@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <56274FFC.2000608@FreeBSD.org> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 
tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Oct 2015 18:48:57 -0000 On Wed, Oct 21, 2015 at 11:42:36AM +0300, Andriy Gapon wrote: > On 20/10/2015 14:05, Andriy Gapon wrote: > > I performed a small observation. With ntpd disabled I ran `ntpdate -d` at 10 > > second intervals in a loop (done via `sleep 10`). It looks like for about 25 > > minutes the time offset between a reference server and my machine was quite > > stable. But then it sort of jumped about 2.5 seconds between two consecutive > > ntpdate invocations. > [snip] > > On 20/10/2015 13:12, Andriy Gapon wrote: > [snip] > >> kern.timecounter.tc.TSC-low.frequency: 1607357461 > >> kern.timecounter.tc.TSC-low.counter: 2457319922 > >> kern.timecounter.tc.TSC-low.mask: 4294967295 > [snip] > > Another observation and a hypothesis. > > I tried time counters other than TSC and I couldn't reproduce the issue with them. > > Another thing that occurred to me is that TSC-low.mask / TSC-low.frequency ≈ 2.7 > seconds. > > From these observations and reading some code comes my hypothesis. > First, I assume that visible values of TSCs on different cores are not perfectly > synchronized. Second, I think that there can be circumstances where > tc_ticktock() -> tc_windup() can get called on different cores sufficiently > close in time that the later call would see TSC value which is "before" > (sequentially smaller) than the TSC value read earlier (on the other core). In > that case the delta between the readings would be close to TSC-low.mask. > > Right now I do not have any proof that what the hypothesis says is what actually > happens.
On the other hand, I do not see anything that would prevent the > hypothesized situation from occurring. > > To add more weight to the hypothesis: > cpu1:invltlb 674384 4 > cpu1:invlrng 453020 3 > cpu1:invlpg 108772578 652 > cpu1:preempt 37435000 224 > cpu1:ast 36757 0 > cpu1:rendezvous 9473 0 > cpu1:hardclock 22267434 133 > As you can see I am currently running workloads that result in a very > significant number of IPIs, especially the page invalidation IPIs. Given that > all (x86) IPIs have the same priority (based on their vector numbers) I think > it's plausible that the hardclock IPI could get arbitrarily delayed. > > I guess I could add a tracepoint to record deltas that are close to a current > timecounter's mask. > > Assuming the hypothesis is correct I see two possible ways to work-around the > problem: > > 1. Increase tsc_shift, so that the cross-CPU TSC differences are smaller (at the > cost of lower resolution). That should reduce the risk of seeing "backwards" > TSC values. Judging from numbers that tools/tools/tscdrift produces I can set > tsc_shift to 7. The resulting resolution should not be worse than that of HPET > or ACPI-fast counters with the benefit of TSC being much faster to read. > > 2. Change the code, so that tc_windup() is always called on the same CPU. E.g. > it could be the BSP or a CPU that receives the actual timer interrupts (as > opposed to the hardclock IPIs). This should help with the timekeeping, but > won't help with the "jitter" in binuptime() and friends. > > 3. In tc_delta() somehow detect and filter out "slightly behind" timecounter > readings. Not sure if this is possible at all. Am I right that the tsc synchronization test passes on your machine ? If yes, you probably cannot read 'slightly behind' timecounter after IPI on other core. Might be, try to change CPUID instruction in the test to MFENCE and see if the test still able to pass. Does the symptom disappear if you switch the eventtimer to LAPIC ? 
What happens if you turn off usermode gettimeofday() by setting kern.timecounter.fast_gettime to 0 ? From owner-freebsd-hackers@freebsd.org Thu Oct 22 06:28:33 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5026BA1C58A for ; Thu, 22 Oct 2015 06:28:33 +0000 (UTC) (envelope-from ganael.laplanche@corp.ovh.com) Received: from 9.mo175.mail-out.ovh.net (9.mo175.mail-out.ovh.net [46.105.54.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0F952123C for ; Thu, 22 Oct 2015 06:28:32 +0000 (UTC) (envelope-from ganael.laplanche@corp.ovh.com) Received: from ex2.OVH.local (corp.ovh.com [5.196.251.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mo175.mail-out.ovh.net (Postfix) with ESMTPS id 9C3B5FF8265; Thu, 22 Oct 2015 07:33:33 +0200 (CEST) Received: from desk533202.ovh.net (5.196.2.34) by ex2.OVH.local (172.16.7.2) with Microsoft SMTP Server (TLS) id 15.1.225.42; Thu, 22 Oct 2015 07:33:33 +0200 From: Ganael Laplanche Organization: OVH To: Eric McCorkle Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs Date: Thu, 22 Oct 2015 07:33:32 +0200 User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; ) CC: References: <56211825.3080403@metricspace.net> <201510211345.29460.ganael.laplanche@corp.ovh.com> <5ED4AE5E-8508-409B-B323-2CBFEBE77088@metricspace.net> In-Reply-To: <5ED4AE5E-8508-409B-B323-2CBFEBE77088@metricspace.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: quoted-printable Message-ID: <201510220733.32997.ganael.laplanche@corp.ovh.com> X-Originating-IP: [5.196.2.34] X-ClientProxiedBy: cas02.OVH.local (172.16.1.2) To ex2.OVH.local (172.16.7.2) X-Ovh-Tracer-Id:
8443404878081145384 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: -100 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekhedrtdefucetufdoteggodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Oct 2015 06:28:33 -0000

On Wednesday, October 21, 2015 05:28:52 PM Eric McCorkle wrote:

Hi Eric,

> Based on these tests, the reports of successful testing of UFS loading, the
> fact that PCBSD and NextBSD have apparently picked it up, and the fact
> that I've been using it for months, I think it's time to move towards
> getting it committed.
>
> I'll post an updated patch by the end of the week.

Good news :)

Best regards,

-- 
Ganaël LAPLANCHE

From owner-freebsd-hackers@freebsd.org Thu Oct 22 06:48:34 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2FDD0A1C905 for ; Thu, 22 Oct 2015 06:48:34 +0000 (UTC) (envelope-from ganael.laplanche@corp.ovh.com) Received: from 6.mo175.mail-out.ovh.net (6.mo175.mail-out.ovh.net [46.105.47.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EAACB1C6C for ; Thu, 22 Oct 2015 06:48:32 +0000 (UTC) (envelope-from ganael.laplanche@corp.ovh.com) Received: from ex2.OVH.local (corp.ovh.com [5.196.251.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mo175.mail-out.ovh.net (Postfix) with ESMTPS id C5342FF8279; Thu, 22 Oct 2015 07:31:48 +0200 (CEST) Received: from desk533202.ovh.net (5.196.2.34) by ex2.OVH.local (172.16.7.2) with Microsoft SMTP Server (TLS) id 15.1.225.42; Thu, 22 Oct 2015 07:31:48 +0200 From: Ganael Laplanche Organization: OVH To: Karl Denninger Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs Date: Thu, 22 Oct 2015 07:31:48 +0200 User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; ) CC: References: <56211825.3080403@metricspace.net> <201510211345.29460.ganael.laplanche@corp.ovh.com> <56279F66.5030303@denninger.net> In-Reply-To: <56279F66.5030303@denninger.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: quoted-printable Message-ID: <201510220731.48157.ganael.laplanche@corp.ovh.com> X-Originating-IP: [5.196.2.34] X-ClientProxiedBy: cas02.OVH.local (172.16.1.2) To ex2.OVH.local (172.16.7.2) X-Ovh-Tracer-Id: 8413850006895114841 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: -100 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekhedrtdefucetufdoteggodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Oct 2015 06:48:34 -0000

On Wednesday, October 21, 2015 04:21:26 PM Karl Denninger wrote:

Hi Karl,

> Will loader.rc take all the parameters (e.g. loading of geli, aesni,
> geom, setting kernel parameters, etc) that loader.conf does?

For that purpose, you'll have to include appropriate .4th files from within
loader.rc. See sys/boot/README (within sources), loader.conf(5) and your
/boot/loader.rc for more details.

Best regards,

-- 
Ganaël LAPLANCHE

From owner-freebsd-hackers@freebsd.org Thu Oct 22 18:14:00 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8C780A1C4DA; Thu, 22 Oct 2015 18:14:00 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6A6751ACA; Thu, 22 Oct 2015 18:14:00 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 27748B9BB; Thu, 22 Oct 2015 14:13:58 -0400 (EDT) From: John Baldwin To: freebsd-hardware@freebsd.org Cc: Dieter BSD , freebsd-hackers@freebsd.org Subject: Re: ECC support Date: Thu, 22 Oct 2015 11:09:50 -0700 Message-ID: <1492434.22kxSKhHEJ@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 22 Oct 2015 14:13:59 -0400 (EDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Oct 2015 18:14:00 -0000 On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote: > Chris: > > MCA: Bank 1, Status 0x9400000000000151 > > MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 > > MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2 > > > > MCA: Address 0x81cc0e9f0 > > > > Kind of freaky.
I've never had this error on this board before. > > On others tho. > > > > Try a search for MCA instead. > > Is there a decoder ring for those messages? I don't recall seeing > messages like that, although I wasn't looking for them, and they > don't leap out at you screaming ERROR! ERROR! Digital Unix had its > problems, but at least the error messages were fairly clear. > Something like "single bit memory error at address 0x12345..." > A simple edit to sys/x86/x86/mca.c > s/printf("UNCOR ");/printf("Uncorrectable ");/ > s/printf("COR ");/printf("Correctable ");/ > would make the messages at least slightly more meaningful to a viewer > who isn't intimently(sp) familiar with the mca. Which most people aren't. The problem is that there are other fields to decode and you can only fit so much in one line. Also, there is not a CPU-independent way to know the address of an ECC error. On Intel Core i3/5/7 (anything with QPI) you can identify the individual DIMM at least, but the label that the motherboard manufacturer uses varies by manufacturer. (You can maybe scrape that text from the SMBIOS tables, but only if they aren't wrong which they sometimes are, and good luck knowing if they are wrong or right.) Digital UNIX had the luxury of running on hardware built by the same company, not on a random assortment of boards built by various vendors. FreeBSD does not. sysutils/mcelog does some more verbose decoding of MCA records, but I find it to be equally gibberish for anyone not intimately familiar with a specific CPU. I wrote a tool for a previous employer that was able to do some simple parsing of MCA errors for Supermicro X7-X10 boards (Intel CPUs) and give a short summary that was used in a nagios check. However, it only handles a narrow set of systems. 
https://github.com/freebsd/freebsd/compare/master...bsdjhb:ecc -- John Baldwin From owner-freebsd-hackers@freebsd.org Thu Oct 22 18:57:41 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 95046A1CED2; Thu, 22 Oct 2015 18:57:41 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3ED9411E3; Thu, 22 Oct 2015 18:57:41 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from [194.32.164.24] ([194.32.164.24]) by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id t9MInDTL087303; Thu, 22 Oct 2015 19:49:13 +0100 (BST) (envelope-from rb@gid.co.uk) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Bob Bishop In-Reply-To: <1492434.22kxSKhHEJ@ralph.baldwin.cx> Date: Thu, 22 Oct 2015 19:49:13 +0100 Cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org, Dieter BSD Content-Transfer-Encoding: quoted-printable Message-Id: <74705089-408A-4FD3-899E-CA677390F855@gid.co.uk> References: <1492434.22kxSKhHEJ@ralph.baldwin.cx> To: John Baldwin X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Oct 2015 18:57:41 -0000 Hi, > On 22 Oct 2015, at 19:09, John Baldwin wrote: > > On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote: >> Chris: >>> MCA: Bank 1, Status 0x9400000000000151 >>> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 >>> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2 >>> >>> MCA: Address 0x81cc0e9f0 >>> >>> Kind of
freaky. I've never had this error on this board before. >>> On others tho. >>> >>> Try a search for MCA instead. >> >> Is there a decoder ring for those messages? I don't recall seeing >> messages like that, although I wasn't looking for them, and they >> don't leap out at you screaming ERROR! ERROR! Digital Unix had its >> problems, but at least the error messages were fairly clear. >> Something like "single bit memory error at address 0x12345..." >> A simple edit to sys/x86/x86/mca.c >> s/printf("UNCOR ");/printf("Uncorrectable ");/ >> s/printf("COR ");/printf("Correctable ");/ >> would make the messages at least slightly more meaningful to a viewer >> who isn't intimently(sp) familiar with the mca. Which most people aren't. > > The problem is that there are other fields to decode and you can only fit so > much in one line. Also, there is not a CPU-independent way to know the > address of an ECC error. [etc] On server-class hardware, the platform management (BMC or whatever) is probably decoding this stuff for event logs and can be interrogated via IPMI (or whatever).
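As a partial "decoder ring" for the status word quoted above, here is a minimal sketch in C, not taken from sys/x86/x86/mca.c. It assumes the architectural flag layout of the x86 MCi_STATUS register (VAL in bit 63, OVER in 62, UC in 61, EN in 60, MISCV in 59, ADDRV in 58, PCC in 57). The macro names are defined locally for illustration and mca_describe() is a hypothetical helper. Only the top flag bits are decoded; the low 16-bit error code is printed raw because, as noted in the thread, its interpretation depends on the CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural flag bits of MCi_STATUS (local, illustrative definitions). */
#define MC_STATUS_VAL   (1ULL << 63)	/* register holds a valid error */
#define MC_STATUS_OVER  (1ULL << 62)	/* an earlier error was overwritten */
#define MC_STATUS_UC    (1ULL << 61)	/* error was not corrected */
#define MC_STATUS_EN    (1ULL << 60)	/* reporting was enabled */
#define MC_STATUS_MISCV (1ULL << 59)	/* MCi_MISC holds extra data */
#define MC_STATUS_ADDRV (1ULL << 58)	/* MCi_ADDR holds an address */
#define MC_STATUS_PCC   (1ULL << 57)	/* processor context corrupt */

/*
 * Hypothetical helper: write a one-line summary of a status word into buf.
 * Returns whatever snprintf() returns.
 */
int
mca_describe(uint64_t status, char *buf, size_t len)
{
	if (!(status & MC_STATUS_VAL))
		return (snprintf(buf, len, "no valid error logged"));

	return (snprintf(buf, len,
	    "%s error%s%s%s, MCA code 0x%04x, model code 0x%04x",
	    (status & MC_STATUS_UC) ? "uncorrected" : "corrected",
	    (status & MC_STATUS_OVER) ? ", overflow" : "",
	    (status & MC_STATUS_PCC) ? ", context corrupt" : "",
	    (status & MC_STATUS_ADDRV) ? ", address valid" : "",
	    (unsigned)(status & 0xffff),		/* CPU-specific error code */
	    (unsigned)((status >> 16) & 0xffff)));	/* model-specific code */
}
```

Passing 0x9400000000000151 (the Bank 1 status from the quoted report) produces "corrected error, address valid, MCA code 0x0151, model code 0x0000". That only says the error was corrected and that the Address line is meaningful; mapping the code or the address to a particular DIMM still needs the CPU- and board-specific knowledge discussed in the thread.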
-- Bob Bishop rb@gid.co.uk From owner-freebsd-hackers@freebsd.org Thu Oct 22 21:17:24 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2136FA1CB25; Thu, 22 Oct 2015 21:17:24 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F20E4804; Thu, 22 Oct 2015 21:17:23 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 5362FB94F; Thu, 22 Oct 2015 17:17:22 -0400 (EDT) From: John Baldwin To: Bob Bishop Cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org, Dieter BSD Subject: Re: ECC support Date: Thu, 22 Oct 2015 14:17:07 -0700 Message-ID: <1483396.WZc3qgD2yz@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: <74705089-408A-4FD3-899E-CA677390F855@gid.co.uk> References: <1492434.22kxSKhHEJ@ralph.baldwin.cx> <74705089-408A-4FD3-899E-CA677390F855@gid.co.uk> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 22 Oct 2015 17:17:22 -0400 (EDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Oct 2015 21:17:24 -0000 On Thursday, October 22, 2015 07:49:13 PM Bob Bishop wrote: > HI, > > > On 22 Oct 2015, at 19:09, John Baldwin wrote: > > > > On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD 
wrote: > >> Chris: > >>> MCA: Bank 1, Status 0x9400000000000151 > >>> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 > >>> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2 > >>> > >>> MCA: Address 0x81cc0e9f0 > >>> > >>> Kind of freaky. I've never had this error on this board before. > >>> On others tho. > >>> > >>> Try a search for MCA instead. > >> > >> Is there a decoder ring for those messages? I don't recall seeing > >> messages like that, although I wasn't looking for them, and they > >> don't leap out at you screaming ERROR! ERROR! Digital Unix had its > >> problems, but at least the error messages were fairly clear. > >> Something like "single bit memory error at address 0x12345..." > >> A simple edit to sys/x86/x86/mca.c > >> s/printf("UNCOR ");/printf("Uncorrectable ");/ > >> s/printf("COR ");/printf("Correctable ");/ > >> would make the messages at least slightly more meaningful to a viewer > >> who isn't intimently(sp) familiar with the mca. Which most people aren't. > > > > The problem is that there are other fields to decode and you can only fit so > > much in one line. Also, there is not a CPU-independent way to know the > > address of an ECC error. [etc] > > On server-class hardware, the platform management (BMC or whatever) is probably decoding this stuff for event logs and can be interrogated via IPMI (or whatever). Not always well and not always with side effects you want. On Core 2 and Nehalem i7 class hardware I measured that it took on the order of 400 milliseconds (not micro) in SMM (system management mode, so your entire OS is halted) to write out each log entry to NVRAM. At least one place I worked at turned the BIOS ECC logging off because that delay was too costly. Also, even though your BMC may log it, the format for doing so isn't standard. The details such as the affected DIMM are in the OEM bits of the log record, so not something you can easily extract from, say, ipmitool sel elist. 
You'd have to log into the BIOS itself (or the BMC's web UI) to see which DIMM is affected. Neither of those are really great for automated reporting. -- John Baldwin From owner-freebsd-hackers@freebsd.org Fri Oct 23 11:19:07 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 34966A1C2FC for ; Fri, 23 Oct 2015 11:19:07 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from mail.metricspace.net (207-172-209-83.c3-0.arl-ubr1.sbo-arl.ma.static.cable.rcn.com [207.172.209.83]) by mx1.freebsd.org (Postfix) with ESMTP id 06CA818F1 for ; Fri, 23 Oct 2015 11:19:06 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from [IPv6:2001:470:1f11:617:ea2a:eaff:fe21:e067] (unknown [IPv6:2001:470:1f11:617:ea2a:eaff:fe21:e067]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: eric) by mail.metricspace.net (Postfix) with ESMTPSA id E02091C92 for ; Fri, 23 Oct 2015 11:18:59 +0000 (UTC) Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs To: freebsd-hackers@freebsd.org References: <56211825.3080403@metricspace.net> <201510211345.29460.ganael.laplanche@corp.ovh.com> <5ED4AE5E-8508-409B-B323-2CBFEBE77088@metricspace.net> <201510220733.32997.ganael.laplanche@corp.ovh.com> From: Eric McCorkle Message-ID: <562A17A3.6090803@metricspace.net> Date: Fri, 23 Oct 2015 07:18:59 -0400 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <201510220733.32997.ganael.laplanche@corp.ovh.com> Content-Type: multipart/mixed; boundary="------------070902030409020404010705" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: 
, X-List-Received-Date: Fri, 23 Oct 2015 11:19:07 -0000 This is a multi-part message in MIME format. --------------070902030409020404010705 Content-Type: text/plain; charset=iso-8859-15; format=flowed Content-Transfer-Encoding: 7bit This is a patch pulled fresh from my /usr/src after an svn update. Therefore, it should represent a patch against the current head On 10/22/15 01:33, Ganael Laplanche wrote: > On Wednesday, October 21, 2015 05:28:52 PM Eric McCorkle wrote: > > Hi Eric, > >> Based on these tests, the reports of successful testing of UFS loading, the >> fact that PCBSD and NextBSD have apparently picked it up, and the fact >> that I've been using it for months, I think it's time to move towards >> getting it committed. >> >> I'll post an updated patch by the end of the week. > > Good news :) > > Best regards, > --------------070902030409020404010705 Content-Type: text/x-patch; name="zfs_efi_curr.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="zfs_efi_curr.diff" Index: sys/boot/efi/boot1/Makefile =================================================================== --- sys/boot/efi/boot1/Makefile (revision 289821) +++ sys/boot/efi/boot1/Makefile (working copy) @@ -13,7 +13,7 @@ INTERNALPROG= # architecture-specific loader code -SRCS= boot1.c self_reloc.c start.S +SRCS= boot1.c self_reloc.c start.S ufs_module.c zfs_module.c CFLAGS+= -I. CFLAGS+= -I${.CURDIR}/../include @@ -20,6 +20,8 @@ CFLAGS+= -I${.CURDIR}/../include/${MACHINE} CFLAGS+= -I${.CURDIR}/../../../contrib/dev/acpica/include CFLAGS+= -I${.CURDIR}/../../.. +CFLAGS+= -I${.CURDIR}/../../zfs/ +CFLAGS+= -I${.CURDIR}/../../../cddl/boot/zfs/ # Always add MI sources and REGULAR efi loader bits .PATH: ${.CURDIR}/../loader/arch/${MACHINE} Index: sys/boot/efi/boot1/boot1.c =================================================================== --- sys/boot/efi/boot1/boot1.c (revision 289821) +++ sys/boot/efi/boot1/boot1.c (working copy) @@ -5,6 +5,8 @@ * All rights reserved. 
* Copyright (c) 2014 Nathan Whitehorn * All rights reserved. + * Copyright (c) 2014 Eric McCorkle + * All rights reverved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this @@ -21,7 +23,6 @@ __FBSDID("$FreeBSD$"); #include -#include #include #include @@ -28,6 +29,8 @@ #include #include +#include "boot_module.h" + #define _PATH_LOADER "/boot/loader.efi" #define _PATH_KERNEL "/boot/kernel/kernel" @@ -41,14 +44,20 @@ u_int sp_size; }; +static const boot_module_t* const boot_modules[] = +{ +#ifdef ZFS_EFI_BOOT + &zfs_module, +#endif +#ifdef UFS_EFI_BOOT + &ufs_module +#endif +}; + +#define NUM_BOOT_MODULES (sizeof(boot_modules) / sizeof(boot_module_t*)) + static const char digits[] = "0123456789abcdef"; -static void panic(const char *fmt, ...) __dead2; -static int printf(const char *fmt, ...); -static int putchar(char c, void *arg); -static int vprintf(const char *fmt, va_list ap); -static int vsnprintf(char *str, size_t sz, const char *fmt, va_list ap); - static int __printf(const char *fmt, putc_func_t *putc, void *arg, va_list ap); static int __putc(char c, void *arg); static int __puts(const char *s, putc_func_t *putc, void *arg); @@ -62,9 +71,80 @@ static EFI_SYSTEM_TABLE *systab; static EFI_HANDLE *image; -static void -bcopy(const void *src, void *dst, size_t len) + +void* Malloc(size_t len, const char* file, int line) { + void* out; + if (systab->BootServices->AllocatePool(EfiLoaderData, + len, &out) != + EFI_SUCCESS) { + printf("Can't allocate memory pool\n"); + return NULL; + } + return out; +} + +char* strcpy(char* dst, const char* src) { + for(int i = 0; src[i]; i++) + dst[i] = src[i]; + + return dst; +} + +char* strchr(const char* s, int c) { + for(int i = 0; s[i]; i++) + if (s[i] == c) + return (char*)(s + i); + + return NULL; +} + +int strncmp(const char *a, const char *b, size_t len) +{ + for (int i = 0; i < len; i++) + if(a[i] == '\0' && b[i] == '\0') { + return 0; + } else 
if(a[i] < b[i]) { + return -1; + } else if(a[i] > b[i]) { + return 1; + } + + return 0; +} + +char* strdup(const char* s) { + int len; + + for(len = 1; s[len]; len++); + + char* out = malloc(len); + + for(int i = 0; i < len; i++) + out[i] = s[i]; + + return out; +} + +int bcmp(const void *a, const void *b, size_t len) +{ + const char *sa = a; + const char *sb = b; + + for (int i = 0; i < len; i++) + if(sa[i] != sb[i]) + return 1; + + return 0; +} + +int memcmp(const void *a, const void *b, size_t len) +{ + return bcmp(a, b, len); +} + +void bcopy(const void *src, void *dst, size_t len) +{ const char *s = src; char *d = dst; @@ -72,23 +152,24 @@ *d++ = *s++; } -static void -memcpy(void *dst, const void *src, size_t len) +void* memcpy(void *dst, const void *src, size_t len) { bcopy(src, dst, len); + return dst; } -static void -bzero(void *b, size_t len) + +void* memset(void *b, int val, size_t len) { char *p = b; while (len-- != 0) - *p++ = 0; + *p++ = val; + + return b; } -static int -strcmp(const char *s1, const char *s2) +int strcmp(const char *s1, const char *s2) { for (; *s1 == *s2 && *s1; s1++, s2++) ; @@ -95,30 +176,99 @@ return ((u_char)*s1 - (u_char)*s2); } +int putchr(char c, void *arg) +{ + CHAR16 buf[2]; + + if (c == '\n') { + buf[0] = '\r'; + buf[1] = 0; + systab->ConOut->OutputString(systab->ConOut, buf); + } + buf[0] = c; + buf[1] = 0; + systab->ConOut->OutputString(systab->ConOut, buf); + return (1); +} + static EFI_GUID BlockIoProtocolGUID = BLOCK_IO_PROTOCOL; static EFI_GUID DevicePathGUID = DEVICE_PATH_PROTOCOL; +static EFI_GUID ConsoleControlGUID = EFI_CONSOLE_CONTROL_PROTOCOL_GUID; static EFI_GUID LoadedImageGUID = LOADED_IMAGE_PROTOCOL; -static EFI_GUID ConsoleControlGUID = EFI_CONSOLE_CONTROL_PROTOCOL_GUID; -static EFI_BLOCK_IO *bootdev; -static EFI_DEVICE_PATH *bootdevpath; -static EFI_HANDLE *bootdevhandle; +#define MAX_DEVS 128 -EFI_STATUS efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE* Xsystab) +void try_load(const boot_module_t* const mod, + 
const dev_info_t devs[], + size_t ndevs) { - EFI_HANDLE handles[128]; + int idx; + size_t bufsize; + void* const buffer = mod->load(devs, ndevs, _PATH_LOADER, &idx, &bufsize); + EFI_HANDLE loaderhandle; + EFI_LOADED_IMAGE *loaded_image; + + if (NULL == buffer) { + printf("Could not load file\n"); + return; + } + //printf("Loaded file %s, image at %p\n" + // "Attempting to load as bootable image...", + // _PATH_LOADER, image); + if (systab->BootServices->LoadImage(TRUE, image, devs[idx].devpath, + buffer, bufsize, &loaderhandle) != + EFI_SUCCESS) { + //printf("failed\n"); + return; + } + //printf("success\n" + // "Preparing to execute image..."); + + if (systab->BootServices->HandleProtocol(loaderhandle, + &LoadedImageGUID, + (VOID**)&loaded_image) != + EFI_SUCCESS) { + //printf("failed\n"); + return; + } + + //printf("success\n"); + + loaded_image->DeviceHandle = devs[idx].devhandle; + + //printf("Image prepared, attempting to execute\n"); + // XXX Set up command args first + if (systab->BootServices->StartImage(loaderhandle, NULL, NULL) != + EFI_SUCCESS) { + //printf("Failed to execute loader\n"); + return; + } + //printf("Shouldn't be here!\n"); +} + +void efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE* Xsystab) +{ + EFI_HANDLE handles[MAX_DEVS]; + dev_info_t module_devs[NUM_BOOT_MODULES][MAX_DEVS]; + size_t dev_offsets[NUM_BOOT_MODULES]; EFI_BLOCK_IO *blkio; - UINTN i, nparts = sizeof(handles), cols, rows, max_dim, best_mode; + UINTN nparts = sizeof(handles); EFI_STATUS status; EFI_DEVICE_PATH *devpath; EFI_BOOT_SERVICES *BS; EFI_CONSOLE_CONTROL_PROTOCOL *ConsoleControl = NULL; SIMPLE_TEXT_OUTPUT_INTERFACE *conout = NULL; - char *path = _PATH_LOADER; + // Basic initialization systab = Xsystab; image = Ximage; + for(int i = 0; i < NUM_BOOT_MODULES; i++) + { + dev_offsets[i] = 0; + } + + // Set up the console, so printf works. 
BS = systab->BootServices; status = BS->LocateProtocol(&ConsoleControlGUID, NULL, (VOID **)&ConsoleControl); @@ -128,10 +278,14 @@ /* * Reset the console and find the best text mode. */ + UINTN max_dim; + UINTN best_mode; + UINTN cols; + UINTN rows; conout = systab->ConOut; conout->Reset(conout, TRUE); max_dim = best_mode = 0; - for (i = 0; ; i++) { + for (int i = 0; ; i++) { status = conout->QueryMode(conout, i, &cols, &rows); if (EFI_ERROR(status)) @@ -141,6 +295,7 @@ best_mode = i; } } + if (max_dim > 0) conout->SetMode(conout, best_mode); conout->EnableCursor(conout, TRUE); @@ -147,206 +302,94 @@ conout->ClearScreen(conout); printf("\n" - ">> FreeBSD EFI boot block\n"); - printf(" Loader path: %s\n", path); + ">> FreeBSD ZFS-enabled EFI boot block\n"); + printf(" Loader path: %s\n\n", _PATH_LOADER); + printf(" Initializing modules:"); + for(int i = 0; i < NUM_BOOT_MODULES; i++) + { + if (NULL != boot_modules[i]) + { + printf(" %s", boot_modules[i]->name); + boot_modules[i]->init(image, systab, BS); + } + } + putchr('\n', NULL); + + // Get all the device handles status = systab->BootServices->LocateHandle(ByProtocol, &BlockIoProtocolGUID, NULL, &nparts, handles); nparts /= sizeof(handles[0]); + //printf(" Scanning %lu device handles\n", nparts); - for (i = 0; i < nparts; i++) { + // Scan all partitions, probing with all modules. 
+ for (int i = 0; i < nparts; i++) { + dev_info_t devinfo; + + // Figure out if we're dealing with an actual partition status = systab->BootServices->HandleProtocol(handles[i], &DevicePathGUID, (void **)&devpath); - if (EFI_ERROR(status)) + if (EFI_ERROR(status)) { + //printf(" Not a device path protocol\n"); continue; + } - while (!IsDevicePathEnd(NextDevicePathNode(devpath))) + while (!IsDevicePathEnd(NextDevicePathNode(devpath))) { + //printf(" Advancing to next device\n"); devpath = NextDevicePathNode(devpath); + } status = systab->BootServices->HandleProtocol(handles[i], &BlockIoProtocolGUID, (void **)&blkio); - if (EFI_ERROR(status)) + if (EFI_ERROR(status)) { + //printf(" Not a block device\n"); continue; + } - if (!blkio->Media->LogicalPartition) + if (!blkio->Media->LogicalPartition) { + //printf(" Logical partition\n"); continue; + } - if (domount(devpath, blkio, 1) >= 0) - break; - } + // Setup devinfo + devinfo.dev = blkio; + devinfo.devpath = devpath; + devinfo.devhandle = handles[i]; + devinfo.devdata = NULL; - if (i == nparts) - panic("No bootable partition found"); - - bootdevhandle = handles[i]; - load(path); - - panic("Load failed"); - - return EFI_SUCCESS; -} - -static int -dskread(void *buf, u_int64_t lba, int nblk) -{ - EFI_STATUS status; - int size; - - lba = lba / (bootdev->Media->BlockSize / DEV_BSIZE); - size = nblk * DEV_BSIZE; - status = bootdev->ReadBlocks(bootdev, bootdev->Media->MediaId, lba, - size, buf); - - if (EFI_ERROR(status)) - return (-1); - - return (0); -} - -#include "ufsread.c" - -static ssize_t -fsstat(ufs_ino_t inode) -{ -#ifndef UFS2_ONLY - static struct ufs1_dinode dp1; - ufs1_daddr_t addr1; -#endif -#ifndef UFS1_ONLY - static struct ufs2_dinode dp2; -#endif - static struct fs fs; - static ufs_ino_t inomap; - char *blkbuf; - void *indbuf; - size_t n, nb, size, off, vboff; - ufs_lbn_t lbn; - ufs2_daddr_t addr2, vbaddr; - static ufs2_daddr_t blkmap, indmap; - u_int u; - - blkbuf = dmadat->blkbuf; - indbuf = 
dmadat->indbuf; - if (!dsk_meta) { - inomap = 0; - for (n = 0; sblock_try[n] != -1; n++) { - if (dskread(dmadat->sbbuf, sblock_try[n] / DEV_BSIZE, - SBLOCKSIZE / DEV_BSIZE)) - return -1; - memcpy(&fs, dmadat->sbbuf, sizeof(struct fs)); - if (( -#if defined(UFS1_ONLY) - fs.fs_magic == FS_UFS1_MAGIC -#elif defined(UFS2_ONLY) - (fs.fs_magic == FS_UFS2_MAGIC && - fs.fs_sblockloc == sblock_try[n]) -#else - fs.fs_magic == FS_UFS1_MAGIC || - (fs.fs_magic == FS_UFS2_MAGIC && - fs.fs_sblockloc == sblock_try[n]) -#endif - ) && - fs.fs_bsize <= MAXBSIZE && - fs.fs_bsize >= sizeof(struct fs)) - break; - } - if (sblock_try[n] == -1) { - printf("Not ufs\n"); - return -1; - } - dsk_meta++; - } else - memcpy(&fs, dmadat->sbbuf, sizeof(struct fs)); - if (!inode) - return 0; - if (inomap != inode) { - n = IPERVBLK(&fs); - if (dskread(blkbuf, INO_TO_VBA(&fs, n, inode), DBPERVBLK)) - return -1; - n = INO_TO_VBO(n, inode); -#if defined(UFS1_ONLY) - memcpy(&dp1, (struct ufs1_dinode *)blkbuf + n, - sizeof(struct ufs1_dinode)); -#elif defined(UFS2_ONLY) - memcpy(&dp2, (struct ufs2_dinode *)blkbuf + n, - sizeof(struct ufs2_dinode)); -#else - if (fs.fs_magic == FS_UFS1_MAGIC) - memcpy(&dp1, (struct ufs1_dinode *)blkbuf + n, - sizeof(struct ufs1_dinode)); - else - memcpy(&dp2, (struct ufs2_dinode *)blkbuf + n, - sizeof(struct ufs2_dinode)); -#endif - inomap = inode; - fs_off = 0; - blkmap = indmap = 0; + // Run through each module, see if it can load this partition + for (int j = 0; j < NUM_BOOT_MODULES; j++ ) + { + if (NULL != boot_modules[j] && + boot_modules[j]->probe(&devinfo)) + { + // If it can, save it to the device list for + // that module + module_devs[j][dev_offsets[j]++] = devinfo; + } + } } - size = DIP(di_size); - n = size - fs_off; - return (n); -} -static struct dmadat __dmadat; + // Select a partition to boot. We do this by trying each + // module in order. 
+ for (int i = 0; i < NUM_BOOT_MODULES; i++) + { + if (NULL != boot_modules[i]) + { + //printf(" Trying to load from %lu %s partitions\n", + // dev_offsets[i], boot_modules[i]->name); + try_load(boot_modules[i], module_devs[i], + dev_offsets[i]); + //printf(" Failed\n"); + } + } -static int -domount(EFI_DEVICE_PATH *device, EFI_BLOCK_IO *blkio, int quiet) -{ - - dmadat = &__dmadat; - bootdev = blkio; - bootdevpath = device; - if (fsread(0, NULL, 0)) { - if (!quiet) - printf("domount: can't read superblock\n"); - return (-1); - } - if (!quiet) - printf("Succesfully mounted UFS filesystem\n"); - return (0); + // If we get here, we're out of luck... + panic("No bootable partitions found!"); } -static void -load(const char *fname) +void panic(const char *fmt, ...) { - ufs_ino_t ino; - EFI_STATUS status; - EFI_HANDLE loaderhandle; - EFI_LOADED_IMAGE *loaded_image; - void *buffer; - size_t bufsize; - - if ((ino = lookup(fname)) == 0) { - printf("File %s not found\n", fname); - return; - } - - bufsize = fsstat(ino); - status = systab->BootServices->AllocatePool(EfiLoaderData, - bufsize, &buffer); - fsread(ino, buffer, bufsize); - - /* XXX: For secure boot, we need our own loader here */ - status = systab->BootServices->LoadImage(TRUE, image, bootdevpath, - buffer, bufsize, &loaderhandle); - if (EFI_ERROR(status)) - printf("LoadImage failed with error %lx\n", status); - - status = systab->BootServices->HandleProtocol(loaderhandle, - &LoadedImageGUID, (VOID**)&loaded_image); - if (EFI_ERROR(status)) - printf("HandleProtocol failed with error %lx\n", status); - - loaded_image->DeviceHandle = bootdevhandle; - - status = systab->BootServices->StartImage(loaderhandle, NULL, NULL); - if (EFI_ERROR(status)) - printf("StartImage failed with error %lx\n", status); -} - -static void -panic(const char *fmt, ...) -{ char buf[128]; va_list ap; @@ -358,50 +401,25 @@ while (1) {} } -static int -printf(const char *fmt, ...) +int printf(const char *fmt, ...) 
{ va_list ap; int ret; - /* Don't annoy the user as we probe for partitions */ - if (strcmp(fmt,"Not ufs\n") == 0) - return 0; va_start(ap, fmt); - ret = vprintf(fmt, ap); + ret = __printf(fmt, putchr, 0, ap); va_end(ap); return (ret); } -static int -putchar(char c, void *arg) +void vprintf(const char *fmt, va_list ap) { - CHAR16 buf[2]; - - if (c == '\n') { - buf[0] = '\r'; - buf[1] = 0; - systab->ConOut->OutputString(systab->ConOut, buf); - } - buf[0] = c; - buf[1] = 0; - systab->ConOut->OutputString(systab->ConOut, buf); - return (1); + __printf(fmt, putchr, 0, ap); } -static int -vprintf(const char *fmt, va_list ap) +int vsnprintf(char *str, size_t sz, const char *fmt, va_list ap) { - int ret; - - ret = __printf(fmt, putchar, 0, ap); - return (ret); -} - -static int -vsnprintf(char *str, size_t sz, const char *fmt, va_list ap) -{ struct sp_data sp; int ret; Index: sys/boot/efi/include/efilib.h =================================================================== --- sys/boot/efi/include/efilib.h (revision 289821) +++ sys/boot/efi/include/efilib.h (working copy) @@ -43,7 +43,8 @@ int efi_register_handles(struct devsw *, EFI_HANDLE *, EFI_HANDLE *, int); EFI_HANDLE efi_find_handle(struct devsw *, int); -int efi_handle_lookup(EFI_HANDLE, struct devsw **, int *); +void efi_handle_update_dev(EFI_HANDLE, struct devsw *, int, uint64_t); +int efi_handle_lookup(EFI_HANDLE, struct devsw **, int *, uint64_t *); int efi_status_to_errno(EFI_STATUS); time_t efi_time(EFI_TIME *); Index: sys/boot/efi/libefi/handles.c =================================================================== --- sys/boot/efi/libefi/handles.c (revision 289821) +++ sys/boot/efi/libefi/handles.c (working copy) @@ -35,6 +35,7 @@ EFI_HANDLE alias; struct devsw *dev; int unit; + uint64_t extra; }; struct entry *entry; @@ -78,8 +79,28 @@ return (NULL); } +void efi_handle_update_dev(const EFI_HANDLE handle, + struct devsw * const dev, + int unit, + uint64_t guid) +{ + int idx; + + for (idx = 0; idx < nentries; 
idx++) { + if (entry[idx].handle != handle) + continue; + entry[idx].dev = dev; + entry[idx].unit = unit; + entry[idx].alias = NULL; + entry[idx].extra = guid; + } +} + int -efi_handle_lookup(EFI_HANDLE h, struct devsw **dev, int *unit) +efi_handle_lookup(EFI_HANDLE h, + struct devsw **dev, + int *unit, + uint64_t *extra) { int idx; @@ -90,6 +111,8 @@ *dev = entry[idx].dev; if (unit != NULL) *unit = entry[idx].unit; + if (extra != NULL) + *extra = entry[idx].extra; return (0); } return (ENOENT); Index: sys/boot/efi/loader/Makefile =================================================================== --- sys/boot/efi/loader/Makefile (revision 289821) +++ sys/boot/efi/loader/Makefile (working copy) @@ -21,7 +21,8 @@ main.c \ self_reloc.c \ smbios.c \ - vers.c + vers.c \ + ${.CURDIR}/zfs.c .PATH: ${.CURDIR}/arch/${MACHINE} # For smbios.c @@ -35,6 +36,8 @@ CFLAGS+= -I${.CURDIR}/../../../contrib/dev/acpica/include CFLAGS+= -I${.CURDIR}/../../.. CFLAGS+= -I${.CURDIR}/../../i386/libi386 +CFLAGS+= -I${.CURDIR}/../../zfs +CFLAGS+= -I${.CURDIR}/../../../cddl/boot/zfs CFLAGS+= -DNO_PCI -DEFI # make buildenv doesn't set DESTDIR, this means LIBSTAND @@ -67,7 +70,7 @@ CFLAGS+= -DEFI_STAGING_SIZE=${EFI_STAGING_SIZE} .endif -# Always add MI sources +# Always add MI sources .PATH: ${.CURDIR}/../../common .include "${.CURDIR}/../../common/Makefile.inc" CFLAGS+= -I${.CURDIR}/../../common @@ -78,7 +81,7 @@ LDSCRIPT= ${.CURDIR}/arch/${MACHINE}/ldscript.${MACHINE} LDFLAGS+= -Wl,-T${LDSCRIPT} -Wl,-Bsymbolic -shared -CLEANFILES+= vers.c loader.efi +CLEANFILES+= zfs.c vers.c loader.efi NEWVERSWHAT= "EFI loader" ${MACHINE} @@ -85,6 +88,9 @@ vers.c: ${.CURDIR}/../../common/newvers.sh ${.CURDIR}/../../efi/loader/version sh ${.CURDIR}/../../common/newvers.sh ${.CURDIR}/version ${NEWVERSWHAT} +zfs.c: + cp ${.CURDIR}/../../zfs/zfs.c ${.CURDIR} + OBJCOPY?= objcopy OBJDUMP?= objdump @@ -108,9 +114,9 @@ LIBEFI= ${.OBJDIR}/../libefi/libefi.a -DPADD= ${LIBFICL} ${LIBEFI} ${LIBFDT} ${LIBEFI_FDT} 
${LIBSTAND} \ +DPADD= ${LIBFICL} ${LIBEFI} ${LIBFDT} ${LIBEFI_FDT} ../../../../lib/libstand/libstand.a \ ${LDSCRIPT} -LDADD= ${LIBFICL} ${LIBEFI} ${LIBFDT} ${LIBEFI_FDT} ${LIBSTAND} +LDADD= ${LIBFICL} ${LIBEFI} ${LIBFDT} ${LIBEFI_FDT} ../../../../lib/libstand/libstand.a .endif # ${COMPILER_TYPE} != "gcc" Index: sys/boot/efi/loader/conf.c =================================================================== --- sys/boot/efi/loader/conf.c (revision 289821) +++ sys/boot/efi/loader/conf.c (working copy) @@ -31,14 +31,17 @@ #include #include #include +#include "../zfs/libzfs.h" struct devsw *devsw[] = { &efipart_dev, &efinet_dev, + &zfs_dev, NULL }; struct fs_ops *file_system[] = { + &zfs_fsops, &dosfs_fsops, &ufs_fsops, &cd9660_fsops, Index: sys/boot/efi/loader/devicename.c =================================================================== --- sys/boot/efi/loader/devicename.c (revision 289821) +++ sys/boot/efi/loader/devicename.c (working copy) @@ -32,6 +32,7 @@ #include #include #include "bootstrap.h" +#include "libzfs.h" #include #include @@ -38,7 +39,7 @@ static int efi_parsedev(struct devdesc **, const char *, const char **); -/* +/* * Point (dev) at an allocated device specifier for the device matching the * path in (devspec). If it contains an explicit device specification, * use that. If not, use the default device. @@ -48,7 +49,6 @@ { struct devdesc **dev = (struct devdesc **)vdev; int rv; - /* * If it looks like this is just a path and no device, then * use the current device instead. @@ -61,7 +61,8 @@ } /* Parse the device name off the beginning of the devspec. 
*/ - return (efi_parsedev(dev, devspec, path)); + const int out = efi_parsedev(dev, devspec, path); + return out; } /* @@ -87,8 +88,9 @@ int i, err; /* minimum length check */ - if (strlen(devspec) < 2) + if (strlen(devspec) < 2) { return (EINVAL); + } /* look for a device that matches */ for (i = 0; devsw[i] != NULL; i++) { @@ -96,27 +98,39 @@ if (!strncmp(devspec, dv->dv_name, strlen(dv->dv_name))) break; } - if (devsw[i] == NULL) + if (devsw[i] == NULL) { return (ENOENT); + } + np = devspec + strlen(dv->dv_name); - idev = malloc(sizeof(struct devdesc)); - if (idev == NULL) - return (ENOMEM); + if (DEVT_ZFS == dv->dv_type) { + idev = malloc(sizeof(struct zfs_devdesc)); + int out = zfs_parsedev((struct zfs_devdesc*)idev, np, path); + if (0 == out) { + *dev = idev; + cp = strchr(np + 1, ':'); + } else { + free(idev); + return out; + } + } else { + idev = malloc(sizeof(struct devdesc)); + if (idev == NULL) + return (ENOMEM); - idev->d_dev = dv; - idev->d_type = dv->dv_type; - idev->d_unit = -1; - + idev->d_dev = dv; + idev->d_type = dv->dv_type; + idev->d_unit = -1; + if (*np != '\0' && *np != ':') { + idev->d_unit = strtol(np, &cp, 0); + if (cp == np) { + idev->d_unit = -1; + free(idev); + return (EUNIT); + } + } + } err = 0; - np = devspec + strlen(dv->dv_name); - if (*np != '\0' && *np != ':') { - idev->d_unit = strtol(np, &cp, 0); - if (cp == np) { - idev->d_unit = -1; - free(idev); - return (EUNIT); - } - } if (*cp != '\0' && *cp != ':') { free(idev); return (EINVAL); @@ -138,10 +152,11 @@ static char buf[32]; /* XXX device length constant? 
*/ switch(dev->d_type) { + case DEVT_ZFS: + return zfs_fmtdev(dev); case DEVT_NONE: strcpy(buf, "(no device)"); break; - default: sprintf(buf, "%s%d:", dev->d_dev->dv_name, dev->d_unit); break; Index: sys/boot/efi/loader/main.c =================================================================== --- sys/boot/efi/loader/main.c (revision 289821) +++ sys/boot/efi/loader/main.c (working copy) @@ -39,6 +39,7 @@ #include #include "loader_efi.h" +#include "libzfs.h" extern char bootprog_name[]; extern char bootprog_rev[]; @@ -45,8 +46,9 @@ extern char bootprog_date[]; extern char bootprog_maker[]; -struct devdesc currdev; /* our current device */ -struct arch_switch archsw; /* MI/MD interface boundary */ +/* our current device */ +/* MI/MD interface boundary */ +struct arch_switch archsw; EFI_GUID acpi = ACPI_TABLE_GUID; EFI_GUID acpi20 = ACPI_20_TABLE_GUID; @@ -61,6 +63,70 @@ EFI_GUID debugimg = DEBUG_IMAGE_INFO_TABLE_GUID; EFI_GUID fdtdtb = FDT_TABLE_GUID; +static void efi_zfs_probe(void); + +static void +print_str16(const CHAR16* const str) +{ + for(int i; str[i]; i++) + { + printf("%c", str[i]); + } +} + +/* +static int +str16cmp(const CHAR16 const *a, + const char* const b) +{ + for(int i = 0; a[i] || b[i]; i++) + { + const CHAR16 achr = a[i]; + const CHAR16 bchr = b[i]; + if (achr < bchr) + { + return -1; + } else if (achr > bchr) + { + return 1; + } + } + return 0; +} + +// Split an arg of the form "argname=argval", replacing the '=' with a \0 +static CHAR16* +split_arg(CHAR16 *const str) +{ + for (int i = 0; str[i]; i++) + { + if ('=' == str[i]) + { + str[i] = 0; + return str + i + 1; + } + } + return NULL; +} + +static void +handle_arg(CHAR16 *const arg) +{ + const CHAR16* const argval = split_arg(arg); + const CHAR16* const argname = arg; + + if (NULL != argval) + { + printf("Unrecognized argument \""); + print_arg(argname); + printf("\n"); + } else { + printf("Unrecognized argument \""); + print_arg(argname); + printf("\n"); + } +} +*/ EFI_STATUS main(int argc, 
CHAR16 *argv[]) { @@ -69,7 +135,15 @@ EFI_GUID *guid; int i; - /* + archsw.arch_autoload = efi_autoload; + archsw.arch_getdev = efi_getdev; + archsw.arch_copyin = efi_copyin; + archsw.arch_copyout = efi_copyout; + archsw.arch_readin = efi_readin; + // Note this needs to be set before ZFS init + archsw.arch_zfs_probe = efi_zfs_probe; + + /* * XXX Chicken-and-egg problem; we want to have console output * early, but some console attributes may depend on reading from * eg. the boot device, which we can't do yet. We can use @@ -85,13 +159,22 @@ /* * March through the device switch probing for things. */ - for (i = 0; devsw[i] != NULL; i++) - if (devsw[i]->dv_init != NULL) + for (i = 0; devsw[i] != NULL; i++) { + if (devsw[i]->dv_init != NULL) { + printf("Initializing %s\n", devsw[i]->dv_name); (devsw[i]->dv_init)(); - + } + } /* Get our loaded image protocol interface structure. */ BS->HandleProtocol(IH, &imgid, (VOID**)&img); + printf("Command line arguments:"); + for(i = 0; i < argc; i++) { + printf(" "); + print_str16(argv[i]); + } + printf("\n"); + printf("Image base: 0x%lx\n", (u_long)img->ImageBase); printf("EFI version: %d.%02d\n", ST->Hdr.Revision >> 16, ST->Hdr.Revision & 0xffff); @@ -105,8 +188,13 @@ printf("%s, Revision %s\n", bootprog_name, bootprog_rev); printf("(%s, %s)\n", bootprog_maker, bootprog_date); - efi_handle_lookup(img->DeviceHandle, &currdev.d_dev, &currdev.d_unit); - currdev.d_type = currdev.d_dev->dv_type; + // Handle command-line arguments + /* + for(i = 1; i < argc; i++) + { + handle_arg(argv[i]); + } + */ /* * Disable the watchdog timer. 
By default the boot manager sets @@ -119,19 +207,39 @@ */ BS->SetWatchdogTimer(0, 0, 0, NULL); - env_setenv("currdev", EV_VOLATILE, efi_fmtdev(&currdev), - efi_setcurrdev, env_nounset); - env_setenv("loaddev", EV_VOLATILE, efi_fmtdev(&currdev), env_noset, - env_nounset); + struct devsw *dev; + int unit; + uint64_t pool_guid; + efi_handle_lookup(img->DeviceHandle, &dev, &unit, &pool_guid); + switch (dev->dv_type) { + case DEVT_ZFS: { + struct zfs_devdesc currdev; + currdev.d_dev = dev; + currdev.d_unit = unit; + currdev.d_type = currdev.d_dev->dv_type; + currdev.d_opendata = NULL; + currdev.pool_guid = pool_guid; + currdev.root_guid = 0; + env_setenv("currdev", EV_VOLATILE, efi_fmtdev(&currdev), + efi_setcurrdev, env_nounset); + env_setenv("loaddev", EV_VOLATILE, efi_fmtdev(&currdev), env_noset, + env_nounset); + } break; + default: { + struct devdesc currdev; + currdev.d_dev = dev; + currdev.d_unit = unit; + currdev.d_opendata = NULL; + currdev.d_type = currdev.d_dev->dv_type; + env_setenv("currdev", EV_VOLATILE, efi_fmtdev(&currdev), + efi_setcurrdev, env_nounset); + env_setenv("loaddev", EV_VOLATILE, efi_fmtdev(&currdev), env_noset, + env_nounset); + } break; + } setenv("LINES", "24", 1); /* optional */ - archsw.arch_autoload = efi_autoload; - archsw.arch_getdev = efi_getdev; - archsw.arch_copyin = efi_copyin; - archsw.arch_copyout = efi_copyout; - archsw.arch_readin = efi_readin; - for (i = 0; i < ST->NumberOfTableEntries; i++) { guid = &ST->ConfigurationTable[i].VendorGuid; if (!memcmp(guid, &smbios, sizeof(EFI_GUID))) { @@ -350,7 +458,6 @@ return (CMD_OK); } - COMMAND_SET(nvram, "nvram", "get or set NVRAM variables", command_nvram); static int @@ -402,6 +509,27 @@ return (CMD_OK); } +COMMAND_SET(lszfs, "lszfs", "list child datasets of a zfs dataset", + command_lszfs); + +static int +command_lszfs(int argc, char *argv[]) +{ + int err; + + if (argc != 2) { + command_errmsg = "wrong number of arguments"; + return (CMD_ERROR); + } + + err = zfs_list(argv[1]); + if 
(err != 0) { + command_errmsg = strerror(err); + return (CMD_ERROR); + } + return (CMD_OK); +} + #ifdef LOADER_FDT_SUPPORT extern int command_fdt_internal(int argc, char *argv[]); @@ -420,3 +548,23 @@ COMMAND_SET(fdt, "fdt", "flattened device tree handling", command_fdt); #endif + +static void +efi_zfs_probe(void) +{ + EFI_BLOCK_IO *blkio; + EFI_HANDLE h; + EFI_STATUS status; + u_int unit = 0; + char devname[32]; + uint64_t pool_guid; + + for (int i = 0; + (h = efi_find_handle(&efipart_dev, i)) != NULL; i++) { + snprintf(devname, sizeof devname, "%s%d:", + efipart_dev.dv_name, i); + if(0 == zfs_probe_dev(devname, &pool_guid)) { + efi_handle_update_dev(h, &zfs_dev, unit++, pool_guid); + } + } +} Index: sys/boot/zfs/zfs.c =================================================================== --- sys/boot/zfs/zfs.c (revision 289821) +++ sys/boot/zfs/zfs.c (working copy) @@ -140,7 +140,7 @@ n = size; if (fp->f_seekp + n > sb.st_size) n = sb.st_size - fp->f_seekp; - + rc = dnode_read(spa, &fp->f_dnode, fp->f_seekp, start, n); if (rc) return (rc); @@ -493,7 +493,7 @@ } } close(pa.fd); - return (0); + return (ret); } /* --------------070902030409020404010705-- From owner-freebsd-hackers@freebsd.org Fri Oct 23 11:37:34 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D5995A1C97E; Fri, 23 Oct 2015 11:37:34 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6869D7F0; Fri, 23 Oct 2015 11:37:33 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from [194.32.164.28] ([194.32.164.28]) by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id t9NBbVjl080406; Fri, 23 Oct 2015 12:37:31 +0100 (BST) (envelope-from 
rb@gid.co.uk) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Bob Bishop In-Reply-To: <1483396.WZc3qgD2yz@ralph.baldwin.cx> Date: Fri, 23 Oct 2015 12:37:31 +0100 Cc: freebsd-hackers@freebsd.org, Dieter BSD , freebsd-hardware@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <97482413-D2AA-4C32-AEFF-EB65D5D8542B@gid.co.uk> References: <1492434.22kxSKhHEJ@ralph.baldwin.cx> <74705089-408A-4FD3-899E-CA677390F855@gid.co.uk> <1483396.WZc3qgD2yz@ralph.baldwin.cx> To: John Baldwin X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Oct 2015 11:37:34 -0000 Hi, > On 22 Oct 2015, at 22:17, John Baldwin wrote: > > On Thursday, October 22, 2015 07:49:13 PM Bob Bishop wrote: >> HI, >> >>> On 22 Oct 2015, at 19:09, John Baldwin wrote: >>> >>> On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote: >>>> Chris: >>>>> MCA: Bank 1, Status 0x9400000000000151 >>>>> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 >>>>> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2 >>>>> >>>>> MCA: Address 0x81cc0e9f0 >>>>> >>>>> Kind of freaky. I've never had this error on this board before. >>>>> On others tho. >>>>> >>>>> Try a search for MCA instead. >>>> >>>> Is there a decoder ring for those messages? I don't recall seeing >>>> messages like that, although I wasn't looking for them, and they >>>> don't leap out at you screaming ERROR! ERROR! Digital Unix had its >>>> problems, but at least the error messages were fairly clear. >>>> Something like "single bit memory error at address 0x12345..." 
>>>> A simple edit to sys/x86/x86/mca.c >>>> s/printf("UNCOR ");/printf("Uncorrectable ");/ >>>> s/printf("COR ");/printf("Correctable ");/ >>>> would make the messages at least slightly more meaningful to a viewer >>>> who isn't intimently(sp) familiar with the mca. Which most people aren't. >>> >>> The problem is that there are other fields to decode and you can only fit so >>> much in one line. Also, there is not a CPU-independent way to know the >>> address of an ECC error. [etc] >> >> On server-class hardware, the platform management (BMC or whatever) is probably decoding this stuff for event logs and can be interrogated via IPMI (or whatever). > > Not always well and not always with side effects you want. On Core 2 and > Nehalem i7 class hardware I measured that it took on the order of 400 > milliseconds (not micro) in SMM (system management mode, so your entire > OS is halted) to write out each log entry to NVRAM. At least one place I > worked at turned the BIOS ECC logging off because that delay was too costly. > > Also, even though your BMC may log it, the format for doing so isn't > standard. The details such as the affected DIMM are in the OEM bits of > the log record, so not something you can easily extract from, say, > ipmitool sel elist. You'd have to log into the BIOS itself (or the BMC's > web UI) to see which DIMM is affected. Neither of those are really great > for automated reporting. All agreed. I was just flagging up the existence of another possible channel to get at ECC logging. 
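For readers following the "decoder ring" question quoted above, here is a minimal sketch of how the architectural flag bits of an x86 MCi_STATUS value could be spelled out in words. It is not the code in sys/x86/x86/mca.c: the macro names and the mca_describe() helper are hypothetical and exist only for this illustration. Only the bit positions (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57) and the 16-bit MCA error code in the low bits are taken from the architectural register layout. Everything bank- or vendor-specific is ignored, which is the part John points out cannot be decoded in a CPU-independent way.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural flag bits in the upper part of an x86 MCi_STATUS register.
 * The macro names are made up for this sketch. */
#define MCI_STATUS_VAL   (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_OVER  (1ULL << 62) /* an earlier error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61) /* error was not corrected */
#define MCI_STATUS_EN    (1ULL << 60) /* reporting was enabled */
#define MCI_STATUS_MISCV (1ULL << 59) /* MCi_MISC holds valid data */
#define MCI_STATUS_ADDRV (1ULL << 58) /* MCi_ADDR holds a valid address */
#define MCI_STATUS_PCC   (1ULL << 57) /* processor context corrupt */

/*
 * Hypothetical helper: write a one-line, spelled-out summary of the
 * flag bits into buf.  Returns what snprintf() returns.
 */
static int
mca_describe(uint64_t status, char *buf, size_t len)
{
	if ((status & MCI_STATUS_VAL) == 0)
		return (snprintf(buf, len, "no valid error logged"));

	return (snprintf(buf, len, "%s error%s%s%s, MCA error code 0x%04x",
	    (status & MCI_STATUS_UC) ? "Uncorrectable" : "Correctable",
	    (status & MCI_STATUS_OVER) ? ", overflow" : "",
	    (status & MCI_STATUS_PCC) ? ", context corrupt" : "",
	    (status & MCI_STATUS_ADDRV) ? ", address valid" : "",
	    (unsigned)(status & 0xffff)));
}
```

Passing the status quoted earlier in the thread, 0x9400000000000151, produces "Correctable error, address valid, MCA error code 0x0151": VAL, EN and ADDRV are set and UC is clear. The low 16 bits are printed raw because their interpretation depends on the error class.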
> -- > John Baldwin -- Bob Bishop rb@gid.co.uk From owner-freebsd-hackers@freebsd.org Fri Oct 23 12:33:03 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 609DCA1D081 for ; Fri, 23 Oct 2015 12:33:03 +0000 (UTC) (envelope-from ganael.laplanche@corp.ovh.com) Received: from mo174.mail-out.ovh.net (mo174.mail-out.ovh.net [178.32.228.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 267CC314 for ; Fri, 23 Oct 2015 12:33:02 +0000 (UTC) (envelope-from ganael.laplanche@corp.ovh.com) Received: from ex2.OVH.local (corp.ovh.com [5.196.251.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mo174.mail-out.ovh.net (Postfix) with ESMTPS id ACD6FFF80A8; Fri, 23 Oct 2015 14:32:51 +0200 (CEST) Received: from desk533202.ovh.net (5.196.2.34) by ex2.OVH.local (172.16.7.2) with Microsoft SMTP Server (TLS) id 15.1.225.42; Fri, 23 Oct 2015 14:32:51 +0200 From: Ganael Laplanche Organization: OVH To: Eric McCorkle Subject: Re: EFI/ZFS Update: successful tests, need more complex vdevs Date: Fri, 23 Oct 2015 14:32:50 +0200 User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; ) CC: References: <56211825.3080403@metricspace.net> <201510220733.32997.ganael.laplanche@corp.ovh.com> <562A17A3.6090803@metricspace.net> In-Reply-To: <562A17A3.6090803@metricspace.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: quoted-printable Message-ID: <201510231432.50315.ganael.laplanche@corp.ovh.com> X-Originating-IP: [5.196.2.34] X-ClientProxiedBy: cas02.OVH.local (172.16.1.2) To ex2.OVH.local (172.16.7.2) X-Ovh-Tracer-Id: 2951265135050668584 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: -100 X-VR-SPAMCAUSE: 
gggruggvucftvghtrhhoucdtuddrfeekhedrtdeiucetufdoteggodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Oct 2015 12:33:03 -0000 On Friday, October 23, 2015 01:18:59 PM Eric McCorkle wrote: > This is a patch pulled fresh from my /usr/src after an svn update. > Therefore, it should represent a patch against the current head Thanks Eric ! Is that patch just an update (regarding -CURRENT src tree) or are there technical modifications too ? -- Ganaël LAPLANCHE