Date: Mon, 9 Apr 2018 08:22:13 -0400
From: Yoshihiro Ota <ota@j.email.ne.jp>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Dimitry Andric <dim@FreeBSD.org>, Konstantin Belousov <kostikbel@gmail.com>, current@FreeBSD.org, amd64@FreeBSD.org
Subject: Re: i386 4/4 change
Message-ID: <20180409082213.1ca1fc0cd589bafa98a4fead@j.email.ne.jp>
In-Reply-To: <20180401151124.G893@besplex.bde.org>
References: <20180331102901.GN1774@kib.kiev.ua> <20180401004833.L3296@besplex.bde.org> <3FAD36FD-FA90-49F6-9141-B9CCCCA2BF00@FreeBSD.org> <20180401151124.G893@besplex.bde.org>

What is the current status of this?
Based on the SVN history, it doesn't look like
https://reviews.freebsd.org/D14633 has been merged/committed yet.

I can try it after I recover from disk crashes.
I expect I need a few more days to restore.

Will this retire the PAE option?

Thanks,
Hiro

On Sun, 1 Apr 2018 17:05:03 +1000 (EST)
Bruce Evans <brde@optusnet.com.au> wrote:

> On Sun, 1 Apr 2018, Dimitry Andric wrote:
>
> > On 31 Mar 2018, at 17:57, Bruce Evans <brde@optusnet.com.au> wrote:
> >>
> >> On Sat, 31 Mar 2018, Konstantin Belousov wrote:
> >>
> >>> the change to provide full 4G of address space for both kernel and
> >>> user on i386 is ready to land. The motivation for the work was to both
> >>> mitigate Meltdown on i386, and to give more breathing space for still
> >>> used 32bit architecture. The patch was tested by Peter Holm, and I am
> >>> satisfied with the code.
> >>>
> >>> If you use i386 with HEAD, I recommend you to apply the patch from
> >>> https://reviews.freebsd.org/D14633
> >>> and report any regressions before the commit, not after. Unless
> >>> a significant issue is reported, I plan to commit the change somewhere
> >>> at Wed/Thu next week.
> >>>
> >>> Also I welcome patch comments and reviews.
> >>
> >> It crashes at boot time in getmemsize() unless booted with loader which
> >> I don't want to use.
>
> > For me, it at least compiles and boots OK, but I'm one of those crazy
> > people who use the default boot loader. ;)
>
> I found a quick fix and sent it to kib. (2 crashes in vm86 code for memory
> sizing. This is not called if loader is used && the system has smap. Old
> systems don't have smap, so they crash even if loader is used.)
>
> > I haven't yet run any performance tests, I'll try building world and a
> > few large ports tomorrow. General operation from the command line does
> > not feel "sluggish" in any way, however.
>
> Further performance tests:
> - reading /dev/zero using tinygrams is 6 times slower
> - read/write of a pipe using tinygrams is 25 times slower. It also gives
>   unexpected values in wait statuses on exit, hopefully just because a
>   bug in the test program is exposed by the changed timing (but later
>   it also gave SIGBUS errors). This does a context switch or 2 for every
>   read/write. It now runs 7 times slower using 2 4GHz CPUs than in
>   FreeBSD-5 using 1 2.0 GHz CPU. The faster CPUs and 2 of them used to
>   make it run 4 times faster. It shows another slowdown since FreeBSD-5,
>   and much larger slowdowns since FreeBSD-1:
>
>   1996 FreeBSD on P1 133MHz:     72k/s
>   1997 FreeBSD on P1 133MHz:     44k/s (after dyson's opts for large sizes)
>   1997 Linux on P1 133MHz:       93k/s (simpler is faster for small sizes)
>   1999 FreeBSD on K6 266MHz:    129k/s
>   2018 FBSD-~5 on AthXP 2GHz:   696k/s
>   2018 FreeBSD on i7 2x4GHz:   2900k/s
>   2018 FBSD4+4 on i7 2x4GHz:    113k/s (faster than Linux on a P1 133MHz!!)
>
> Netblast to localhost has much the same 6 times slowness as reading
> /dev/zero using tinygrams. This is the slowdown for syscalls.
> Tinygrams are hard to avoid for UDP. Even 1500 bytes is a tinygram
> for /dev/zero. Without 4+4, localhost is very slow because it does
> a context switch or 2 for every packet (even with 2 CPUs when there is
> no need to switch). Without 4+4 this used to cost much the same as the
> context switches for the pipe benchmark. Now it costs relatively much
> less since (for netblast to localhost) all of the context switches are
> between kernel threads.
>
> The pipe benchmark uses select() to avoid busy-waiting. That was good
> for UP. But for SMP with just 2 CPUs, it is better to busy-wait and
> poll in the reader and writer.
>
> netblast already uses busy-waiting.
> It used to be a bug that select()
> doesn't work on sockets, at least for UDP, so blasting using busy-waiting
> is the only possible method (timeouts are usually too coarse-grained to
> go as fast as blasting, and if they are fine-grained enough to go fast
> then they are not much better than busy-waiting with time wasted for
> setting up timeouts). SMP makes this a feature. It forces use of
> busy-waiting, which is best if you have a CPU free to run it and this
> method doesn't take too much power.
>
> Context switches to task queues give similar slowness. This won't be
> affected by 4+4 since task queues are in the kernel. I don't like
> networking in userland since it has large syscall and context switch
> costs. Increasing these by factors of 6 and 25 doesn't help. It
> can only be better by combining i/o in a way that the kernel neglects
> to do or which is imposed by per-packet APIs. Slowdown factors of 6
> or 25 require the combined i/o to be 6 or 25 times larger to amortise
> the costs.
>
> Bruce
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
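
For anyone who wants to compare numbers before and after the patch: the
test programs used for the quoted figures are not shown in this thread,
so the following is only a minimal sketch of the "reading /dev/zero using
tinygrams" case, under stated assumptions. The helper name
tinygram_read_rate(), the 64-byte buffer limit, and the choice of
CLOCK_MONOTONIC are illustrative and not taken from the original
benchmark. It issues many small read() calls and reports reads per
second, so the result is dominated by per-syscall overhead.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original benchmark): perform `count`
 * reads of `size` bytes each from `path` and return the rate in reads
 * per second, or -1.0 on any error or invalid argument.
 */
static double
tinygram_read_rate(const char *path, size_t size, long count)
{
	char buf[64];		/* assumed upper bound for a "tinygram" */
	struct timespec t0, t1;
	double secs;
	long i;
	int fd;

	if (size == 0 || size > sizeof(buf) || count <= 0)
		return (-1.0);
	if ((fd = open(path, O_RDONLY)) < 0)
		return (-1.0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < count; i++) {
		/* One syscall per tiny read; a short read counts as failure. */
		if (read(fd, buf, size) != (ssize_t)size) {
			close(fd);
			return (-1.0);
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	return (secs > 0.0 ? (double)count / secs : -1.0);
}
```

Calling tinygram_read_rate("/dev/zero", 1, 1000000) on kernels with and
without the patch, and dividing the two rates, gives a rough slowdown
factor of the kind quoted above. The pipe case would need a second
process and is not covered by this sketch.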