From: Bruce Evans
Date: Mon, 12 May 2014 03:43:24 +1000 (EST)
To: Nathan Whitehorn
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans
Subject: Re: svn commit: r265864 - head/sys/dev/vt/hw/ofwfb
In-Reply-To: <536F9864.9080606@freebsd.org>
Message-ID: <20140512015015.G1959@besplex.bde.org>
References: <201405110158.s4B1wvFA072381@svn.freebsd.org> <20140511133517.N1100@besplex.bde.org> <536F9864.9080606@freebsd.org>
List-Id: SVN commit messages for the src tree for head/-current

On Sun, 11 May 2014, Nathan Whitehorn wrote:

> On 05/10/14 23:51, Bruce Evans wrote:
>> On Sun, 11 May
>> 2014, Nathan Whitehorn wrote:
>> Only 10% slower? Bitmapped mode with 256 colors is inherently 4 times
>> slower for an 8x8 font (8 bytes/char instead of 2) and 8 times slower for
>> an 8x16 font. That's without any I/O pathology. Perhaps you are comparing
>> with a syscons that is already very slow due to the hardware not supporting
>> text mode.
>>
>> However, syscons has buffering that should limit this problem.
>
> This is indeed comparison to syscons in bitmap mode. PowerPC has no VGA text
> mode, so that's the best we could do. That using newcons's bitmap console
> instead of syscons's bitmap console almost tripled my boot time, however, was
> totally unreasonable and needed fixing. Whatever buffering syscons may have
> beyond what newcons has is at most a 10% thing.

Really? The slowdown would need to be a factor of several hundred to be
noticeable and several thousand to be painful. Syscons actually uses a
slow mode with non-delayed update for kernel messages. This means that
it must be many times slower than the best case. Calculations below.

>> A correctly-implemented console driver doesn't have itty-bitty hardware
>> i/o like the old version of this or itty-bitty buffering like the changed
>> version.
>
> There are many deficiencies in the general approach being used here. I'm
> trying to patch it just to work for the time being so that it isn't a huge
> regression in console performance compared to syscons. Hopefully, the general
> architectural issues -- which you outline well below -- get solved in due
> course. This patch at least fixes the immediate problem.

Some more details on the timing...

>> Some old screen benchmarks. The benchmark is basically to write lines
>> of the screen width and scroll. I stopped updating this often about 15
>> years ago when frame buffers and CPUs became fast enough. But it appears
>> that software bloat and design errors have caught up.

It is difficult to generate data fast enough for syscons to be the
bottleneck. My simple test program does 1-char writes so the syscall
overhead dominates. I must have used a variation of it in the old
tests, but can't remember what. So I reran some tests using:

    dd if=/dev/zero bs=1000000 count=many | tr '\000' c | time dd bs=10000000

(or with bs reduced to time slow cases). Here tr is barely fast enough
to not be a bottleneck. The final dd was needed to reblock, else tr
sometimes does 1-char writes. c is either 'p' to test normal output, or
'\012' to test scrolling.

>> % machine   video        O/S           where        real  user  sys   speed
>> % --------- -----------  ------------  -----------  ----  ----  ----  -----
>> % A/2223    PCI R9200SE  FreeBSD-5.2m  onscreen-o   .026  0.00  .026  76.9
>> % A/2223    PCI R9200SE  FreeBSD-5.2m  offscreen-o  .026  0.00  .026  76.9
>> % A/2223    PCI R9200SE  FreeBSD-5.2m  onscreen     .031  0.00  .031  64.5
>> % A/2223    PCI R9200SE  FreeBSD-5.2m  offscreen    .031  0.00  .031  64.5
>>
>> An 11 year old system.
>> ...
>> I forget the units for these measurements, except that the speed column
>> gives a bandwidth in MB/sec. I don't remember if this is for write(2)
>> bandwidth or is related to frame buffer bandwidth. Interpret them as
>> relative.

The speed is for write(2) bandwidth, times 2 for character+attribute.
It is for writing p's. It is close to the frame buffer bandwidth.

You can pessimize this speed by a factor of 1000 and still have a usable
console. More than 30 thousand characters/sec instead of more than 30
million. I find 11520 characters/sec for a serial tty noticeably slow,
but usable.

>> On a system similar to the above, syscons scrolls at 50000 lines/sec.
>> Non-virtually, this would require a frame buffer bandwidth of 200MB/sec,
>> which is several times faster than possible. Since syscons only does
>> a direct update for bytes written, it needs only about 1/25 of this
>> bandwidth or 800KB/sec.
>> This is not quite in the noise compared with a frame buffer bandwidth
>> of 60.2MB/sec.

Actually, on a similar system, syscons scrolls at 1.04 million lines/sec
with -opost and at 0.94 million lines/sec with opost (for printing lots
of newlines). This must be mostly virtual, with most steps not done in
the frame buffer. If it were physical, then the frame buffer bandwidth
for the -opost case would be 8.3GB/sec. The main memory bandwidth for
this is relatively trivial, since writing 1 newline only involves
clearing 1 line in the history buffer and not moving a screenful to the
frame buffer. It is 166MB/sec.

>> % K6/233    PCI S3/Virge  linux-2.1.63  offscreen-o  0.97  0.00  0.97  2.06
>> % K6/233    PCI S3/Virge  linux-2.1.63  onscreen-o   1.03  0.00  1.03  1.93
>> % K6/233    PCI S3/Virge  linux-2.1.63  offscreen    1.18  0.00  1.18  1.69

I tried a newer Linux (ttylinux 2.6.30.5) console on newer hardware
similar to the above. For normal output, its speed was 2.7 million
characters/sec (double this to compare with the speed column above).
-opost didn't make much difference for normal output. For scrolling,
its speed was 22 thousand lines/sec with opost and 83 thousand lines/sec
with -opost. I think it writes every line to the frame buffer, but
reduces the slowness of this by using hardware scrolling.

Calculations for direct updates in kernel mode: at best they go at the
frame buffer bandwidth, with the whole screen copied for each scroll
(hardware scrolling would help here). For 80x25, this gives 4KB to move
per newline and relatively few other slow accesses. So the scrolling
speed is about 20 thousand lines/sec for an 80MB/sec frame buffer.
Almost the same as Linux with opost. Divide by 8 for pixel mode with
8x16 256 colors. Still plenty for kernel messages, but there is no
longer a factor of hundreds to spare.

Bruce
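
The scrolling arithmetic above can be rechecked with a short script. This
is only a back-of-envelope sketch in Python, not anything taken from
syscons or newcons. All names in it are made up for illustration. The
factor of 2 in the "physical" case (one read plus one write per byte
moved) is an assumption chosen because it reproduces the 8.3GB/sec
figure; the message does not spell that step out.

```python
# Back-of-envelope check of the scrolling figures in this message.
# All names are illustrative; nothing here comes from the console drivers.

COLS, ROWS = 80, 25
BYTES_PER_CELL = 2  # character + attribute in text mode


def screen_bytes(cols=COLS, rows=ROWS, cell=BYTES_PER_CELL):
    """Bytes in one text-mode screenful (about 4KB for 80x25)."""
    return cols * rows * cell


def scroll_rate(fb_bandwidth, bytes_per_scroll):
    """Lines/sec if every newline copies bytes_per_scroll at fb_bandwidth."""
    return fb_bandwidth / bytes_per_scroll


# Direct update, whole screen copied per scroll, 80MB/sec frame buffer.
text_rate = scroll_rate(80_000_000, screen_bytes())

# "Divide by 8 for pixel mode with 8x16 256 colors" (factor as stated above).
pixel_rate = text_rate / 8

# 1.04 million lines/sec with -opost: what it would cost if it were physical.
# ASSUMPTION: each byte moved is read once and written once (factor of 2).
lines_per_sec = 1_040_000
physical_bandwidth = lines_per_sec * screen_bytes() * 2

# Virtual scrolling only clears one 80-cell line in the history buffer.
history_bandwidth = lines_per_sec * COLS * BYTES_PER_CELL

print(f"text mode:  {text_rate:.0f} lines/sec")
print(f"pixel mode: {pixel_rate:.0f} lines/sec")
print(f"physical:   {physical_bandwidth / 1e9:.2f} GB/sec")
print(f"history:    {history_bandwidth / 1e6:.1f} MB/sec")
```

With these inputs the pixel-mode rate comes out at a few thousand
lines/sec, which is the "still plenty, but no factor of hundreds to
spare" case.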