From owner-svn-src-head@freebsd.org  Tue Sep  3 14:06:52 2019
Return-Path: <owner-svn-src-head@freebsd.org>
Delivered-To: svn-src-head@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id C9158DD074;
 Tue,  3 Sep 2019 14:06:51 +0000 (UTC)
 (envelope-from yuripv@freebsd.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2610:1c1:1:6074::16:84])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (4096 bits) client-digest SHA256)
 (Client CN "freefall.freebsd.org",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 46N7zv1Ysrz4PyM;
 Tue,  3 Sep 2019 14:06:51 +0000 (UTC)
 (envelope-from yuripv@freebsd.org)
Received: by freefall.freebsd.org (Postfix, from userid 1452)
 id 193A11AC79; Tue,  3 Sep 2019 14:06:19 +0000 (UTC)
X-Original-To: yuripv@localmail.freebsd.org
Delivered-To: yuripv@localmail.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
 (Client CN "mx1.freebsd.org",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by freefall.freebsd.org (Postfix) with ESMTPS id 7704C1632B;
 Sun, 14 Apr 2019 13:37:54 +0000 (UTC)
 (envelope-from owner-src-committers@freebsd.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [96.47.72.132])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (4096 bits) client-digest SHA256)
 (Client CN "freefall.freebsd.org",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 285246AFED;
 Sun, 14 Apr 2019 13:37:54 +0000 (UTC)
 (envelope-from owner-src-committers@freebsd.org)
Received: by freefall.freebsd.org (Postfix, from userid 538)
 id 0DA9016329; Sun, 14 Apr 2019 13:37:54 +0000 (UTC)
Delivered-To: src-committers@localmail.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
 (Client CN "mx1.freebsd.org",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by freefall.freebsd.org (Postfix) with ESMTPS id 74D0B16327
 for <src-committers@localmail.freebsd.org>;
 Sun, 14 Apr 2019 13:37:51 +0000 (UTC) (envelope-from bde@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org
 [IPv6:2610:1c1:1:606c::19:3])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (4096 bits) client-digest SHA256)
 (Client CN "mxrelay.nyi.freebsd.org",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4369C6AFEB;
 Sun, 14 Apr 2019 13:37:51 +0000 (UTC) (envelope-from bde@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 1C342212B9;
 Sun, 14 Apr 2019 13:37:51 +0000 (UTC) (envelope-from bde@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id x3EDbokg084986;
 Sun, 14 Apr 2019 13:37:50 GMT (envelope-from bde@FreeBSD.org)
Received: (from bde@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id x3EDboMO084985;
 Sun, 14 Apr 2019 13:37:50 GMT (envelope-from bde@FreeBSD.org)
Message-Id: <201904141337.x3EDboMO084985@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: bde set sender to bde@FreeBSD.org
 using -f
From: Bruce Evans <bde@FreeBSD.org>
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
Subject: svn commit: r346215 - head/lib/libvgl
X-SVN-Group: head
X-SVN-Commit-Author: bde
X-SVN-Commit-Paths: head/lib/libvgl
X-SVN-Commit-Revision: 346215
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Loop: FreeBSD.org
Sender: owner-src-committers@freebsd.org
X-Rspamd-Queue-Id: 285246AFED
X-Spamd-Bar: --
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-2.98 / 15.00];
 local_wl_from(0.00)[freebsd.org];
 NEURAL_HAM_MEDIUM(-1.00)[-0.999,0];
 NEURAL_HAM_SHORT(-0.98)[-0.983,0];
 ASN(0.00)[asn:11403, ipnet:96.47.64.0/20, country:US];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
Status: O
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.29
List-Id: SVN commit messages for the src tree for head/-current
 <svn-src-head.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head/>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=subscribe>
Date: Tue, 03 Sep 2019 14:06:52 -0000
X-Original-Date: Sun, 14 Apr 2019 13:37:50 +0000 (UTC)
X-List-Received-Date: Tue, 03 Sep 2019 14:06:52 -0000

Author: bde
Date: Sun Apr 14 13:37:50 2019
New Revision: 346215
URL: https://svnweb.freebsd.org/changeset/base/346215

Log:
  For writing and reading single pixels, avoid some pessimizations for
  depths > 8.  Add some smaller optimizations for these depths.  Use a
  more generic method for all depths >= 8, although this gives tiny
  pessimizations for these depths.
  
  For clearing the whole frame buffer, avoid the same pessimizations
  for depths > 8.  Add some larger optimizations for these depths.  Use
  an even more generic method for all depths >= 8 to give the optimizations
  for depths > 8 and a tiny pessimization for depth 8.
  
  The main pessimization was that old versions of bcopy() copy 1 byte at a
  time for all trailing bytes.  (i386 still does this.  amd64 now pessimizes
  large sizes instead of small ones if the CPU supports ERMS.  dev/fb gets
  this wrong by mostly not using the bcopy() family or the technically correct
  bus space functions but by mostly copying 2 bytes at a time using an
  unoptimized loop without even volatile declarations to prevent the compiler
  rewriting it.)
  
  The sizes here are 1, 2, 3 or 4 bytes, so depths 9-16 were up to twice as
  slow as necessary and depths 17-24 were up to 3 times slower than necessary.
  Fix this (except depths 17-24 are still up to 2 times slower than necessary)
  by using (builtin) memcpy() instead of bcopy() and reorganizing so that the
  compiler can see the small constant sizes.  Reduce special cases while
  reorganizing although this is slightly slower than adding special cases.
  The compiler inlining (and even -O2 vs -O0) makes little difference compared
  with reducing the number of accesses except on modern hardware it gives a
  small improvement.
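
  As a rough illustration of the "small constant sizes" point above, the
  sketch below stores a pixel through a switch in which every memcpy() call
  has a literal length, so a compiler is free to expand each call into one
  or two plain stores.  The name put_pixel() and its parameters are
  hypothetical and not part of libvgl; the color is assumed to be in
  little-endian memory order already.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from libvgl): store the first `nbytes` bytes of
 * `color`, taken in memory order, at `dst`.  Each case passes a literal
 * size to memcpy() so that the compiler can inline the copy instead of
 * calling a general routine that loops over single bytes.
 */
void
put_pixel(uint8_t *dst, uint32_t color, int nbytes)
{
	switch (nbytes) {
	case 1:
		memcpy(dst, &color, 1);
		break;
	case 2:
		memcpy(dst, &color, 2);
		break;
	case 3:
		memcpy(dst, &color, 3);
		break;
	case 4:
		memcpy(dst, &color, 4);
		break;
	default:
		/* Unsupported pixel size: leave the destination untouched. */
		break;
	}
}
```

  Whether the calls really become single stores depends on the compiler and
  on the optimization level; the switch only makes that expansion possible.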
  
  Clearing was also pessimized mainly by the extra accesses.  Fix it quite
  differently by creating a MEMBUF containing 1 line (in fast memory using
  a slow method) and copying this.  This is only slightly slower than reducing
  everything to efficient memset()s and bcopy()s, but simpler, especially
  for the segmented case.  This works for planar modes too, but don't use it
  then since the old method was actually optimal for planar modes (it works
  by moving the slow i/o instructions out of inner loops), while for direct
  modes the slow instructions were all in the invisible inner loop in bcopy().
  
  Use htole32() and le32toh() and some type puns instead of unoptimized
  functions for converting colors.  This optimization is mostly in the noise.
  libvgl is only supported on x86, so it could hard-code the assumption that
  the byte order is le32, but the old conversion functions didn't hard-code
  this.
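
  For comparison with the removed color2mem() and mem2color(), here is a
  minimal sketch of the same conversions done with htole32() and le32toh().
  The names color_to_mem() and mem_to_color() are hypothetical, the color is
  narrowed to uint32_t rather than u_long, and len is assumed to be 1 to 4.
  The header is <sys/endian.h> on FreeBSD (as in the diff); the <endian.h>
  fallback is an assumption for glibc-based systems.

```c
#define _DEFAULT_SOURCE		/* Exposes htole32() in glibc's <endian.h>. */
#if defined(__FreeBSD__)
#include <sys/endian.h>
#else
#include <endian.h>		/* Assumption: glibc-style location. */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write the low `len` bytes of `color` to `b` in
 * little-endian order.  `len` is assumed to be between 1 and 4.
 */
void
color_to_mem(uint32_t color, uint8_t *b, int len)
{
	uint32_t le = htole32(color);

	memcpy(b, &le, (size_t)len);
}

/*
 * Hypothetical helper: read `len` little-endian bytes from `b` and return
 * them as a host-order color.  The temporary starts at zero, so the bytes
 * that are not copied read back as zero and no mask is needed.
 */
uint32_t
mem_to_color(const uint8_t *b, int len)
{
	uint32_t le = 0;

	memcpy(&le, b, (size_t)len);
	return (le32toh(le));
}
```

  On a little-endian host both conversion macros are expected to compile to
  nothing, which matches the remark above that the byte order could have
  been hard-coded on x86.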

Modified:
  head/lib/libvgl/simple.c

Modified: head/lib/libvgl/simple.c
==============================================================================
--- head/lib/libvgl/simple.c	Sun Apr 14 13:17:40 2019	(r346214)
+++ head/lib/libvgl/simple.c	Sun Apr 14 13:37:50 2019	(r346215)
@@ -33,6 +33,7 @@ __FBSDID("$FreeBSD$");
 
 #include <signal.h>
 #include <sys/fbio.h>
+#include <sys/endian.h>
 #include "vgl.h"
 
 static byte VGLSavePaletteRed[256];
@@ -44,96 +45,44 @@ static byte VGLSavePaletteBlue[256];
 #define min(x, y)	(((x) < (y)) ? (x) : (y))
 #define max(x, y)	(((x) > (y)) ? (x) : (y))
 
-static void
-color2mem(u_long color, byte *b, int len)
-{
-  switch (len) {
-  case 4:
-    b[3] = (color >> 24) & 0xff;
-    /* fallthrough */
-  case 3:
-    b[2] = (color >> 16) & 0xff;
-    /* fallthrough */
-  case 2:
-    b[1] = (color >> 8) & 0xff;
-    /* fallthrough */
-  case 1:
-  default:
-    b[0] = color & 0xff;
-    break;
-  }
-
-  return;
-}
-
-static u_long
-mem2color(byte *b, int len)
-{
-  u_long color = 0;
-
-  switch (len) {
-  case 4:
-    color |= (b[3] & 0xff) << 24;
-    /* fallthrough */
-  case 3:
-    color |= (b[2] & 0xff) << 16;
-    /* fallthrough */
-  case 2:
-    color |= (b[1] & 0xff) << 8;
-    /* fallthrough */
-  case 1:
-  default:
-    color |= (b[0] & 0xff);
-    break;
-  }
-
-  return color;
-}
-
 void
 VGLSetXY(VGLBitmap *object, int x, int y, u_long color)
 {
   int offset;
-  byte b[4];
 
   VGLCheckSwitch();
   if (x>=0 && x<object->VXsize && y>=0 && y<object->VYsize) {
     if (object->Type == MEMBUF ||
         !VGLMouseFreeze(x, y, 1, 1, 0x80000000 | color)) {
+      offset = (y * object->VXsize + x) * object->PixelBytes;
       switch (object->Type) {
+      case VIDBUF8S:
+      case VIDBUF16S:
+      case VIDBUF24S:
+      case VIDBUF32S:
+        offset = VGLSetSegment(offset);
+        /* FALLTHROUGH */
       case MEMBUF:
-      switch (object->PixelBytes) {
-      case 2:
-        goto vidbuf16;
-      case 3:
-        goto vidbuf24;
-      case 4:
-        goto vidbuf32;
-      }
-      /* fallthrough */
       case VIDBUF8:
-        object->Bitmap[y*object->VXsize+x]=((byte)color);
-        break;
-      case VIDBUF8S:
-	object->Bitmap[VGLSetSegment(y*object->VXsize+x)]=((byte)color);
-	break;
       case VIDBUF16:
-vidbuf16:
       case VIDBUF24:
-vidbuf24:
       case VIDBUF32:
-vidbuf32:
-	color2mem(color, b, object->PixelBytes);
-        bcopy(b, &object->Bitmap[(y*object->VXsize+x) * object->PixelBytes],
-		object->PixelBytes);
+        color = htole32(color);
+        switch (object->PixelBytes) {
+        case 1:
+          memcpy(&object->Bitmap[offset], &color, 1);
+          break;
+        case 2:
+          memcpy(&object->Bitmap[offset], &color, 2);
+          break;
+        case 3:
+          memcpy(&object->Bitmap[offset], &color, 3);
+          break;
+        case 4:
+          memcpy(&object->Bitmap[offset], &color, 4);
+          break;
+        }
         break;
-      case VIDBUF16S:
-      case VIDBUF24S:
-      case VIDBUF32S:
-	color2mem(color, b, object->PixelBytes);
-	offset = VGLSetSegment((y*object->VXsize+x) * object->PixelBytes);
-	bcopy(b, &object->Bitmap[offset], object->PixelBytes);
-	break;
       case VIDBUF8X:
         outb(0x3c4, 0x02);
         outb(0x3c5, 0x01 << (x&0x3));
@@ -161,42 +110,38 @@ static u_long
 __VGLGetXY(VGLBitmap *object, int x, int y)
 {
   int offset;
-  byte b[4];
   int i;
   u_long color;
   byte mask;
 
+  offset = (y * object->VXsize + x) * object->PixelBytes;
   switch (object->Type) {
+    case VIDBUF8S:
+    case VIDBUF16S:
+    case VIDBUF24S:
+    case VIDBUF32S:
+      offset = VGLSetSegment(offset);
+      /* FALLTHROUGH */
     case MEMBUF:
-    switch (object->PixelBytes) {
-    case 2:
-      goto vidbuf16;
-    case 3:
-      goto vidbuf24;
-    case 4:
-      goto vidbuf32;
-    }
-    /* fallthrough */
     case VIDBUF8:
-      return object->Bitmap[((y*object->VXsize)+x)];
-    case VIDBUF8S:
-      return object->Bitmap[VGLSetSegment(y*object->VXsize+x)];
     case VIDBUF16:
-vidbuf16:
     case VIDBUF24:
-vidbuf24:
     case VIDBUF32:
-vidbuf32:
-      bcopy(&object->Bitmap[(y*object->VXsize+x) * object->PixelBytes],
-		b, object->PixelBytes);
-      return (mem2color(b, object->PixelBytes));
-    case VIDBUF16S:
-    case VIDBUF24S:
-    case VIDBUF32S:
-	offset = VGLSetSegment((y*object->VXsize+x) * object->PixelBytes);
-	bcopy(&object->Bitmap[offset], b, object->PixelBytes);
-
-      return (mem2color(b, object->PixelBytes));
+      switch (object->PixelBytes) {
+      case 1:
+        memcpy(&color, &object->Bitmap[offset], 1);
+        return le32toh(color) & 0xff;
+      case 2:
+        memcpy(&color, &object->Bitmap[offset], 2);
+        return le32toh(color) & 0xffff;
+      case 3:
+        memcpy(&color, &object->Bitmap[offset], 3);
+        return le32toh(color) & 0xffffff;
+      case 4:
+        memcpy(&color, &object->Bitmap[offset], 4);
+        return le32toh(color);
+      }
+      break;
     case VIDBUF8X:
       outb(0x3ce, 0x04); outb(0x3cf, x & 0x3);
       return object->Bitmap[(unsigned)(VGLAdpInfo.va_line_width*y)+(x/4)];
@@ -539,63 +484,38 @@ VGLFilledEllipse(VGLBitmap *object, int xc, int yc, in
 void
 VGLClear(VGLBitmap *object, u_long color)
 {
+  VGLBitmap src;
   int offset;
   int len;
-  int i, total = 0;
-  byte b[4];
+  int i;
 
   VGLCheckSwitch();
   if (object->Type != MEMBUF)
     VGLMouseFreeze(0, 0, object->Xsize, object->Ysize, color);
   switch (object->Type) {
   case MEMBUF:
-  switch (object->PixelBytes) {
-  case 2:
-    goto vidbuf16;
-  case 3:
-    goto vidbuf24;
-  case 4:
-    goto vidbuf32;
-  }
-  /* fallthrough */
   case VIDBUF8:
-    memset(object->Bitmap, (byte)color, object->VXsize*object->VYsize);
-    break;
-
   case VIDBUF8S:
-    for (offset = 0; offset < object->VXsize*object->VYsize; ) {
-      VGLSetSegment(offset);
-      len = min(object->VXsize*object->VYsize - offset,
-		VGLAdpInfo.va_window_size);
-      memset(object->Bitmap, (byte)color, len);
-      offset += len;
-    }
-    break;
   case VIDBUF16:
-vidbuf16:
-  case VIDBUF24:
-vidbuf24:
-  case VIDBUF32:
-vidbuf32:
-    color2mem(color, b, object->PixelBytes);
-    total = object->VXsize*object->VYsize*object->PixelBytes;
-    for (i = 0; i < total; i += object->PixelBytes)
-      bcopy(b, object->Bitmap + i, object->PixelBytes);
-    break;
-
   case VIDBUF16S:
+  case VIDBUF24:
   case VIDBUF24S:
+  case VIDBUF32:
   case VIDBUF32S:
-    color2mem(color, b, object->PixelBytes);
-    total = object->VXsize*object->VYsize*object->PixelBytes;
-    for (offset = 0; offset < total; ) {
-      VGLSetSegment(offset);
-      len = min(total - offset, VGLAdpInfo.va_window_size);
-      for (i = 0; i < len; i += object->PixelBytes)
-        bcopy(b, object->Bitmap + (offset + i) % VGLAdpInfo.va_window_size,
-              object->PixelBytes);
-      offset += len;
-    }
+    src.Type = MEMBUF;
+    src.Xsize = object->Xsize;
+    src.VXsize = object->VXsize;
+    src.Ysize = 1;
+    src.VYsize = 1;
+    src.Xorigin = 0;
+    src.Yorigin = 0;
+    src.Bitmap = alloca(object->VXsize * object->PixelBytes);
+    src.PixelBytes = object->PixelBytes;
+    color = htole32(color);
+    for (i = 0; i < object->VXsize; i++)
+      bcopy(&color, src.Bitmap + i * object->PixelBytes, object->PixelBytes);
+    for (i = 0; i < object->VYsize; i++)
+      __VGLBitmapCopy(&src, 0, 0, object, 0, i, object->VYsize, 1);
     break;
 
   case VIDBUF8X: