From owner-freebsd-hackers Tue Jun 3 19:15:24 1997
Return-Path:
Received: (from root@localhost)
        by hub.freebsd.org (8.8.5/8.8.5) id TAA10801
        for hackers-outgoing; Tue, 3 Jun 1997 19:15:24 -0700 (PDT)
Received: from spoon.beta.com (root@[199.165.180.33])
        by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA10790
        for ; Tue, 3 Jun 1997 19:15:19 -0700 (PDT)
Received: from spoon.beta.com (mcgovern@localhost [127.0.0.1])
        by spoon.beta.com (8.8.5/8.8.5) with ESMTP id WAA21318
        for ; Tue, 3 Jun 1997 22:29:07 -0400 (EDT)
Message-Id: <199706040229.WAA21318@spoon.beta.com>
To: hackers@freebsd.org
Subject: Need help with fastest way to move data...
Date: Tue, 03 Jun 1997 22:29:06 -0400
From: "Brian J. McGovern"
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've completed a prototype device driver for the Cyclades Cyclom-Z card, and I'm hoping to make it available to everyone by the end of the week. Unfortunately, it's underperforming a 16550 UART, and I suspect that it's due to how I'm doing I/O with the card.

Currently, I'm moving a byte at a time to the card (i.e., for each TX or RX interrupt I get, I loop, moving a byte at a time until the Xmit/Recv buffer is full/empty). Obviously, on a PCI bus (not to mention internally), this is terribly inefficient. With the 32-bit bus, I'm hoping to be able to move 4 characters at a time, and thereby increase the performance of this chunk of code by 3-4 times.

The question I have is: what is the best way to do this? I'm having some problems with q_to_b() locking up the system (I'm not quite sure why, I just know it is), but I'm not even sure if this is the best way to move the data.

A port on a card (there are 8 ports per card) has a 4KB receive buffer and a 2KB transmit buffer. Both are "ring buffers", with transmit and receive pointer pairs (head and tail) in a separate structure just ahead of the ring buffers.
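The per-port layout described above can be sketched roughly as follows. This is only an illustration, not the Cyclom-Z register map: `struct ring_ctl`, the helper names, and the convention of keeping one slot empty so that head == tail means "empty" are all assumptions made for the example. Only the 2KB and 4KB sizes come from the description.

```c
#include <assert.h>
#include <stdint.h>

#define TX_RING_SIZE 2048u  /* 2KB transmit ring, per the description */
#define RX_RING_SIZE 4096u  /* 4KB receive ring, per the description */

/* Hypothetical control block sitting ahead of a ring; the real card
 * layout is not given in this message. */
struct ring_ctl {
    uint32_t head;  /* producer index: next slot to write */
    uint32_t tail;  /* consumer index: next slot to read */
};

/* Bytes currently queued in a ring of `size` bytes. */
static uint32_t ring_used(const struct ring_ctl *rc, uint32_t size)
{
    assert(rc->head < size && rc->tail < size);
    return (rc->head + size - rc->tail) % size;
}

/* Bytes that can still be written. One slot is assumed to stay empty
 * so that head == tail always means "empty" rather than "full". */
static uint32_t ring_free(const struct ring_ctl *rc, uint32_t size)
{
    return size - 1u - ring_used(rc, size);
}

/* Longest run that can be written without wrapping past the end. */
static uint32_t ring_contig_free(const struct ring_ctl *rc, uint32_t size)
{
    uint32_t to_end = size - rc->head;
    uint32_t avail = ring_free(rc, size);
    return avail < to_end ? avail : to_end;
}
```

Knowing the contiguous free run up front is what lets a transfer loop move a whole block per interrupt instead of re-checking the pointers for every byte.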
Due to the nature of the clists that I'm moving the data in and out of, I'm making no assumptions about where that data sits in memory. I'm also assuming that things will be "most efficient" if the destination on the PCI bus (and locally) is aligned on a 32-bit boundary.

Therefore, what I was considering was moving a character at a time until the buffer offset and'ed with 0x3 is 0 (i.e., (offset & 0x03) == 0), which would mean that my buffer pointer is now on a long boundary. I would then use q_to_b() to move the remaining bytes, up to 4 at a time (fewer if fewer than 4 are left), into an unsigned long, with something like:

    bytesmvd = q_to_b(&tp->t_outq, (unsigned char *)&longint, MIN(bytes_left, 4));

Then I'd transfer those bytes with something like:

    memcpy((char *)buffer_base + offset, &longint, bytesmvd);

Receive would be something similar: move a byte at a time to l_rint until (offset & 0x03) is 0, then memcpy up to 4 bytes at a time into the long int, then loop over those one to N bytes, passing each to l_rint in turn.

Now, the big question... Is this the most efficient way to do this? Do memcpy and the like work best on long-aligned values? Would it be even MORE efficient to use larger structures, given sufficient data to move?

I'm curious to hear comments, and to see if anyone has any truly cool ideas.

-Brian
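The transmit-side strategy described above can be sketched in plain C as follows. This is a minimal illustration under stated assumptions, not the driver's code: `copy_to_card` is a hypothetical name, a flat source buffer stands in for the clist that q_to_b() would drain, the destination base is assumed to be 32-bit aligned, and ring wrap-around is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy `len` bytes from `src` (any alignment) to `base + offset`.
 * `base` is assumed to be 32-bit aligned, as mapped card memory would be.
 * Returns the number of 32-bit stores issued, to make the effect visible.
 * Hypothetical helper: the caller must keep the copy within the ring.
 */
static size_t copy_to_card(volatile uint8_t *base, size_t offset,
                           const uint8_t *src, size_t len)
{
    size_t words = 0;

    assert(((uintptr_t)base & 0x3u) == 0);

    /* Head: single bytes until the destination offset is long-aligned. */
    while (len > 0 && (offset & 0x3u) != 0) {
        base[offset++] = *src++;
        len--;
    }

    /* Body: one 32-bit store per four bytes. */
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, src, sizeof(w));  /* source may be unaligned */
        *(volatile uint32_t *)(base + offset) = w;
        offset += 4;
        src += 4;
        len -= 4;
        words++;
    }

    /* Tail: at most three leftover bytes. */
    while (len > 0) {
        base[offset++] = *src++;
        len--;
    }
    return words;
}
```

Assembling each word with memcpy from the source side keeps the source free to sit at any address, while every wide store to the destination stays aligned. The same head/body/tail split would apply with a wider store if the bus and card allow it.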