Date: Fri, 4 Sep 1998 19:26:53 -0400 (EDT)
From: Bill Paul <wpaul@skynet.ctr.columbia.edu>
To: hackers@FreeBSD.ORG
Subject: Questions for networking gurus
Message-ID: <199809042326.TAA13218@skynet.ctr.columbia.edu>
I'm trying to work out a packet reception strategy for the RealTek ethernet controller. The RealTek chip is a bus master, which normally is a good thing because with the right sort of interface you can eliminate buffer copies within the driver code. Unfortunately, the RealTek is designed in such a way that it's almost impossible to avoid buffer copies even with bus master capability. I think there is a way to do it, but I'm not sure it will work correctly, as it depends a lot on how the higher protocol layers work, and I'm a bit fuzzy on some of the details.

The RealTek chip works roughly as follows. The driver allocates a buffer area of 8K, 16K, 32K or 64K in size, depending on the setting of certain bits in the receive config register. This buffer resides in system memory, independent of the chip's packet FIFO memory. The driver then gives the chip the base address of this buffer area and activates the receiver. When the chip receives a packet, it writes a 32-bit header value into the driver's buffer area and then copies the packet immediately after the header. (The header contains the packet length and some status bits.) The chip rounds up its address pointer to a 32-bit boundary and then writes the next header and packet, and so on until it hits the end of the buffer area. The driver is supposed to pass the packets along to the higher level protocols and advance a special register to indicate how much of the receive buffer has been processed. It is possible for the chip to wrap a packet from the end of the receive area back to the beginning again, assuming the driver has freed up space at the beginning of the buffer area.

This is sort of an oddball mechanism given the way other devices work. With most bus master chips, you have a descriptor mechanism that allows the driver to pre-allocate individual packet buffers (in our case mbuf clusters) and provide the buffer addresses to the chip.
The chip can then DMA incoming packets straight into the mbufs, and the driver can just pass them along. This completely eliminates the need for buffer copies, in exchange for some wasted space, since an mbuf cluster buffer is 2048 bytes in size and an ethernet frame is never larger than about 1500 bytes.

You can't do that with the RealTek chip though. Here, you have only one contiguous receive buffer, and you can't predict the sizes or offsets of the packets within the buffer. The simplest way to deal with this is to use m_devget() to copy packet data out of the receive buffer and into mbufs and hand those off to the upper layers. This is silly though, because it defeats the purpose of the bus master capability, which is to avoid having the CPU perform buffer copies.

One possible alternative was proposed to me recently by Matthew Dodd, which is to use the external data pointers in mbufs. The idea works like this:

- The chip receives a packet and copies it into the RX buffer region.

- The driver allocates a single mbuf and sets the M_EXT flag in its header.

- The driver determines the start address of the packet within the buffer region and puts that address into the external data pointer of the mbuf.

- The driver sets the mbuf's data length to the length of the packet and specifies a free() routine to use to deallocate the external storage when the mbuf is released. (We don't want to actually free the buffer space though, so the free routine can be a no-op.)

- Lastly, the driver passes the mbuf to ether_input() for processing.

This avoids a buffer copy by using an mbuf to 'encapsulate' the packet data as an external data region, but it creates a problem: once the driver ties a portion of the receive buffer to an mbuf, it can't allow that portion of the buffer to be overwritten by the chip's DMA engine until the mbuf has been released. Otherwise, the packet data will be corrupted while another part of the kernel is fiddling with it.
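To make the receive ring and the external-data idea a bit more concrete, here is a minimal user-space sketch in C. It is not driver code. The header layout (length in the upper 16 bits, status in the lower 16), the names rx_next(), struct ext_ref and rx_noop_free(), and the fixed 8K buffer size are all assumptions made for illustration. struct ext_ref is only a hypothetical stand-in for an mbuf with M_EXT set, not the real struct mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUF_SIZE 8192u /* smallest of the sizes mentioned; must be a multiple of 4 */

/* Hypothetical stand-in for an mbuf using external storage. */
struct ext_ref {
    const uint8_t *data;       /* points into the RX buffer, no copy made */
    uint16_t len;              /* packet length taken from the header */
    void (*ext_free)(void *);  /* called when the reference is released */
};

/* The buffer space is never really freed, so this does nothing. */
static void rx_noop_free(void *p) { (void)p; }

/*
 * Examine the packet whose 32-bit header sits at *off.
 * Assumed header layout: upper 16 bits = length, lower 16 bits = status.
 * Returns 1 and fills *ref if the packet data is contiguous in the buffer.
 * Returns 0 if the data wraps past the end; such a packet cannot be
 * referenced by a single pointer and would have to be copied instead.
 * In both cases *off is advanced to the next header.
 */
static int rx_next(const uint8_t *buf, uint32_t *off, struct ext_ref *ref)
{
    uint32_t hdr, start;
    uint16_t len;
    int contiguous;

    assert((*off & 3u) == 0 && *off < RX_BUF_SIZE);

    memcpy(&hdr, buf + *off, sizeof(hdr)); /* aligned, so never straddles the end */
    len = (uint16_t)(hdr >> 16);
    start = (*off + 4u) % RX_BUF_SIZE;
    contiguous = (start + len <= RX_BUF_SIZE);

    ref->data = contiguous ? buf + start : NULL;
    ref->len = len;
    ref->ext_free = rx_noop_free;

    /* Skip header and data, round up to a 32-bit boundary, then wrap. */
    *off = ((*off + 4u + len + 3u) & ~3u) % RX_BUF_SIZE;
    return contiguous;
}
```

A caller would loop on rx_next() from the last processed offset up to the chip's current write position, hand each contiguous reference upward, and fall back to a copy for the occasional wrapped packet. How far the driver may advance the chip's "processed" register while references are still outstanding is exactly the open problem described above, and this sketch does not address it.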
It may be possible to get around this by pre-allocating several receive buffer areas: if all of the space in the first region has been tied to mbufs and remains unreleased, the driver can reload the chip's receive buffer address register with the address of another buffer. Assuming that all of the space in the first region will be released eventually, it should be possible to provide room for the chip to DMA new frames while allowing the protocols time to process existing frames in previous buffers.

However, this brings up the following questions:

1) Does using external data regions with mbufs like this actually work? I know it works with mbuf clusters, but that's sort of a special case. I remember reading somewhere, possibly in TCP/IP Illustrated Vol. 2, that there were bugs that prevented this from working correctly 100% of the time except for the mbuf cluster case. Has this been fixed, or are there still pitfalls?

2) What's the longest time that an mbuf chain with received packet data will survive inside the kernel? The driver has to allocate enough memory so that it can continue handling data from the chip while waiting for previous buffers to be freed by the protocols, but if an mbuf can get hung up inside the protocols for a very long time (or worse, be locked indefinitely), then the buffer allocation would have to be ridiculously large. This would outweigh the benefit of avoiding copies.

So, does this scheme sound sensible, or should I just swallow my pride and settle for using m_devget()? It would be nice to find a way to actually squeeze some decent performance out of this gawdawful device just to spite the designers. If anybody has tried to do something like this before, or is familiar with the guts of the BSD networking code, I'd appreciate any insights.
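For the multiple-region idea, the bookkeeping could look something like the following user-space sketch in C. Everything here is hypothetical: the names struct rx_pool, pool_lend(), pool_release() and pool_switch(), and the choice of four regions, are assumptions for illustration only. It also assumes the mbuf free routine would decrement a per-region count rather than being a pure no-op, since the driver needs some way to learn that a region has drained.

```c
#include <assert.h>

#define NREGIONS 4 /* hypothetical number of pre-allocated receive areas */

struct rx_region {
    unsigned outstanding; /* packets handed upward and not yet released */
};

struct rx_pool {
    struct rx_region region[NREGIONS];
    int active; /* index of the region the chip is currently filling */
};

/* Record that one packet in the active region was handed to the stack. */
static void pool_lend(struct rx_pool *p)
{
    p->region[p->active].outstanding++;
}

/* Would be called from the external-storage free routine. */
static void pool_release(struct rx_pool *p, int idx)
{
    assert(idx >= 0 && idx < NREGIONS);
    assert(p->region[idx].outstanding > 0);
    p->region[idx].outstanding--;
}

/*
 * Called when the active region is used up. Picks the next region with
 * no outstanding references and makes it active; a real driver would
 * then reload the chip's receive buffer address register. Returns the
 * new index, or -1 if every other region is still referenced, in which
 * case the driver would have to fall back to copying.
 */
static int pool_switch(struct rx_pool *p)
{
    int i, cand;

    for (i = 1; i < NREGIONS; i++) {
        cand = (p->active + i) % NREGIONS;
        if (p->region[cand].outstanding == 0) {
            p->active = cand;
            return cand;
        }
    }
    return -1;
}
```

The -1 case is where question 2 bites: the longer the protocols can hold on to an mbuf, the more regions are needed before pool_switch() stops failing, and the sketch gives no answer as to how many that is.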
-Bill

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================