Date: Mon, 15 Mar 1999 00:20:34 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc: hackers@FreeBSD.ORG
Subject: Re: Gigabit ethernet -- what am I doing wrong?
Message-ID: <199903150820.AAA96533@apollo.backplane.com>
References: <199903142036.PAA24852@skynet.ctr.columbia.edu>

:Okay, here's an update. I've been reading all the replies to this
:thread, but it turns out that my main problem is that, as expected,
:I was screwing something up.
:
:The Tigon has a PCI state register which lets you configure several
:aspects operation. Two of the parameters are PCI read max and PCI write
:max, which force termination of PCI reads or writes at a specified
:boundaries 4 bytes, 16, 32, 64, 128, 256, 1K. I had originally set
:the read and write max values for 32. It turns out that disabling
:these settings (by making them 0) yields _much_ better performance.

    32 bytes is only 4 PCI clocks -- a serious waste of PCI burst
    bandwidth. However, you should get good results with 128, 256, or 1K.
    You don't want completely unlimited unless you don't mind the card
    completely hogging the PCI bus for long periods of time.

:ring fast enough). Normally the Tigon has both the PCI read and write
:DMA channels active at the same time, but you can force only one to
:be active at a time by setting a bit in the operating mode register.
:The manual recommends _not_ doing this, but setting it yielded yet
:another jump in performance.

    That's very interesting. Perhaps it is trying to interleave read and
    write requests and is blowing the burst transfers in so doing. You
    ought to be able to adjust the burst length so that you can leave
    both DMA channels enabled simultaneously.

:Right now I can transmit UDP packets at around 55MB/sec with the
:normal MTU and can get 91MB/sec by setting the MTU to 9000 (using
:jumbo frames). TCP speed has improved too, but not quite so much as I
:expected (I can get 40MB/sec with a window size of 64K and normal
:MTU, 66MB/sec with jumbo frames). I still need to experiment with
:the various tuning options though. I also haven't really tried
:checksum offloading yet.
:
:-Bill

    These are impressive numbers. If you can increase the size of the TX
    and RX ring, you may be able to get even better performance.
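
    To put rough numbers on the burst-length point above, here is a
    minimal sketch. It assumes a 64-bit PCI bus (8 bytes per clock, which
    is what makes 32 bytes come out to 4 clocks) and a made-up fixed cost
    of 8 clocks per burst for arbitration, the address phase and target
    latency. The overhead figure and the function names are illustrative
    assumptions, not values from the Tigon manual or measurements.

```python
# Rough model of PCI burst efficiency versus burst length.
# ASSUMPTIONS (not from the Tigon documentation): 64-bit bus moving
# 8 bytes per clock, and a hypothetical fixed cost of 8 clocks per burst.

def data_clocks(burst_bytes, bus_bytes_per_clock=8):
    """Clocks spent actually moving data in one burst (rounded up)."""
    return -(-burst_bytes // bus_bytes_per_clock)


def burst_efficiency(burst_bytes, bus_bytes_per_clock=8, overhead_clocks=8):
    """Fraction of bus clocks in one burst that carry data."""
    useful = data_clocks(burst_bytes, bus_bytes_per_clock)
    return useful / (useful + overhead_clocks)


if __name__ == "__main__":
    for size in (32, 128, 256, 1024):
        print(f"{size:5d} bytes: {data_clocks(size):4d} data clocks, "
              f"{burst_efficiency(size):.0%} of bus time carries data")
```

    Under these assumptions the fixed per-burst cost dominates at 32
    bytes and mostly amortizes away by 256 or 1K, which is the shape of
    the argument; the real overhead depends on the chipset and the
    target.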

    If you can avoid doing *any* PCI I/O in the heavy-load rx/tx ring
    case you can bump up performance another notch. That is, if the card
    can continue to go around the ring (assuming you are able to process
    requests quickly enough that the card never hits a 'ring full'
    condition and auto-disables its DMA), then all you may have to touch
    is main memory. I'm not familiar with this particular card so these
    sorts of features might not even exist :-(

    Any PCI I/O op you do will stall the cpu significantly. This is the
    same sort of problem that limits SCSI transactional capabilities.

					-Matt
					Matthew Dillon
					<dillon@backplane.com>

:--
:=============================================================================
:-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
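
    The ring idea in the message above can be sketched as a toy
    simulation. This is a hypothetical model, not the Tigon's actual
    descriptor format or driver code: the class RxRing, its fields and
    the "one PCI write to re-enable DMA" rule are all assumptions made
    for illustration. The point it shows is that a host which keeps up
    with the card only ever touches the ring in main memory.

```python
# Toy model of a receive ring shared between a card and the host.
# ASSUMPTION: everything here (names, the restart rule) is hypothetical
# and only illustrates the idea; it is not how any real card behaves.

class RxRing:
    def __init__(self, size):
        self.full = [False] * size  # True: slot holds an unprocessed packet
        self.card_idx = 0           # next slot the card will fill
        self.host_idx = 0           # next slot the host will drain
        self.dma_enabled = True
        self.pci_writes = 0         # count of (expensive) register writes

    def card_receive(self):
        """Card DMAs one packet into the ring; stops itself if the ring is full."""
        if not self.dma_enabled:
            return False
        if self.full[self.card_idx]:
            self.dma_enabled = False  # 'ring full': card auto-disables DMA
            return False
        self.full[self.card_idx] = True
        self.card_idx = (self.card_idx + 1) % len(self.full)
        return True

    def host_process(self):
        """Host drains one packet; only pays for PCI I/O if DMA had stopped."""
        if not self.full[self.host_idx]:
            return False
        self.full[self.host_idx] = False  # main-memory access only
        self.host_idx = (self.host_idx + 1) % len(self.full)
        if not self.dma_enabled:
            self.dma_enabled = True
            self.pci_writes += 1          # register write to restart the card
        return True


def fast_host_demo(packets=100, size=4):
    """Host processes every packet as it arrives; returns PCI writes needed."""
    ring = RxRing(size)
    for _ in range(packets):
        ring.card_receive()
        ring.host_process()
    return ring.pci_writes


def slow_host_demo(size=4):
    """Card overruns the ring once before the host catches up."""
    ring = RxRing(size)
    for _ in range(size + 1):
        ring.card_receive()
    ring.host_process()
    return ring.pci_writes
```

    In this model fast_host_demo() never needs a register write, while
    slow_host_demo() pays for one restart after the ring fills. Making
    the ring larger gives the host more slack before that happens.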