Date: Wed, 17 Feb 1999 11:51:18 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Kevin Day <toasty@home.dragondata.com>
Cc: dyson@iquest.net, tlambert@primenet.com, mike@smith.net.au, hackers@FreeBSD.ORG
Subject: Re: vm_page_zero_fill
Message-ID: <199902171951.LAA10456@apollo.backplane.com>
References: <199902171902.NAA25290@home.dragondata.com>
:The system I'm working on is an embedded, highly graphical 2D/3D product.
:These systems will not be connected to the internet, nor will anyone have
:keyboard/telnet/terminal/whatever access to them. They're about as secure as
:they're going to get, so my concerns are mostly speed over security.
:
:In looking with some logic analyzers, we're seeing that we're nearly out of
:PCI bandwidth, and we're hitting the memory very hard too. 99% of our run
:time is spent ferrying data from RAM into the graphics device.
:
:Because of the nature of the product, we're needing more and more
:'real-time' like operation. The delay from when a user does something, until
:...
:
:After things still being slower than I wanted, I pulled out a logic
:analyzer. In watching memory accesses on the analyzer, we saw a lot of
:zeroing going on, especially after exec()'ing another application. (This
:...
:
:Currently, the time spent loading/preparing the new application is a bit
:long, so I was looking at ways to shrink that down. That's where this

Ahh. A couple of things.

First, I presume that the amount of memory in the machine is not an issue... that you have enough to hold all the programs pretty much resident. In that case, simply preload the executables. That is, rather than take the latency hit when the user hits a button, take the latency hit when the user is idle and just tell the program to 'go' ( through a pipe ) when the user hits the button.

Second, if you aren't already using a Xeon with the largest L2 cache configuration available, you probably should be. Intel CPUs tend to fall on their face with DATA-memory-intensive applications due to their undersized caches. The undersized cache works ok for instructions because instructions are pretty compact, but it does not work well for data. If the box you are using does not have a 100MHz memory bus, you need to get one that does.
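The first suggestion above, preloading the program and telling it to 'go' through a pipe, can be sketched in C roughly as follows. This is a minimal sketch, not Matt's code: prepare() and run() are hypothetical placeholders for the application's startup cost and its real work. In practice the child would exec() the real program (for example with the read end of the pipe on its stdin), and that program would do its own startup before blocking on the read.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for expensive startup: exec, page faults, loading data. */
static void prepare(void) { /* ... load and touch the program's data here ... */ }

/* Hypothetical stand-in for the work the user is actually waiting on. */
static int run(void) { return 42; }

/*
 * Fork a preloaded worker that pays its startup cost while the user is idle,
 * then blocks in read() until the parent writes a single 'go' byte.
 * Returns the worker's exit status, or -1 on error.
 */
int preload_and_go(void)
{
    int go[2];
    if (pipe(go) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    if (pid == 0) {                      /* child: the preloaded program */
        char c;
        close(go[1]);
        prepare();                       /* latency paid while the user is idle */
        if (read(go[0], &c, 1) != 1)     /* block until told to go (EOF => quit) */
            _exit(1);
        _exit(run());
    }

    close(go[0]);
    /* ... later, when the user hits the button: */
    ssize_t n = write(go[1], "g", 1);
    close(go[1]);                        /* if the write failed, the child sees EOF */

    int status;
    if (waitpid(pid, &status, 0) == -1 || n != 1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The point of the design is that the only work left on the critical path is a one-byte write() and a context switch; everything expensive has already happened in prepare().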
:While I don't want to get accused of not trying to figure this one out on my
:own... Suppose I mmap a large (2MB or more) file. Should any zeroing be
:going on when I touch those pages for the first time? From the analyzer, it
:looks like it's zeroing pages before putting what it read from the disk into
:them, but as you know, figuring out what's really going on by watching a
:logic analyzer is a form of witchcraft... If this is the case, turning this
:off would greatly help me. :)

It should not be zeroing pages before doing full reads into them. That path is usually pretty well optimized.

Third, memory-to-PCI transfers are best done with DMA ( as you already know ). For a frame store, you can eke out additional PCI bus speed by tuning the burst transfer length ( especially if the cpu is not heavily involved and can afford to stall a little more ). A 32-bit, 33MHz PCI bus peaks at about 133 MBytes/sec in theory, and you should be able to push 120 MBytes/sec by tuning the DMA burst. The PCI card should have a FIFO big enough to accommodate the burst, too. If you do a large transfer to a PCI card's frame buffer with memcpy() ( or equivalent ), you eat double the memory bandwidth and blow away the CPU's data cache as well.

Fourth, if you are doing direct frame store from disk to a PCI card, you may wish to consider building a custom piece of hardware / firmware that uses the SCSI bus to transfer the data directly ( i.e. put the frame store *on* the SCSI bus and have it master the data directly from the drives without host intervention ). This is a considerably more complex solution.

Fifth, consider double-wide (64-bit) PCI buses or AGP buses. AGP can certainly be done on a PC. I'm not sure what is available with regard to 64-bit PCI buses. However, both options are departures from the norm and may not be cost effective.
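To put the burst-tuning advice in context, here is a back-of-the-envelope sketch in C. It assumes a deliberately simple model in which every DMA burst pays a fixed number of overhead clocks (address phase, turnaround, arbitration) and ignores wait states and retries. The 4-clock overhead and the 640x480, 4-bytes-per-pixel, 60fps frame size used in the usage note are hypothetical numbers chosen for illustration, not measurements from Kevin's hardware.

```c
/*
 * Simplified model of PCI DMA throughput in MBytes/sec (10^6 bytes).
 *   clock_mhz   - bus clock, e.g. 33.33
 *   width_bytes - 4 for 32-bit PCI, 8 for 64-bit PCI
 *   burst       - data phases per burst
 *   overhead    - extra clocks per burst (hypothetical; depends on chipset/card)
 */
double pci_throughput(double clock_mhz, int width_bytes, int burst, int overhead)
{
    double peak = clock_mhz * width_bytes;               /* no-overhead ceiling */
    double efficiency = (double)burst / (burst + overhead);
    return peak * efficiency;
}

/* Bandwidth needed to stream uncompressed frames, in MBytes/sec. */
double frame_mbps(int width, int height, int bytes_per_pixel, int fps)
{
    return (double)width * height * bytes_per_pixel * fps / 1e6;
}
```

Under these assumptions a 32-bit bus with 8-beat bursts delivers only about 89 MBytes/sec, while 64-beat bursts get to about 125 MBytes/sec, which is why a longer burst (and a card FIFO deep enough to absorb it) matters. A hypothetical 640x480 stream at 4 bytes per pixel and 60fps needs about 73.7 MBytes/sec, and doubling the bus width to 64 bits doubles the ceiling in this model.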
-Matt
Matthew Dillon
<dillon@backplane.com>

:(If I'm not being clear enough, imagine mmap'ing a movie, and memcpy'ing it
:into a frame buffer at 60fps, to get an idea of the kind of data I'm going
:through)
:
:I hope this sort of explained my application, although I'm sure there are
:arguments either way if this is really going to help me or not.
:
:Thanks again,
:
:Kevin

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message