Date: Tue, 28 Apr 2009 16:58:57 -0700
From: Julian Elischer <julian@elischer.org>
To: Robert Noland <rnoland@FreeBSD.org>
Cc: freebsd-hackers@freebsd.org, Julian Bangert <julidaoc@online.de>, Kevin Day <toasty@dragondata.com>
Subject: Re: Question about adding flags to mmap system call / NVIDIA amd64 driver implementation
Message-ID: <49F79841.9030702@elischer.org>
In-Reply-To: <1240962328.2021.10.camel@wombat.2hip.net>
References: <op.us35euemeer2kn@server2go> <EC226ED2-3EF6-4102-8186-6F7B68AFC809@dragondata.com> <1240962328.2021.10.camel@wombat.2hip.net>
Robert Noland wrote:
> On Tue, 2009-04-28 at 16:48 -0500, Kevin Day wrote:
>> On Apr 28, 2009, at 3:19 PM, Julian Bangert wrote:
>>
>>> Hello,
>>>
>>> I am currently trying to work a bit on the remaining "missing
>>> feature" that NVIDIA requires ( http://wiki.freebsd.org/NvidiaFeatureRequests
>>> or a back post in this ML) - the improved mmap system call.

you might check with jhb (John Baldwin) as I think (from his p4 work)
that he may be doing something in this area in p4.

>>> For now, I am trying to extend the current system call and
>>> implementation to add cache control (the type of memory caching
>>> used). This feature inherently is very architecture specific, but
>>> it can lead to enormous performance improvements for memmapped
>>> devices (useful for drivers, etc). I would do this at the user site
>>> by adding 3 flags to the mmap system call (MEM_CACHE__ATTR1 to
>>> MEM_CACHE__ATTR3) which are a single octal digit corresponding to
>>> the various caching options (like Uncacheable, Write Combining,
>>> etc...) with the same numbers as the PAT_* macros from i386/include/
>>> specialreg.h except that the value 0 (PAT_UNCACHEABLE) is replaced
>>> with value 2 (undefined), whereas value 0 (all 3 flags cleared) is
>>> assigned the meaning "feature not used, use default cache control".
>>> For each cache behaviour there would of course also be a macro
>>> expanding to the right combination of these flags for enhanced
>>> usability.
>>>
>>> The mmap system call would, if any of these flags are set, decode
>>> them and get a corresponding PAT_* value, perform the mapping and
>>> then call into the pmap module to modify the cache attributes for
>>> every page.
>>
>> Have you looked at mem(4) yet?
>>
>>      Several architectures allow attributes to be associated with ranges of
>>      physical memory.  These attributes can be manipulated via ioctl() calls
>>      performed on /dev/mem.  Declarations and data types are to be found in
>>      <sys/memrange.h>.
>>
>>      The specific attributes, and number of programmable ranges may vary
>>      between architectures.  The full set of supported attributes is:
>>
>>      MDF_UNCACHEABLE
>>              The region is not cached.
>>
>>      MDF_WRITECOMBINE
>>              Writes to the region may be combined or performed out of
>>              order.
>>
>>      MDF_WRITETHROUGH
>>              Writes to the region are committed synchronously.
>>
>>      MDF_WRITEBACK
>>              Writes to the region are committed asynchronously.
>>
>>      MDF_WRITEPROTECT
>>              The region cannot be written to.
>>
>> This requires knowledge of the physical addresses, but I believe
>> that's probably already necessary for what it sounds like you're
>> trying to accomplish.
>>
>> Back in the FreeBSD-3.0 days, I was writing a custom driver for an AGP
>> graphics controller, and setting the MTRR flags for the exposed buffer
>> was a definite improvement (200-1200% faster in most cases).
>
> This is MTRR, which is what we currently do, when we can.  The issue is
> that often times the BIOS maps ranges in a way that prevents us from
> using MTRR.  This is generally ideal for things like agp and
> framebuffers when it works, since they have a specific physical range
> that you want to work with.
>
> With PCI(E) cards it isn't as cut and dry...  In the ATI and Nouveau
> cases, we map scatter gather pages into the GART, which generally are
> allocated using contigmalloc behind the scenes, so it is also possible
> for it to work in that case.  Moving forward, we may actually be mapping
> random pages into and out of the GART (GEM / TTM).  In those cases we
> really don't have a large contiguous range that we could set MTRR on.
> Intel CPUs are limited to 8 MTRR registers for the entire system also,
> so that can become an issue quickly if you are trying to manipulate
> several areas of memory.  With PAT we can manipulate the caching
> properties on a page level.  PAT also allows for some overlap conditions
> that MTRR won't, such as mapping a page write-combining on top of an
> UNCACHEABLE MTRR.
>
> jhb@ has started some work on this, since I've been badgering him about
> this recently as well.
>
> robert.
>
>> -- Kevin
>>
>> _______________________________________________
>> freebsd-hackers@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
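
For reference, the three-bit flag encoding that Julian Bangert proposes in the quoted text could look roughly like the sketch below. The PAT_* values are the ones from i386/include/specialreg.h. Everything else is an assumption made for illustration: the bit position MEM_CACHE_SHIFT, the MEM_CACHE_MASK, MEM_CACHE_DEFAULT and MEM_CACHE_INVALID names, and the two helper functions are hypothetical and come from neither the proposal nor any FreeBSD source. The sketch covers only the encode/decode step, not the mapping itself or the call into pmap.

```c
#include <assert.h>

/* PAT_* values as defined in FreeBSD's i386/include/specialreg.h. */
#define PAT_UNCACHEABLE     0x00
#define PAT_WRITE_COMBINING 0x01
#define PAT_WRITE_THROUGH   0x04
#define PAT_WRITE_PROTECTED 0x05
#define PAT_WRITE_BACK      0x06
#define PAT_UNCACHED        0x07

/*
 * HYPOTHETICAL: the proposal does not say which mmap flag bits would be
 * used.  Bit 24 is picked arbitrarily and has not been checked against
 * the real MAP_* flags.
 */
#define MEM_CACHE_SHIFT  24
#define MEM_CACHE__ATTR1 (1 << MEM_CACHE_SHIFT)
#define MEM_CACHE__ATTR2 (2 << MEM_CACHE_SHIFT)
#define MEM_CACHE__ATTR3 (4 << MEM_CACHE_SHIFT)
#define MEM_CACHE_MASK   (7 << MEM_CACHE_SHIFT)

/* HYPOTHETICAL return codes for the decoder. */
#define MEM_CACHE_DEFAULT (-1)  /* no flags set: use default caching */
#define MEM_CACHE_INVALID (-2)  /* octal digit 3 has no PAT meaning */

/* Return nonzero if pat is one of the PAT_* values listed above. */
static int
pat_is_valid(int pat)
{
	return (pat == PAT_UNCACHEABLE || pat == PAT_WRITE_COMBINING ||
	    (pat >= PAT_WRITE_THROUGH && pat <= PAT_UNCACHED));
}

/*
 * Turn a PAT_* value into mmap flag bits.  PAT_UNCACHEABLE (0) is stored
 * as the otherwise unused digit 2, so that "all bits clear" can mean
 * "feature not used".  Returns -1 for a value that is not a PAT_* type.
 */
int
mem_cache_encode(int pat)
{
	if (!pat_is_valid(pat))
		return (-1);
	return ((pat == PAT_UNCACHEABLE ? 2 : pat) << MEM_CACHE_SHIFT);
}

/* Recover the PAT_* value (or DEFAULT / INVALID) from mmap flags. */
int
mem_cache_decode(int flags)
{
	int digit = (flags & MEM_CACHE_MASK) >> MEM_CACHE_SHIFT;

	if (digit == 0)
		return (MEM_CACHE_DEFAULT);
	if (digit == 2)
		return (PAT_UNCACHEABLE);
	if (digit == 3)
		return (MEM_CACHE_INVALID);
	return (digit);		/* 1, 4, 5, 6, 7 map straight to PAT_* */
}

/* Small self-check of the round trip described in the proposal. */
void
mem_cache_selftest(void)
{
	assert(mem_cache_decode(0) == MEM_CACHE_DEFAULT);
	assert(mem_cache_decode(mem_cache_encode(PAT_WRITE_COMBINING)) ==
	    PAT_WRITE_COMBINING);
	assert(mem_cache_decode(mem_cache_encode(PAT_UNCACHEABLE)) ==
	    PAT_UNCACHEABLE);
}
```

With this layout a caller would OR mem_cache_encode(PAT_WRITE_COMBINING) into the mmap flags, and the kernel side would run mem_cache_decode() on them before deciding whether to touch the page attributes at all. Because digit 0 is reserved for "default", existing callers that pass none of the new flags would see no change in behaviour.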