From: Christian Zander
To: freebsd-hackers@freebsd.org
Date: Thu, 29 Jun 2006 13:12:31 +0200
Subject: NVIDIA FreeBSD kernel feature requests
Message-ID: <20060629111231.GA692@wolf.nvidia.com>

Hi all,

NVIDIA has been looking at ways to improve its graphics driver for the FreeBSD i386 platform, as well as investigating the possibility of adding support for the FreeBSD amd64 platform, and identified a number of obstacles. Some progress has been made to resolve them, and NVIDIA would like to summarize the current status. We would also like to thank John Baldwin and Doug Rabson for their valuable help.

This summary attempts to describe the kernel interfaces needed by the NVIDIA FreeBSD i386 graphics driver to achieve feature parity with the Linux/Solaris graphics drivers, and/or required to make support for the FreeBSD amd64 platform feasible. It also describes some of the technical difficulties encountered by NVIDIA during the FreeBSD i386 graphics driver's development, how these problems have been worked around and what could be done to solve them better.

While the following is focused on the NVIDIA FreeBSD graphics drivers, we believe the interfaces discussed below are generally applicable to any modern high performance graphics driver.

The interfaces in question can be loosely categorized into three classes: reliability, compatibility and performance.

Reliability:

The NVIDIA graphics driver needs to be able to create uncached kernel and user mappings of I/O memory, such as NVIDIA GPU registers. The FreeBSD kernel does not currently provide the interfaces necessary to specify the memory type when creating such mappings, which makes it difficult for the NVIDIA graphics driver to guarantee that the correct memory type is selected.

Kernel mappings of I/O memory can be created with the pmap_mapdev() interface; user mappings are created with mmap(2). On FreeBSD i386 and on FreeBSD amd64, the effective memory type of mappings created with either interface is determined by a given system's MTRR configuration by default, which will specify the correct UC memory type in most, but not in all cases. MTRR configurations with non-UC memory ranges overlapping I/O memory mapped via pmap_mapdev() or mmap(2) can result in the incorrect memory type being selected, which can impair reliability.

To reduce the likelihood of problems, the FreeBSD i386 driver updates the mappings returned by pmap_mapdev() with the PCD/PWT flags to force use of the UC memory type. On FreeBSD amd64, the presence of a large static mapping using 2MB pages makes this approach unfeasible.
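
The PCD/PWT update mentioned above amounts to setting two bits in each page-table entry that covers the mapping. The following is only a minimal sketch, assuming 32-bit non-PAE entries and the default PAT configuration; the helper names are hypothetical and are not taken from the NVIDIA driver or from the FreeBSD kernel, and the TLB invalidation and cache flush a real kernel would need are omitted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * x86 page-table entry bits (architectural positions):
 * bit 0 = present, bit 3 = PWT (write-through), bit 4 = PCD (cache disable).
 */
#define PTE_PRESENT 0x001u
#define PTE_PWT     0x008u
#define PTE_PCD     0x010u

/*
 * Hypothetical helper: return a copy of a present page-table entry with
 * both PCD and PWT set, which selects the UC memory type under the
 * default PAT configuration. A real kernel would also have to invalidate
 * the TLB entry and flush CPU caches; this sketch does neither.
 */
static uint32_t
pte_force_uc(uint32_t pte)
{
	assert(pte & PTE_PRESENT);
	return (pte | PTE_PCD | PTE_PWT);
}

/* Hypothetical helper: report whether both PCD and PWT are set. */
static int
pte_is_uc(uint32_t pte)
{
	return ((pte & (PTE_PCD | PTE_PWT)) == (PTE_PCD | PTE_PWT));
}
```

The sketch also hints at why the approach does not carry over to a 2MB direct mapping: the attribute bits of one large-page entry apply to the whole 2MB range, so a single 4KB page cannot be changed without first breaking the large page down.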

In the case of user mappings, limited control over the memory type can be exerted with the help of MTRRs, but their lack of flexibility greatly reduces the feasibility of this approach.

1) The NVIDIA FreeBSD graphics driver is in need of a new interface that supports the creation of UC kernel mappings on FreeBSD i386 and on FreeBSD amd64.

   John Baldwin is working on a new interface, pmap_mapdev_attr(), which will allow the NVIDIA graphics driver to create UC kernel mappings on FreeBSD i386 and on FreeBSD amd64; the implementation on the latter platform will handle the direct mapping transparently.

2) As described above, user mappings of I/O memory are created via the mmap(2) interface and the FreeBSD device pager; unfortunately, drivers do not currently have control over the memory type used.

   The NVIDIA FreeBSD graphics driver needs to be able to specify the memory type used for user mappings created via mmap(2). This interface is also important for high performance graphics (see 'Performance' below).

Compatibility:

1) The NVIDIA graphics driver needs to be able to set the memory type of the kernel mapping of memory allocated with malloc()/contigmalloc() to UC, which presents essentially the same problems as those outlined above for I/O memory mappings.

   The ability to change the memory type is necessary to avoid aliasing problems when the memory is mapped into the AGP aperture, which is accessed via WC user mappings. If the creation of UC/WC user mappings becomes possible for system memory in the future (see below), the ability to change the memory type of the associated kernel mappings to UC will be important for the same reason.

   Newer NVIDIA FreeBSD i386 graphics drivers manually update the memory type of the kernel mappings of malloc() allocated memory using the approach described for kernel mappings above. This is not feasible on FreeBSD amd64 due to the static direct mapping (see above).

   The NVIDIA FreeBSD graphics driver needs an interface that allows it to change the memory type of the kernel mapping(s) of system memory allocated with malloc()/contigmalloc(). The interface should flush CPU and TLB caches, when necessary.

   John Baldwin is working on pmap_change_attr() for FreeBSD i386 and for FreeBSD amd64, which will allow specifying the desired memory types for kernel mappings created with e.g. malloc()/contigmalloc().

2) The NVIDIA graphics driver needs to map different types of memory into the address spaces of user clients, most commonly:

   a) NVIDIA graphics device registers
   b) NVIDIA graphics device frame buffer memory
   c) AGP memory allocations (mapped via the AGP aperture)
   d) DMA system memory allocations

   This is currently done via mmap(2) and the device pager, i.e. the user client performs a private ioctl(2) to allocate memory (this step is specific to the b) - d) memory types), then calls mmap(2) to obtain a user mapping of the memory. The NVIDIA graphics driver's d_mmap() callback is invoked first to check the logical mmap(2) offset(s), then again to return the associated page frame number(s) when the mapping is accessed for the first time.

   The device pager mechanism works well for a) - c), but not for d). The system memory allocations are frequently very large (several MB) and need to be allocated physically non-contiguous. This leads to problems with the d_mmap() interface:

   - d_mmap() is called per page with logical offsets computed based on the mmap(2) base offset provided by the client and the current page's position within the allocation, but no context information is provided to d_mmap().

     The NVIDIA FreeBSD graphics driver can look up the associated system memory allocation and determine the page frame number(s) for a given logical offset only if a linear address range is associated with each system memory allocation, in which case the start address can serve as the mmap(2) offset used by the client and the logical offsets can be compared with each allocation's linear address range.

     Since the memory itself is not physically contiguous, the physical addresses of pages in the allocation can not be used as mmap(2) offsets; a different address range needs to be used. The FreeBSD i386 driver currently allocates its system memory with malloc() and derives the address range used with mmap(2) from the allocation's kernel virtual address range.

     This allocation of DMA system memory with malloc() is problematic on FreeBSD i386 PAE and FreeBSD amd64 systems with more than 4GB of RAM and older NVIDIA GPUs limited to 32-bit DMA, since malloc() doesn't currently allow drivers to specify allocation constraints, like contigmalloc() does, i.e. it may allocate physical memory that can not be addressed by such GPUs.

     Further, since the physical addresses of non-contiguous allocations can not be used as mmap(2) offsets for system memory, but need to be used for a) - c), the logical and physical addresses used as mmap(2) offsets can potentially be confused by d_mmap(). The NVIDIA graphics driver tries to minimize this risk, but can not avoid it completely without a significant performance penalty.

   - The device pager was designed for I/O memory regions and it assumes that d_mmap() will always return the same page frame number for a given logical offset. As a result, d_mmap() is invoked exactly once for any given logical offset by default. In case of system memory allocations, however, the physical page backing a given offset may change as the malloc()'d memory is freed/reallocated.

     The NVIDIA FreeBSD graphics driver needs to manually invalidate the translation cache to work around this problem. It does so with the msync() system call, which was extended for this purpose in FreeBSD 4.7 and again in FreeBSD 4.9 and 5.2.1. This leads to performance problems on some configurations.

   The NVIDIA FreeBSD graphics driver needs a different interface to make the mapping of system memory allocations via mmap(2) simpler. If the d_mmap() callback were extended to be called with the base offset in addition to the current offset, the first two of the problems detailed above would no longer be an issue; the NVIDIA graphics driver would then be able to use physical addresses as mmap(2) offsets for a) - d).

   The new interface must not require a FreeBSD specific ioctl(2), as this would break compatibility with the NVIDIA Linux OpenGL library used in the FreeBSD Linux ABI compatibility environment.

3) To be able to support FreeBSD i386 PAE and FreeBSD amd64 systems with more than 4GB of physical memory and NVIDIA GPUs that are limited to 32-bit DMA, the NVIDIA FreeBSD graphics driver will need to be updated to allocate memory from within the first 4GB of memory. Unfortunately, this is not feasible with the current interfaces.

   The malloc() interface does not allow the caller to specify allocation constraints and while contigmalloc() does, its usefulness is currently limited. This is because DMA memory can't realistically be allocated contiguously, except if the allocations are very small, and because a contiguous address range is needed for mmap(2), as described above, which would need to be maintained separately for contigmalloc() memory allocations.

   The introduction of a malloc() variant that allows the specification of allocation constraints would solve the addressing problem, but due to the problems caused by using logical and physical addresses for mmap(2), a different solution would be preferred.

   By making it possible to use physical addresses exclusively as mmap(2) offsets, as described above, the NVIDIA FreeBSD graphics driver could use the contigmalloc() interface to allocate the individual pages in the larger non-contiguous allocations.

   If contigmalloc() were used, the NVIDIA FreeBSD graphics driver would need to be able to create contiguous virtual mappings spanning more than one page within larger virtually non-contiguous allocations; this functionality would best be implemented in the FreeBSD kernel.

   The 'vmap()' kernel interface does this on Linux. It takes an array of pages and maps them into a single contiguous address range.

Performance:

1) For optimal PCI-E performance and improved compatibility with systems where MTRR memory ranges do not provide sufficient flexibility, the NVIDIA FreeBSD graphics driver needs to be able to specify the memory type used for user mappings created with mmap(2).

   John Baldwin is working on PAT support for FreeBSD, which will be used by the pmap_mapdev_attr() and pmap_change_attr() kernel interfaces referred to above. This support can provide the desired flexibility if the d_mmap() interface is extended or complemented with a new one, allowing drivers to take advantage of the PAT support.

   In order to provide optimal PCI-E performance, NVIDIA FreeBSD graphics drivers need to be able to create WC system memory mappings.

2) The device pager mechanism is page fault based, which incurs noticeable overhead due to the large number of user/kernel context switches. This can result in significant performance penalties with very large or numerous kernel mappings. It also currently requires the use of the msync() workaround (see above), which incurs additional overhead.

   Performance with the NVIDIA FreeBSD graphics driver would benefit from an mmap(2) interface that is independent of the device pager and allows the mappings' page tables to be prebuilt. The Linux and Solaris operating systems support such interfaces.
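
   The difference between fault-driven and prebuilt mappings can be illustrated with a toy model. This is only a user-space sketch under stated assumptions: the structure, the function names and the page frame numbers are all hypothetical, and nothing here corresponds to an actual FreeBSD, Linux or Solaris interface. Each "fault" in the model stands in for one user/kernel transition plus one driver callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NPAGES   1024u        /* pages in the toy mapping (arbitrary) */
#define TOY_UNMAPPED UINT64_MAX   /* marker for "no translation yet" */

/* Toy stand-in for the page-table slots of one user mapping. */
struct toy_mapping {
	uint64_t      pfn[TOY_NPAGES];
	unsigned long faults;     /* simulated page faults taken so far */
};

/* Hypothetical driver lookup: page frame number backing a page index. */
static uint64_t
toy_backing_pfn(size_t index)
{
	return (0x100000u + index);
}

/* Start with an empty mapping, as a fault-driven pager would. */
static void
toy_init(struct toy_mapping *m)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		m->pfn[i] = TOY_UNMAPPED;
	m->faults = 0;
}

/* Prebuilt variant: fill in every translation at mapping time. */
static void
toy_prebuild(struct toy_mapping *m)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		m->pfn[i] = toy_backing_pfn(i);
}

/* Access one page; a missing translation costs one simulated fault. */
static uint64_t
toy_touch(struct toy_mapping *m, size_t index)
{
	assert(index < TOY_NPAGES);
	if (m->pfn[index] == TOY_UNMAPPED) {
		m->faults++;
		m->pfn[index] = toy_backing_pfn(index);
	}
	return (m->pfn[index]);
}

/* Touch every page once and report the total number of faults taken. */
static unsigned long
toy_touch_all(struct toy_mapping *m)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		(void)toy_touch(m, i);
	return (m->faults);
}
```

   In this model, touching every page of a fault-driven mapping costs one fault per page, while the prebuilt mapping takes none; the real cost per fault depends on the platform and is not modeled here.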

3) On Linux and Solaris, the NVIDIA graphics driver can maintain per open instance data, i.e. data that is specific to the processes' file descriptors associated with NVIDIA character special files. This is useful primarily to achieve optimal results with the driver's internal notification mechanism, which is used to implement Sync-to-VBlank functionality, among other things.

   On these two operating systems, the NVIDIA graphics driver can selectively wake threads select(2)'ing the device files (/dev/nvidia0..N).

   The NVIDIA FreeBSD graphics driver can only maintain per device state at the moment. It wakes all processes waiting on /dev/nvidiaX, and needs to traverse a per device event list for each of these processes to check whether an event was delivered for each one of them, which incurs some overhead. The logic also can't currently guarantee correct delivery of events to different threads in the same process.

   Future versions of the NVIDIA FreeBSD graphics driver are likely to employ the notification mechanism more aggressively, to better support composited X desktop functionality.

Summary of Tasks:

# Task: implement pmap_mapdev_attr() on FreeBSD i386 and on FreeBSD amd64.
  Motivation: allows reliable creation of kernel mappings of I/O memory with specific cache attributes (with per-page granularity).
  Priority: gates FreeBSD amd64 support.
  Status: is being implemented for i386 and amd64 (work is being done to allow easily breaking down 2MB pages).

# Task: design/implement better mmap(2) mechanism for mapping memory to user space (context information, cache attributes).
  Motivation: allows reliable creation of user mappings of DMA and I/O memory and support for systems with more than 4GB of RAM.
  Priority: gates improved FreeBSD i386 support (PCI-E performance, SLI support, improved reliability); gates FreeBSD amd64 support.
  Status: has not been started, pending.

# Task: implement pmap_change_attr() on FreeBSD i386 and on FreeBSD amd64.
  Motivation: allows prevention of cache coherency problems.
  Priority: gates FreeBSD amd64 support.
  Status: is being implemented for i386 and amd64.

# Task: implement vmap()-like kernel interface.
  Motivation: allows creation of contiguous kernel mappings of parts of or complete non-contiguous DMA/system memory allocations.
  Priority: gates support for systems with more than 4GB of RAM.
  Status: has not been started.

# Task: implement mechanism to allow character drivers to maintain per-open instance data (e.g. like the Linux kernel's 'struct file *').
  Motivation: allows per thread NVIDIA notification delivery; also reduces CPU overhead for notification delivery from the NVIDIA kernel module to the X driver and to OpenGL.
  Priority: should translate to improved X/OpenGL performance.
  Status: has not been started.

Thanks,

--
christian zander
ch?zander@nvidia.com