From owner-freebsd-arch@FreeBSD.ORG Mon May 18 11:06:48 2009 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7200A1065670 for ; Mon, 18 May 2009 11:06:48 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 43DB58FC0A for ; Mon, 18 May 2009 11:06:48 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n4IB6mN3075575 for ; Mon, 18 May 2009 11:06:48 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n4IB6lGg075571 for freebsd-arch@FreeBSD.org; Mon, 18 May 2009 11:06:47 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 18 May 2009 11:06:47 GMT Message-Id: <200905181106.n4IB6lGg075571@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-arch@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list
List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 May 2009 11:06:48 -0000

Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/120749  arch       [request] Suggest upping the default kern.ps_arg_cache

1 problem total.

From owner-freebsd-arch@FreeBSD.ORG Tue May 19 15:29:47 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E10AE106566B for ; Tue, 19 May 2009 15:29:47 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 942808FC17 for ; Tue, 19 May 2009 15:29:47 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av0FAItmEkqDaFvI/2dsb2JhbACNegHCDYQCBQ X-IronPort-AV: E=Sophos;i="4.41,215,1241409600"; d="scan'208";a="35937801" Received: from darling.cs.uoguelph.ca ([131.104.91.200]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 19 May 2009 11:00:34 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by darling.cs.uoguelph.ca (Postfix) with ESMTP id 1D742940074 for ; Tue, 19 May 2009 11:00:34 -0400 (EDT) X-Virus-Scanned: amavisd-new at darling.cs.uoguelph.ca Received: from darling.cs.uoguelph.ca ([127.0.0.1]) by localhost (darling.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id x5m95YCnrhtM for ; Tue, 19 May 2009 11:00:33 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca
[131.104.91.102]) by darling.cs.uoguelph.ca (Postfix) with ESMTP id 0F74E940025 for ; Tue, 19 May 2009 11:00:33 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n4JF1B214154 for ; Tue, 19 May 2009 11:01:11 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Tue, 19 May 2009 11:01:11 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: freebsd-arch@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Subject: nfs server resource exhaustion (before it's too late) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 May 2009 15:29:48 -0000

In the experimental nfs server (sys/fs/nfsserver), there is a function that, when it returns non-zero, causes the server to reply NFSERR_DELAY to the client so that it will try the RPC again a little later. (Or, for NFSv2 over UDP, which doesn't have NFSERR_DELAY, it simply drops the request and assumes the client will time out and try it again.)

This is intended to avoid the situation where the server cannot m_get/m_getcl/malloc part way through processing a request, due to resource exhaustion. (The malloc case isn't as critical, since I have high water marks set to limit the # of allocations for the various NFSv4 state related structures that are malloc'd.)

At this point the function is just a stub:

    int
    nfsrv_mallocmget_limit(void)
    {
        return (0);
    }

I just took a quick look (I don't know anything about UMA, except that it seems to be used by m_get and m_getcl) and this was what I could think of for doing the above on FreeBSD8. (It wasn't obvious to me if there was a limit set for the various zones used by malloc(), so I didn't include them.)

    int
    nfsrv_mallocmget_limit(void)
    {
        u_int32_t pages, maxpages;

        uma_zone_get_pagecnts(zone_clust, &pages, &maxpages);
        if (maxpages != 0 && (pages * 12 / 10) > maxpages)
            return (1);
        return (0);
    }

At this point, the only function I could see that would return the above information is sysctl_vm_zone_stats() and it looks like overkill. Also, the function needs to be relatively low overhead, since it is called for every nfs rpc the server gets, so I thought this might be ok?

    /* added to sys/vm/uma_core.c */
    void
    uma_zone_get_pagecnts(uma_zone_t zone, u_int32_t *pages, u_int32_t *maxpages)
    {
        uma_keg_t keg;

        ZONE_LOCK(zone);
        keg = zone_first_keg(zone);
        *pages = keg->uk_pages;
        *maxpages = keg->uk_maxpages;
        ZONE_UNLOCK(zone);
    }

Does this look reasonable or can anyone suggest a better alternative?

Thanks in advance for any suggestions, rick

From owner-freebsd-arch@FreeBSD.ORG Tue May 19 18:59:01 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 29793106564A for ; Tue, 19 May 2009 18:59:01 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id F3D828FC08 for ; Tue, 19 May 2009 18:59:00 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id A958D46B8F for ; Tue, 19 May 2009 14:59:00 -0400 (EDT) Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8]) by bigwig.baldwin.cx (Postfix) with ESMTPA id 73F018A028 for ; Tue, 19 May 2009 14:58:59 -0400 (EDT) From: John Baldwin To: arch@FreeBSD.org Date: Tue, 19 May 2009 14:58:50 -0400 User-Agent: KMail/1.9.7 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200905191458.50764.jhb@freebsd.org> X-Greylist: Sender
succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Tue, 19 May 2009 14:58:59 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Subject: sglist(9) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 May 2009 18:59:01 -0000

So one of the things I worked on while hacking away at unmapped disk I/O requests was a little API to manage scatter/gather lists of physical addresses. The basic premise is that a sglist describes a logical object that is backed by one or more physical address ranges. To minimize locking, the sglist objects themselves are immutable once they are shared. The unmapped disk I/O project is still very much a WIP (and I'm not even working on any of the really hard bits myself). However, I actually found this object to be useful for something else I have been working on: the mmap() extensions for the Nvidia amd64 driver. For the Nvidia patches I have created a new type of VM object that is very similar to OBJT_DEVICE objects except that it uses a sglist to determine the physical pages backing the object instead of calling the d_mmap() method for each page. Anyway, adding this little API is just the first in a series of patches needed for the Nvidia driver work. I plan to MFC them to 7.x relatively soon in the hopes that we can soon have a supported Nvidia driver on amd64 on 7.x.

The current patches for all the Nvidia stuff are at http://www.FreeBSD.org/~jhb/pat/

This particular patch to just add the sglist(9) API is at http://www.FreeBSD.org/~jhb/patches/sglist.patch and is slightly more polished in that it includes a manpage.
:) -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue May 19 19:11:05 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7E4FF1065676; Tue, 19 May 2009 19:11:05 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 5448C8FC13; Tue, 19 May 2009 19:11:05 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 0723046B7F; Tue, 19 May 2009 15:11:05 -0400 (EDT) Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8]) by bigwig.baldwin.cx (Postfix) with ESMTPA id D32C58A025; Tue, 19 May 2009 15:11:03 -0400 (EDT) From: John Baldwin To: arch@FreeBSD.org Date: Tue, 19 May 2009 15:10:22 -0400 User-Agent: KMail/1.9.7 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200905191510.23039.jhb@FreeBSD.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Tue, 19 May 2009 15:11:03 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: new-bus@FreeBSD.org Subject: [PATCH] Adding support for multiple boot-time passes of the device tree X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 May 2009 19:11:05 -0000 If you were at BSDCan a few weeks ago you may have seen my proposal for extending new-bus to support multiple 
scans of the device tree during boot-time probing. This patch is the infrastructure work to allow multiple passes. It does not move any drivers (except root0 which is already special) into an early pass, so all devices will still probe as a single pass for now. However, getting this in now before 8.0 will enable folks to start working on other problems such as resource discovery and management and will get the ABI set before the 8.0 feature freeze. The paper where I go into greater detail about the rationale and implementation is available at http://www.FreeBSD.org/~jhb/papers/bsdcan/2009/. The actual patch is available for review at http://www.FreeBSD.org/~jhb/patches/multipass.patch -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue May 19 20:45:57 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2CC26106566B for ; Tue, 19 May 2009 20:45:57 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outY.internet-mail-service.net (outy.internet-mail-service.net [216.240.47.248]) by mx1.freebsd.org (Postfix) with ESMTP id 157D58FC1B for ; Tue, 19 May 2009 20:45:57 +0000 (UTC) (envelope-from julian@elischer.org) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id 85EAB14DD54; Tue, 19 May 2009 13:34:17 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 863852D600F; Tue, 19 May 2009 13:34:15 -0700 (PDT) Message-ID: <4A1317C7.4000509@elischer.org> Date: Tue, 19 May 2009 13:34:15 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: John Baldwin References: <200905191458.50764.jhb@freebsd.org> In-Reply-To: <200905191458.50764.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; 
format=flowed Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org Subject: Re: sglist(9) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 May 2009 20:45:57 -0000

John Baldwin wrote:
> So one of the things I worked on while hacking away at unmapped disk I/O
> requests was a little API to manage scatter/gather lists of physical
> addresses. The basic premise is that a sglist describes a logical object

I was JUST looking at this because of some Linux code I was looking at, that uses a predefined sg list that I think it is getting from Linux. (you may look to see what the Linux sg list code does/has).

> that is backed by one or more physical address ranges. To minimize locking,
> the sglist objects themselves are immutable once they are shared. The
> unmapped disk I/O project is still very much a WIP (and I'm not even working
> on any of the really hard bits myself). However, I actually found this
> object to be useful for something else I have been working on: the mmap()
> extensions for the Nvidia amd64 driver. For the Nvidia patches I have
> created a new type of VM object that is very similar to OBJT_DEVICE objects
> except that it uses a sglist to determine the physical pages backing the
> object instead of calling the d_mmap() method for each page. Anyway, adding
> this little API is just the first in a series of patches needed for the
> Nvidia driver work. I plan to MFC them to 7.x relatively soon in the hopes
> that we can soon have a supported Nvidia driver on amd64 on 7.x.
>
> The current patches for all the Nvidia stuff are at
> http://www.FreeBSD.org/~jhb/pat/
>
> This particular patch to just add the sglist(9) API is at
> http://www.FreeBSD.org/~jhb/patches/sglist.patch and is slightly more
> polished in that it includes a manpage.
:) > From owner-freebsd-arch@FreeBSD.ORG Wed May 20 14:02:26 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EF1C71065771 for ; Wed, 20 May 2009 14:02:26 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id C00BC8FC20 for ; Wed, 20 May 2009 14:02:26 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 74AE446B52; Wed, 20 May 2009 10:02:26 -0400 (EDT) Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8]) by bigwig.baldwin.cx (Postfix) with ESMTPA id 496E38A026; Wed, 20 May 2009 10:02:25 -0400 (EDT) From: John Baldwin To: Julian Elischer Date: Wed, 20 May 2009 10:02:00 -0400 User-Agent: KMail/1.9.7 References: <200905191458.50764.jhb@freebsd.org> <4A1317C7.4000509@elischer.org> In-Reply-To: <4A1317C7.4000509@elischer.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200905201002.00533.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 20 May 2009 10:02:25 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: arch@freebsd.org Subject: Re: sglist(9) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2009 14:02:27 -0000 On Tuesday 19 May 2009 4:34:15 pm Julian Elischer 
wrote:
> John Baldwin wrote:
> > So one of the things I worked on while hacking away at unmapped disk I/O
> > requests was a little API to manage scatter/gather lists of physical
> > addresses. The basic premise is that a sglist describes a logical object
>
> I was JUST looking at this because of some Linux code I was looking
> at, that uses a predefined sg list that I think it is getting from
> Linux. (you may look to see what the Linux sg list code does/has).

I looked at scatterlist yesterday and it appears to be a bit more DMA-centric whereas sglist is more intended to describe a range of memory pages. However, the APIs are somewhat similar (sg_chain() is a lot like sglist_join() for example). They have a header structure and a list of scatter/gather elements which is also very similar. The one thing they do differently is that whereas sglist(9) always uses a single array of scatter/gather list elements of variable length, they allocate "blocks" of scatter/gather list elements and then chain multiple blocks together if needed.

--
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Wed May 20 15:01:24 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 86325106566C; Wed, 20 May 2009 15:01:24 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 412418FC18; Wed, 20 May 2009 15:01:24 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.3/8.14.1) with ESMTP id n4KExFk1056013; Wed, 20 May 2009 08:59:15 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Wed, 20 May 2009 08:59:24 -0600 (MDT) Message-Id: <20090520.085924.-1935226744.imp@bsdimp.com> To: jhb@freebsd.org From: "M.
Warner Losh" In-Reply-To: <200905121020.18497.jhb@freebsd.org> References: <200905121020.18497.jhb@freebsd.org> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org Subject: Re: Remove d_thread_t for 8.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2009 15:01:24 -0000

In message: <200905121020.18497.jhb@freebsd.org>
John Baldwin writes:
: In the same vein as purging BURN_BRIDGES stuff, is there any objection to
: removing d_thread_t from 8.0? It is intended as a compat shim to reduce
: diffs with 4.x. However, at this point drivers are not actively being merged
: back to 4.x, so I think it is no longer necessary.

It was also intended to allow easier sharing for folks that were using FreeBSD 4.x, 5.x, etc. I know that at least one user still has some 4.x deployments, but I suspect that they are otherwise off 4.x so it might not be a problem for them. It would be yet another thing to change when going from 7.x to 8.x for them...

We certainly should remove it from the drivers in the tree for 8.0. Right now it is used in about two dozen places.
Warner From owner-freebsd-arch@FreeBSD.ORG Wed May 20 15:24:37 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 298891065670 for ; Wed, 20 May 2009 15:24:37 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id F046A8FC1A for ; Wed, 20 May 2009 15:24:36 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id A6BC646B23; Wed, 20 May 2009 11:24:36 -0400 (EDT) Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8]) by bigwig.baldwin.cx (Postfix) with ESMTPA id 8B1B78A028; Wed, 20 May 2009 11:24:35 -0400 (EDT) From: John Baldwin To: "M. Warner Losh" Date: Wed, 20 May 2009 11:24:24 -0400 User-Agent: KMail/1.9.7 References: <200905121020.18497.jhb@freebsd.org> <20090520.085924.-1935226744.imp@bsdimp.com> In-Reply-To: <20090520.085924.-1935226744.imp@bsdimp.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200905201124.24747.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 20 May 2009 11:24:35 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: arch@freebsd.org Subject: Re: Remove d_thread_t for 8.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2009 15:24:37 -0000 On 
Wednesday 20 May 2009 10:59:24 am M. Warner Losh wrote:
> In message: <200905121020.18497.jhb@freebsd.org>
> John Baldwin writes:
> : In the same vein as purging BURN_BRIDGES stuff, is there any objection to
> : removing d_thread_t from 8.0? It is intended as a compat shim to reduce
> : diffs with 4.x. However, at this point drivers are not actively being merged
> : back to 4.x, so I think it is no longer necessary.
>
> It was also intended to allow easier sharing for folks that were using
> FreeBSD 4.x, 5.x, etc. I know that at least one user still has some
> 4.x deployments, but I suspect that they are otherwise off 4.x so it
> might not be a problem for them. It would be yet another thing to
> change when going from 7.x to 8.x for them...
>
> We certainly should remove it from the drivers in the tree for 8.0.
> Right now it is used in about two dozen places.

Even in a shared driver I believe the function prototypes for devsw routines would already have to be #ifdef'd due to the 'dev_t' -> 'struct cdev *' change, which does not have a similar foo_t typedef to ease the transition. Given that, any code compiled for 7.0+ is already using a function prototype that is not compatible with 4.x and there isn't a need for it to use d_thread_t. They can just use 'struct thread *' always when using 'struct cdev *'.
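
To make the point about prototypes concrete, here is a rough sketch of what a shared driver's open routine might look like. It is only an illustration under stated assumptions: foo_open, old_dev_t and DRIVER_TARGETS_4X are made-up names, the struct declarations are stand-ins so the fragment compiles outside the kernel, and a real driver would take its types from the kernel headers and test __FreeBSD_version rather than a private macro.

```c
/*
 * Sketch only: stand-in declarations so this compiles outside the kernel.
 * A real driver gets these types from the kernel headers.
 */
struct cdev;                        /* opaque device handle, 5.x and later */
struct thread;
struct proc;
typedef struct old_dev *old_dev_t;  /* hypothetical stand-in for the 4.x dev_t */

/* Hypothetical version switch; a real driver would test __FreeBSD_version. */
#ifndef DRIVER_TARGETS_4X
#define DRIVER_TARGETS_4X 0
#endif

#if DRIVER_TARGETS_4X
/* 4.x-style entry point: dev_t and struct proc. */
static int
foo_open(old_dev_t dev, int oflags, int devtype, struct proc *p)
#else
/* Newer style: struct cdev * and struct thread *, with no d_thread_t. */
static int
foo_open(struct cdev *dev, int oflags, int devtype, struct thread *td)
#endif
{
    /* The body is shared; only the prototype differs between branches. */
    (void)dev;
    (void)oflags;
    (void)devtype;
#if DRIVER_TARGETS_4X
    (void)p;
#else
    (void)td;
#endif
    return (0);
}
```

Since the two prototypes already differ in their first argument, the branch for newer releases can spell out 'struct thread *' directly and nothing is lost by dropping the typedef.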
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed May 20 16:27:57 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB7791065706; Wed, 20 May 2009 16:27:57 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 998118FC19; Wed, 20 May 2009 16:27:57 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.3/8.14.1) with ESMTP id n4KGQ2fV057277; Wed, 20 May 2009 10:26:02 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Wed, 20 May 2009 10:26:12 -0600 (MDT) Message-Id: <20090520.102612.-1795528612.imp@bsdimp.com> To: jhb@freebsd.org From: "M. Warner Losh" In-Reply-To: <200905201124.24747.jhb@freebsd.org> References: <200905121020.18497.jhb@freebsd.org> <20090520.085924.-1935226744.imp@bsdimp.com> <200905201124.24747.jhb@freebsd.org> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org Subject: Re: Remove d_thread_t for 8.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2009 16:27:58 -0000 In message: <200905201124.24747.jhb@freebsd.org> John Baldwin writes: : On Wednesday 20 May 2009 10:59:24 am M. Warner Losh wrote: : > In message: <200905121020.18497.jhb@freebsd.org> : > John Baldwin writes: : > : In the same vein as purging BURN_BRIDGES stuff, is there any objection to : > : removing d_thread_t from 8.0? It is intended as a compat shim to reduce : > : diffs with 4.x. 
However, at this point drivers are not actively being : merged : > : back to 4.x, so I think it is no longer necessary. : > : > It was also intended to allow easier sharing for folks that were using : > FreeBSD 4.x, 5.x, etc. I know that at least one user still has some : > 4.x deployments, but I suspect that they are otherwise off 4.x so it : > might not be a problem for them. It would be yet another thing to : > change when going from 7.x to 8.x for them... : > : > We certainly should remove it from the drivers in the tree for 8.0. : > Right now it is used in about a two dozen places. : : Even in a shared driver I believe the function prototypes for devsw routines : would already have to be #ifdef'd due to the 'dev_t' -> 'struct cdev *' : change which does have a similar foo_t typedef to ease the transition. Given : that, any code compiled for 7.0+ is already using a function prototype that : is not compatible with 4.x and there isn't a need for it to use d_thread_t. : They can just use 'struct thread *' always when using 'struct cdev *'. Yes. Let's eliminate it from the tree, and then talk about removing it from conf.h :) There's other ways to paper over those issues, and I know that they are relatively small in header files. But those headers are likely beyond the scope of what the project has to support.. 
Warner From owner-freebsd-arch@FreeBSD.ORG Wed May 20 18:45:32 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 48F631065674; Wed, 20 May 2009 18:45:32 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.229]) by mx1.freebsd.org (Postfix) with ESMTP id 2013D8FC15; Wed, 20 May 2009 18:45:32 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by rv-out-0506.google.com with SMTP id k40so234663rvb.43 for ; Wed, 20 May 2009 11:45:30 -0700 (PDT) Received: by 10.141.13.13 with SMTP id q13mr766349rvi.163.1242845130668; Wed, 20 May 2009 11:45:30 -0700 (PDT) Received: from ?10.0.1.198? (udp016664uds.hawaiiantel.net [72.235.41.117]) by mx.google.com with ESMTPS id g22sm4381413rvb.6.2009.05.20.11.45.28 (version=SSLv3 cipher=RC4-MD5); Wed, 20 May 2009 11:45:29 -0700 (PDT) Date: Wed, 20 May 2009 08:49:30 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: John Baldwin In-Reply-To: <200905191458.50764.jhb@freebsd.org> Message-ID: References: <200905191458.50764.jhb@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org Subject: Re: sglist(9) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2009 18:45:32 -0000 On Tue, 19 May 2009, John Baldwin wrote: > So one of the things I worked on while hacking away at unmapped disk I/O > requests was a little API to manage scatter/gather lists of phyiscal > addresses. The basic premise is that a sglist describes a logical object > that is backed by one or more physical address ranges. 
To minimize locking, > the sglist objects themselves are immutable once they are shared. The > unmapped disk I/O project is still very much a WIP (and I'm not even working > on any of the really hard bits myself). However, I actually found this > object to be useful for something else I have been working on: the mmap() > extensions for the Nvidia amd64 driver. For the Nvidia patches I have > created a new type of VM object that is very similar to OBJT_DEVICE objects > except that it uses a sglist to determine the physical pages backing the > object instead of calling the d_mmap() method for each page. Anyway, adding > this little API is just the first in a series of patches needed for the > Nvidia driver work. I plan to MFC them to 7.x relatively soon in the hopes > that we can soon have a supported Nvidia driver on amd64 on 7.x. > > The current patches for all the Nvidia stuff is at > http://www.FreeBSD.org/~jhb/pat/ > > This particular patch to just add the sglist(9) API is at > http://www.FreeBSD.org/~jhb/patches/sglist.patch and is slightly more > polished in that it includes a manpage. :) I have a couple of minor comments: 1) SGLIST_APPEND() contains a return() within a macro. Shouldn't this be an inline that returns an error code that is always checked? These kinds of macros get people into trouble. It also could be written in such a way that you don't have to handle nseg == 0 at each callsite and then it's big enough that it probably shouldn't be a macro or an inline. 2) I worry that if all users do sglist_count() followed by a dynamic allocation and then an _append() they will be very expensive. pmap_kextract() is much more expensive than it may first seem to be. Do you have a user of count already? 3) Rather than having sg_segs be an actual pointer, did you consider making it an unsized array? This removes the overhead of one pointer from the structure while enforcing that it's always contiguously allocated. 
4) SGLIST_INIT might be better off as an inline, and may not even belong in the header file. In general I think this is a good idea. It'd be nice to work on replacing the buf layer's implementation with something like this that could be used directly by drivers. Have you considered a busdma operation to load from a sglist? Thanks, Jeff > > -- > John Baldwin > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Wed May 20 18:55:55 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 079CA106566C; Wed, 20 May 2009 18:55:55 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.228]) by mx1.freebsd.org (Postfix) with ESMTP id CF7038FC08; Wed, 20 May 2009 18:55:54 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by rv-out-0506.google.com with SMTP id k40so236381rvb.43 for ; Wed, 20 May 2009 11:55:54 -0700 (PDT) Received: by 10.140.201.6 with SMTP id y6mr749298rvf.62.1242845754388; Wed, 20 May 2009 11:55:54 -0700 (PDT) Received: from ?10.0.1.198? 
(udp016664uds.hawaiiantel.net [72.235.41.117]) by mx.google.com with ESMTPS id g22sm400803rvb.56.2009.05.20.11.55.50 (version=SSLv3 cipher=RC4-MD5); Wed, 20 May 2009 11:55:52 -0700 (PDT) Date: Wed, 20 May 2009 08:59:52 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Bruce Evans In-Reply-To: <20090514131613.T1224@besplex.bde.org> Message-ID: References: <86bppy60ti.fsf@ds4.des.no> <20090514131613.T1224@besplex.bde.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: =?ISO-8859-15?Q?Dag-Erling_Sm=F8rgrav?= , arch@freebsd.org Subject: Re: lockless file descriptor lookup X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2009 18:55:55 -0000 On Thu, 14 May 2009, Bruce Evans wrote: > On Tue, 12 May 2009, Jeff Roberson wrote: > >> On Tue, 12 May 2009, Dag-Erling Sm?rgrav wrote: >> >>> Jeff Roberson writes: >>>> I'd also appreciate it if someone could look at my volatile cast and >>>> make sure I'm actually forcing the compiler to refresh the fd_ofiles >>>> array here: >>>> >>>> + if (fp == ((struct file *volatile*)fdp->fd_ofiles)[fd]) > > This has 2 style bugs (missing space after first '*' and missing space > before second '*'. > > It isn't clear whether you want to refresh the fd_ofiles pointer to the > (first element of) the array, or the fd'th element. It is clear that > you don't want to refresh the whole array. The above refreshes the > fd'th element. Strangely, in my tests gcc refreshes the fd'th element > even without the cast. E.g., This is actually intended to catch cases where the descriptor array has expanded and the pointer to fd_ofiles has changed, or the file has been closed and the pointer at the fd'th element has changed. 
I'm attempting to force the compiler to reload the fd_ofiles array pointer from the fdp structure. If it has done that, it can not have the fd'th element cached and so that must be a fresh memory reference. > > test(fdp->fd_ofiles[fd], fdp->fd_ofiles[fd]); > > results in 1 memory access for each of the [fd]'s. > >>> The problem is that since it is not declared as volatile, some other >>> piece of code may have modified it but not yet flushed it to RAM. >> >> That is an acceptable race due to other guarantees. If it hasn't been >> committed to memory yet, the old table still contains valid data. I only >> need to be certain that the compiler doesn't cache the original ofiles >> value. It can't anyway because atomics use inline assembly on all platforms >> but I'd like it to be explicit anyway. > > It shouldn't matter that atomics use inline asm. Non-broken inline > asm declares all its inputs and outputs, so compilers can see what it > changes just as easily as for C code (and more easily than for non- > inline asm or C). This is a good point. It's all the more important that we get the volatile/memory barrier worked out correctly then. I don't believe there are bugs today but it may be due to side-effects we shouldn't count on. > > Anyway, you probably need atomics that have suitable memory barriers. > Memory barriers must affect the compiler and make it perform refreshes > for them to work, so you shouldn't need any volatile casts. E.g., all > atomic store operations (including cmpset) have release semantics even > if they aren't spelled with "_rel" or implemented using inline asm. > On amd64 and i386, they happen to be implemented using inline asm with > "memory" clobbers. The "memory" clobbers force refreshes of all > non-local variables. 
So I think I need an _acq memory barrier on the atomic cmpset of the refcount to prevent speculative loading of the fd_ofiles array pointer by the processor and the volatile in the second dereference as I have it now to prevent caching of the pointer by the compiler. What do you think? The references prior to the atomic increment have no real ordering requirements. Only the ones afterwards need to be strict so that we can verify the results. Thanks, Jeff > > Bruce > From owner-freebsd-arch@FreeBSD.ORG Wed May 20 19:36:49 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 803831065674 for ; Wed, 20 May 2009 19:36:49 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 45F308FC13 for ; Wed, 20 May 2009 19:36:49 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id E035046B7F; Wed, 20 May 2009 15:36:48 -0400 (EDT) Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8]) by bigwig.baldwin.cx (Postfix) with ESMTPA id DC7A68A026; Wed, 20 May 2009 15:36:47 -0400 (EDT) From: John Baldwin To: Jeff Roberson Date: Wed, 20 May 2009 15:22:58 -0400 User-Agent: KMail/1.9.7 References: <200905191458.50764.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200905201522.58501.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 20 May 2009 15:36:47 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE autolearn=no version=3.2.5 X-Spam-Checker-Version: 
SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: arch@freebsd.org Subject: Re: sglist(9) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2009 19:36:49 -0000 On Wednesday 20 May 2009 2:49:30 pm Jeff Roberson wrote: > On Tue, 19 May 2009, John Baldwin wrote: > > > So one of the things I worked on while hacking away at unmapped disk I/O > > requests was a little API to manage scatter/gather lists of phyiscal > > addresses. The basic premise is that a sglist describes a logical object > > that is backed by one or more physical address ranges. To minimize locking, > > the sglist objects themselves are immutable once they are shared. The > > unmapped disk I/O project is still very much a WIP (and I'm not even working > > on any of the really hard bits myself). However, I actually found this > > object to be useful for something else I have been working on: the mmap() > > extensions for the Nvidia amd64 driver. For the Nvidia patches I have > > created a new type of VM object that is very similar to OBJT_DEVICE objects > > except that it uses a sglist to determine the physical pages backing the > > object instead of calling the d_mmap() method for each page. Anyway, adding > > this little API is just the first in a series of patches needed for the > > Nvidia driver work. I plan to MFC them to 7.x relatively soon in the hopes > > that we can soon have a supported Nvidia driver on amd64 on 7.x. > > > > The current patches for all the Nvidia stuff is at > > http://www.FreeBSD.org/~jhb/pat/ > > > > This particular patch to just add the sglist(9) API is at > > http://www.FreeBSD.org/~jhb/patches/sglist.patch and is slightly more > > polished in that it includes a manpage. :) > > I have a couple of minor comments: > > 1) SGLIST_APPEND() contains a return() within a macro. 
Shouldn't this be > an inline that returns an error code that is always checked? These kinds > of macros get people into trouble. It also could be written in such a way > that you don't have to handle nseg == 0 at each callsite and then it's big > enough that it probably shouldn't be a macro or an inline. Mostly I was trying to avoid having to duplicate a lot of code. The reason I didn't handle nseg == 0 directly is a possibly dubious attempt to optimize the _sglist_append() inline so that it doesn't have to do the extra branch inside the main loop for virtual address regions that span multiple pages. > 2) I worry that if all users do sglist_count() followed by a dynamic > allocation and then an _append() they will be very expensive. > pmap_kextract() is much more expensive than it may first seem to be. Do > you have a user of count already? The only one that does now is sglist_build() and nothing currently uses that. VOP_GET/PUTPAGES would not need to do this since they could simply append the physical addresses extracted directly from vm_page_t's for example. I'm not sure this will be used very much now as originally I thought I would be changing all storage drivers to do all DMA operations using sglists and this sort of thing would have been used for non-bio requests like firmware commands; however, as expounded on below, it actually appears better to still treat bio's separate from non-bio requests for bus_dma so that the non-bio requests can continue to use bus_dmamap_load_buffer() as they do now. > 3) Rather than having sg_segs be an actual pointer, did you consider > making it an unsized array? This removes the overhead of one pointer from > the structure while enforcing that it's always contiguously allocated. It's actually a feature to be able to have the header in separate storage from segs array. 
I use this in the jhb_bio branch in the bus_dma implementations where a
pre-allocated segs array is stored in the bus dma tag and the header is
allocated on the stack.

> 4) SGLIST_INIT might be better off as an inline, and may not even belong
> in the header file.

That may be true. I currently only use it in the jhb_bio branch for the
bus_dma implementations.

> In general I think this is a good idea. It'd be nice to work on replacing
> the buf layer's implementation with something like this that could be used
> directly by drivers. Have you considered a busdma operation to load from
> a sglist?

So in regards to the bus_dma stuff, I did work on this a while ago in my
jhb_bio branch. I do have a bus_dmamap_load_sglist() and I had planned on
using that in storage drivers directly. However, I ended up circling back to
preferring a bus_dmamap_load_bio() and adding a new 'bio_start' field to
'struct bio' that is an offset into an attached sglist. This let me carve up
I/O requests in geom_dev to satisfy a disk device's max request size while
still sharing the same read-only sglist across the various BIO's (by simply
adjusting bio_length and bio_start to be a subrange of the sglist) as opposed
to doing memory allocations to allocate specific ranges of an sglist (using
something like sglist_slice()) for each I/O request. I then have
bus_dmamap_load_bio() use the subrange of the sglist internally or fall back
to using the KVA pointer if the sglist isn't present.

However, I'm not really trying to get the bio stuff into the tree, this is
mostly for the Nvidia case and for that use case the driver is simply
creating simple single-entry lists and using sglist_append_phys().

An example of doing something like this is from my sample patdev test module
where it creates a VM object that maps the local APIC uncacheable like so:

    /* Create a scatter/gather list that maps the local APIC. */
    sc->sg = sglist_alloc(1, M_WAITOK);
    sglist_append_phys(sc->sg, lapic_paddr, LAPIC_LEN);

    /* Create a VM object that is backed by the scatter/gather list. */
    sc->sgobj = vm_pager_allocate(OBJT_SG, sc->sg, LAPIC_LEN, VM_PROT_READ,
        0);
    VM_OBJECT_LOCK(sc->sgobj);
    vm_object_set_cache_mode(sc->sgobj, VM_CACHE_UNCACHEABLE);
    VM_OBJECT_UNLOCK(sc->sgobj);

The same approach can be used to map PCI BARs, etc. into userland as well.

-- John Baldwin
From owner-freebsd-arch@FreeBSD.ORG Wed May 20 19:36:50 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 75FBA106566B for ; Wed, 20 May 2009 19:36:50 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 46B628FC14 for ; Wed, 20 May 2009 19:36:50 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id EEE1F46B86; Wed, 20 May 2009 15:36:49 -0400 (EDT) Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8]) by bigwig.baldwin.cx (Postfix) with ESMTPA id DC7878A028; Wed, 20 May 2009 15:36:48 -0400 (EDT) From: John Baldwin To: Jeff Roberson Date: Wed, 20 May 2009 15:24:48 -0400 User-Agent: KMail/1.9.7 References: <20090514131613.T1224@besplex.bde.org> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200905201524.49090.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 20 May 2009 15:36:48 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
bigwig.baldwin.cx Cc: Dag-Erling =?iso-8859-1?q?Sm=F8rgrav?= , arch@freebsd.org Subject: Re: lockless file descriptor lookup X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2009 19:36:50 -0000 On Wednesday 20 May 2009 2:59:52 pm Jeff Roberson wrote: > On Thu, 14 May 2009, Bruce Evans wrote: > > Anyway, you probably need atomics that have suitable memory barriers. > > Memory barriers must affect the compiler and make it perform refreshes > > for them to work, so you shouldn't need any volatile casts. E.g., all > > atomic store operations (including cmpset) have release semantics even > > if they aren't spelled with "_rel" or implemented using inline asm. > > On amd64 and i386, they happen to be implemented using inline asm with > > "memory" clobbers. The "memory" clobbers force refreshes of all > > non-local variables. > > So I think I need an _acq memory barrier on the atomic cmpset of the > refcount to prevent speculative loading of the fd_ofiles array pointer by > the processor and the volatile in the second dereference as I have it > now to prevent caching of the pointer by the compiler. What do you think? > > The references prior to the atomic increment have no real ordering > requirements. Only the ones afterwards need to be strict so that we can > verify the results. I think having the _acq is correct and that the "memory" clobber it contains will force the compiler to reload fd_ofiles without needing the volatile cast (and thus that you can remove the volatile cast altogether and just add the _acq barrier). 
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Thu May 21 08:03:26 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D0FFC106566B; Thu, 21 May 2009 08:03:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id 6BB5B8FC12; Thu, 21 May 2009 08:03:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-107-117-19.carlnfd1.nsw.optusnet.com.au (c122-107-117-19.carlnfd1.nsw.optusnet.com.au [122.107.117.19]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n4L83DX4031287 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 21 May 2009 18:03:16 +1000 Date: Thu, 21 May 2009 18:03:12 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Jeff Roberson In-Reply-To: Message-ID: <20090521174647.R21310@delplex.bde.org> References: <86bppy60ti.fsf@ds4.des.no> <20090514131613.T1224@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: =?ISO-8859-15?Q?Dag-Erling_Sm=F8rgrav?= , jhb@FreeBSD.org, arch@FreeBSD.org Subject: Re: lockless file descriptor lookup X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 May 2009 08:03:27 -0000 On Wed, 20 May 2009, Jeff Roberson wrote: > On Thu, 14 May 2009, Bruce Evans wrote: > >> On Tue, 12 May 2009, Jeff Roberson wrote: >> >>> On Tue, 12 May 2009, Dag-Erling Sm?rgrav wrote: >>> >>>> Jeff Roberson writes: >>>>> I'd also appreciate it if someone could look at my volatile cast and >>>>> make sure I'm actually forcing the compiler to refresh the fd_ofiles >>>>> array here: >>>>> >>>>> + if (fp == ((struct file 
*volatile*)fdp->fd_ofiles)[fd])
>>
>> This has 2 style bugs (missing space after first '*' and missing space
>> before second '*').
>>
>> It isn't clear whether you want to refresh the fd_ofiles pointer to the
>> (first element of) the array, or the fd'th element. It is clear that
>> you don't want to refresh the whole array. The above refreshes the
>> fd'th element. Strangely, in my tests gcc refreshes the fd'th element
>> even without the cast. E.g.,
>
> This is actually intended to catch cases where the descriptor array has
> expanded and the pointer to fd_ofiles has changed, or the file has been
> closed and the pointer at the fd'th element has changed. I'm attempting to
> force the compiler to reload the fd_ofiles array pointer from the fdp
> structure. If it has done that, it can not have the fd'th element cached and
> so that must be a fresh memory reference.

So you want to refresh both (the array element implicitly from the
pointer). The above cast is clearly no use for refreshing fdp->fd_ofiles,
since its type is that of fd_ofiles (modulo a '*' or two), while to affect
fdp->fd_ofiles it would need to make (at least the fd_ofiles part of) (*fdp)
volatile, and for that it would need to have the type of fdp (modulo a
'*' or two), which is quite different (struct filedesc instead of struct
file). It is simplest to make all of (*fdp) volatile. The cast for that
is (I think) (volatile struct filedesc *)fdp (normal spelling) or
(struct filedesc volatile *)fdp (better spelling).

Continued in my reply to jhb's reply (on use of atomic instructions/barriers
-- we should be able to drop the volatile cast instead of fixing it as above,
but should be more careful about the barriers).
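To make the difference between the two casts concrete, here is a small userland sketch. The struct definitions are cut-down stand-ins, and the two helper names (load_elem_volatile, load_table_volatile) are invented for illustration. The first helper uses the cast from the patch, which only qualifies the load of the fd'th element. The second casts fdp itself, so the load of the fd_ofiles member is the volatile access.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures, for illustration only. */
struct file { int f_count; };
struct filedesc { struct file **fd_ofiles; int fd_nfiles; };

/*
 * The cast binds after fdp->fd_ofiles has been evaluated, so only the
 * load of element [fd] is a volatile access; the load of the fd_ofiles
 * member itself is an ordinary one.
 */
static struct file *
load_elem_volatile(struct filedesc *fdp, int fd)
{
    return (((struct file * volatile *)fdp->fd_ofiles)[fd]);
}

/*
 * Here (*fdp) is treated as volatile, so the load of the fd_ofiles
 * member is the volatile access.  The element is then read through
 * the pointer value that was just loaded.
 */
static struct file *
load_table_volatile(struct filedesc *fdp, int fd)
{
    struct file **table;

    table = ((struct filedesc volatile *)fdp)->fd_ofiles;
    return (table[fd]);
}
```

Both helpers return the same value when nothing else is modifying the table; they differ only in which memory access the compiler is obliged to perform.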
Bruce From owner-freebsd-arch@FreeBSD.ORG Thu May 21 09:29:41 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E14F710656E0; Thu, 21 May 2009 09:29:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id BF3778FC19; Thu, 21 May 2009 09:29:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 4FF4046B6C; Thu, 21 May 2009 05:29:41 -0400 (EDT) Date: Thu, 21 May 2009 10:29:41 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: current@FreeBSD.org Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Cc: arch@FreeBSD.org, rmacklem@FreeBSD.org Subject: HEADS UP: old UMich nfs4client to be removed, replaced with new NFSv234 client/server X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 May 2009 09:29:42 -0000 Dear all: This is advance warning that we'll be garbage-collecting the UMich NFSv4 client (src/sys/nfs4client and supporting RPC code, daemons, and mount tool) prior to 8.0 now that Rick Macklems NFSv234 client and server are in the base tree. This removal will likely be in the next week, as the 8.0 feature freeze is at the end of the month. The new client and server provide significantly improved support for NFSv4, and while they remain experimental, they should offer both more reliable, more complete, and actively maintained NFSv4 support. 
Anyone using nfs4client (probably not many) is encouraged to try out and provide feedback on the new NFSv4 code as soon as possible. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Thu May 21 09:36:33 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 563C6106566B for ; Thu, 21 May 2009 09:36:33 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2D16B8FC19 for ; Thu, 21 May 2009 09:36:33 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id D078846B29; Thu, 21 May 2009 05:36:32 -0400 (EDT) Date: Thu, 21 May 2009 10:36:32 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Garrett Wollman In-Reply-To: <18952.21468.748665.878710@hergotha.csail.mit.edu> Message-ID: References: <200905100500.n4A50GOa050728@hergotha.csail.mit.edu> <7710650619.20090510075706@scriptolutions.com> <18950.63671.323324.756287@hergotha.csail.mit.edu> <1393224851.20090511112537@scriptolutions.com> <18952.21468.748665.878710@hergotha.csail.mit.edu> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Lothar Scholz Subject: Re: Posix shared memory problem X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 May 2009 09:36:33 -0000 On Mon, 11 May 2009, Garrett Wollman wrote: > < said: > >> Some idiots started to think about this as a file path. But it isn't >> and it shouldn't. > > Actually, it really should be. 
Ask a security person or a virtualization > person to explain why an unnecessary multiplicity of namespaces is a bad > idea. Despite having been partly responsible for the new POSIX shm code in 8.x that removes file system namespace use for POSIX shm, I strongly agree with your statement. The hierarchal and access-controlled structure of the file system namespace is a key feature that makes it preferable to the plethora of other weird global namespaces arriving with various new IPC models. A hierarchal namespace with access control allows reliable delegation of portions of the namespace -- for example, administrators can authorize a user to use any name in "/home/username" without worrying that users will spoof each others services based on application start order, crashes, etc. The existence of additional flat namespaces, such as used by System V IPC, POSIX shm, POSIX sem, etc, is quite problematic from this perspective, and significantly increases the risk of vulnerability. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Thu May 21 09:37:48 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0B7401065670; Thu, 21 May 2009 09:37:48 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail11.syd.optusnet.com.au (mail11.syd.optusnet.com.au [211.29.132.192]) by mx1.freebsd.org (Postfix) with ESMTP id 9AC968FC13; Thu, 21 May 2009 09:37:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-107-117-19.carlnfd1.nsw.optusnet.com.au (c122-107-117-19.carlnfd1.nsw.optusnet.com.au [122.107.117.19]) by mail11.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n4L9bFZ9013958 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 21 May 2009 19:37:26 +1000 Date: Thu, 21 May 2009 19:37:09 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: John Baldwin 
In-Reply-To: <200905201524.49090.jhb@freebsd.org> Message-ID: <20090521180328.W21310@delplex.bde.org> References: <20090514131613.T1224@besplex.bde.org> <200905201524.49090.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Dag-Erling =?iso-8859-1?q?Sm=F8rgrav?= , arch@FreeBSD.org Subject: Re: lockless file descriptor lookup X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 May 2009 09:37:48 -0000 On Wed, 20 May 2009, John Baldwin wrote: > On Wednesday 20 May 2009 2:59:52 pm Jeff Roberson wrote: >> On Thu, 14 May 2009, Bruce Evans wrote: >>> Anyway, you probably need atomics that have suitable memory barriers. >>> Memory barriers must affect the compiler and make it perform refreshes >>> for them to work, so you shouldn't need any volatile casts. E.g., all >>> atomic store operations (including cmpset) have release semantics even >>> if they aren't spelled with "_rel" or implemented using inline asm. >>> On amd64 and i386, they happen to be implemented using inline asm with >>> "memory" clobbers. The "memory" clobbers force refreshes of all >>> non-local variables. Actually, it is the "acquire" operations that happen to be implemented with "memory" clobbers on amd64 and i386. "release" semantics are (completely?) automatic on amd64 and i386 so no "memory" clobbers are used for them (except IIRC in old versions). >> So I think I need an _acq memory barrier on the atomic cmpset of the >> refcount to prevent speculative loading of the fd_ofiles array pointer by >> the processor and the volatile in the second dereference as I have it >> now to prevent caching of the pointer by the compiler. What do you think? I thought that it was a _rel barrier that was needed due to my misreading of the "memory" clobbers corrected above. 
Perhaps both _acq and _rel are needed in cases like yours where a single cmpset corresponds to a (lock, unlock) pair. On amd64 and i386, plain atomic_cmpset already has both (_acq via the explicit "memory" clobber, and _rel implicitly), but the man page doesn't say that this is generic. It only says that all stores have _rel semantics, and it uses explicit _acq suffixes in examples of how to use cmpset to implement locking (the examples are rotted copies of locking in sys/mutex.h). Since a successful plain cmpset does a store, this implicitly says that plain cmpset's have _rel semantics and cmpset_acq has both _acq and _rel semantics.

Mutex locking has always been careful to use an explicit _acq suffix, but most code in /sys isn't. In a /sys tree dated ~March 30, there are 280 lines matching atomic_cmpset but only 72 lines matching atomic_cmpset_acq and 47 lines matching atomic_cmpset_rel. Excluding the implementation (atomic.h), there are 153 lines matching atomic_cmpset, 35 matching atomic_cmpset_acq and 12 matching atomic_cmpset_rel; this gives 106 lines that are probably missing an _acq or a _rel suffix. No one replied to my previous mails about this. I would require an explicit suffix by not supporting plain cmpset, or not support the _rel suffix for stores since, because stores are always _rel, it is hard to tell if an atomic store without the suffix really wants non-_rel or is sloppy. Despite the proliferation of interfaces, there is no _acq_rel suffix to indicate that cmpset_acq is also _rel.

>> The references prior to the atomic increment have no real ordering
>> requirements. Only the ones afterwards need to be strict so that we can
>> verify the results.
Most references are in a loop, so "before" and "after" are sort of the same:

% 	for (;;) {
% 		fp = fdp->fd_ofiles[fd];
% 		if (fp == NULL)
% 			break;
% 		count = fp->f_count;
% 		if (count == 0)
% 			continue;
% 		if (atomic_cmpset_int(&fp->f_count, count, count + 1) != 1)
% 			continue;

I think we do depend on both _acq and _rel semantics here -- the missing _acq to volatilize everything, and the implicit _rel just (?) to force the memory copy of f_count to actually be incremented, as is required for an atomic store to actually work.

% 		if (fp == ((struct file *volatile*)fdp->fd_ofiles)[fd])
% 			break;

The RHS here could be used again at the top of the loop. The load for the RHS is ordered after the cmpset, and so is the one at the top of the loop, except for the first iteration. I think this is unimportant.

% 		fdrop(fp, curthread);
% 	}

>
> I think having the _acq is correct and that the "memory" clobber it contains
> will force the compiler to reload fd_ofiles without needing the volatile cast
> (and thus that you can remove the volatile cast altogether and just add the
> _acq barrier).

I agree.

Please look at whether some of the ~106 other plain cmpset's need an _acq suffix or should have a _rel suffix for clarity. You should be able to do this much faster than me, having written some of them :-). E.g., the one in sio.c is for implementing a lock so it should use _acq (though it might work without _acq since the lock is only used once), but the ones in sx.h and kern_sx.c might be correct since they are mostly for "trylock"-type operations.
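For readers following the loop above, here is a minimal user-space sketch of the same retry pattern written with C11 <stdatomic.h> instead of the kernel's atomic(9) interface. It is an illustration under stated assumptions, not the code under review: struct file, struct filedesc, NFILES, fget_unlocked() and the one-argument fdrop() are simplified, hypothetical stand-ins, and memory_order_acquire on the compare-exchange is used as a rough analogue of the proposed atomic_cmpset_acq_int().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NFILES 4			/* hypothetical table size */

/* Simplified stand-ins for the kernel structures (assumptions). */
struct file {
	atomic_uint f_count;		/* reference count */
};

struct filedesc {
	_Atomic(struct file *) fd_ofiles[NFILES];
};

/* Drop one reference; the real fdrop() also frees at zero. */
static void
fdrop(struct file *fp)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&fp->f_count, 1, memory_order_release);
	assert(old > 0);
}

/*
 * Lockless lookup: take a reference, then re-check that the slot still
 * points at the same file.  The acquire ordering on the successful
 * compare-exchange keeps the re-check load from being moved before the
 * reference count update, by the compiler or by the CPU.
 */
static struct file *
fget_unlocked(struct filedesc *fdp, int fd)
{
	struct file *fp;
	unsigned int count;

	for (;;) {
		fp = atomic_load_explicit(&fdp->fd_ofiles[fd],
		    memory_order_relaxed);
		if (fp == NULL)
			return (NULL);
		count = atomic_load_explicit(&fp->f_count,
		    memory_order_relaxed);
		if (count == 0)
			continue;	/* file is being torn down; retry */
		if (!atomic_compare_exchange_strong_explicit(&fp->f_count,
		    &count, count + 1, memory_order_acquire,
		    memory_order_relaxed))
			continue;	/* lost a race on f_count; retry */
		if (fp == atomic_load_explicit(&fdp->fd_ofiles[fd],
		    memory_order_relaxed))
			return (fp);	/* slot unchanged: reference is valid */
		fdrop(fp);		/* slot changed underneath us; retry */
	}
}
```

With the acquire ordering in place, the re-check needs no volatile cast, which is the simplification suggested in the quoted reply. The sketch sidesteps the question of how the memory behind fp stays valid while f_count is read; the kernel code has to guarantee that by other means.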
Bruce From owner-freebsd-arch@FreeBSD.ORG Thu May 21 09:39:26 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F4981065677 for ; Thu, 21 May 2009 09:39:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7AD118FC12 for ; Thu, 21 May 2009 09:39:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 17A4A46B65; Thu, 21 May 2009 05:39:26 -0400 (EDT) Date: Thu, 21 May 2009 10:39:25 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Jeff Roberson In-Reply-To: Message-ID: References: <20090512165949.GF58540@hoeg.nl> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Ed Schouten , arch@freebsd.org Subject: Re: lockless file descriptor lookup X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 May 2009 09:39:26 -0000 On Tue, 12 May 2009, Jeff Roberson wrote: >> It's nice to see someone stepped up to implement this. Just out of >> curiosity, have you done any benchmarks to see how many percent of the time >> a thread needs more than one attempt to obtain a valid reference on a >> common workload? >> >> Maybe it would be nice for diagnostic purposes to add two sysctls to obtain >> the amount of successful and unsuccessful attempts. > > I have had trouble triggering it at all in testing. I'd prefer not to > commit the counters because they would re-introduce a global point of cache > contention unless we made them per-cpu. 
Just as a general observation here: our recent experience with the sysctl counters for microtime(), et al, in the kernel strongly support this view: once the per-CPU allocator is available in the base kernel for 8.0, we should attempt to purge as many of these casually strewn counters in critical paths as we can. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Thu May 21 13:33:46 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C3661065670 for ; Thu, 21 May 2009 13:33:46 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id C64908FC13 for ; Thu, 21 May 2009 13:33:45 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 57CEE46B06; Thu, 21 May 2009 09:33:45 -0400 (EDT) Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8]) by bigwig.baldwin.cx (Postfix) with ESMTPA id F31D48A028; Thu, 21 May 2009 09:33:43 -0400 (EDT) From: John Baldwin To: Bruce Evans Date: Thu, 21 May 2009 09:33:28 -0400 User-Agent: KMail/1.9.7 References: <200905201524.49090.jhb@freebsd.org> <20090521180328.W21310@delplex.bde.org> In-Reply-To: <20090521180328.W21310@delplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200905210933.28676.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Thu, 21 May 2009 09:33:44 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 
(2008-06-10) on bigwig.baldwin.cx Cc: Dag-Erling =?iso-8859-1?q?Sm=F8rgrav?= , arch@freebsd.org Subject: Re: lockless file descriptor lookup X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 May 2009 13:33:46 -0000 On Thursday 21 May 2009 5:37:09 am Bruce Evans wrote: > On Wed, 20 May 2009, John Baldwin wrote: > > > On Wednesday 20 May 2009 2:59:52 pm Jeff Roberson wrote: > >> On Thu, 14 May 2009, Bruce Evans wrote: > >>> Anyway, you probably need atomics that have suitable memory barriers. > >>> Memory barriers must affect the compiler and make it perform refreshes > >>> for them to work, so you shouldn't need any volatile casts. E.g., all > >>> atomic store operations (including cmpset) have release semantics even > >>> if they aren't spelled with "_rel" or implemented using inline asm. > >>> On amd64 and i386, they happen to be implemented using inline asm with > >>> "memory" clobbers. The "memory" clobbers force refreshes of all > >>> non-local variables. > > Actually, it is the "acquire" operations that happen to be implemented > with "memory" clobbers on amd64 and i386. "release" semantics are > (completely?) automatic on amd64 and i386 so no "memory" clobbers are > used for them (except IIRC in old versions). However, that may be a bug as when I removed them I did so because the CPUs did not need them. They may still be needed to prevent the compiler from breaking things. Specifically, I was under the (possibly mistaken) impression that '__asm __volatile()' was sufficient to prevent GCC from reordering an atomic operation with other operations. However, I'm not sure that is the case based on some discussions I had with ups@ about a year ago. 
I think that __volatile may only ensure that the compiler may not optimize the operation out, but doesn't prevent it from moving it around. > >> So I think I need an _acq memory barrier on the atomic cmpset of the > >> refcount to prevent speculative loading of the fd_ofiles array pointer by > >> the processor and the volatile in the second dereference as I have it > >> now to prevent caching of the pointer by the compiler. What do you think? > > I thought that it was a _rel barrier that was needed due to my misreading > of the "memory" clobbers corrected above. Perhaps both _acq and _rel > are needed in cases like yours where a single cmpset corresponds to a > (lock, unlock) pair. On amd64 and i386, plain atomic_cmpset already > has both (_acq via the explicit "memory" clobber, and _rel implicitly), > but the man page doesn't say that this is generic. It only says that > all stores have _rel semantics, and it uses an explicit _aqu suffixes > in examples of how to use cmpset to implement locking (the examples > are rotted copies of locking in sys/mutex.h). Since a > successful plain cmpset does a store, this implicitly says that plain > cmpset's have _rel semantics and cmpset_acq has both _acq and _rel > semantics. Ah, I think the manpage is confusing. The sentence "The atomic_store() functions always have release semantics." refers to the fact that there are not any "atomic_store_acq_*() or atomic_store_*()" functions. That the only store operations provided by the atomic(9) API include a "_rel" memory barrier. It does not mean that all store operations imply "_rel" semantics. Similarly for the statement about all atomic_load() operations and "_acq" semantics. I can probably update that part of the manpage to be clearer. Thus, given that, plain atomics and atomic_acq's do not have _rel semantics. In Jeff's case I think he only needs _acq semantics. He does not need prior memory store operations to be drained before the atomic_cmpset() is performed. 
Rather, he needs the compiler and the CPU to not reorder the read of fd_ofiles before performing the atomic_cmpset(). An _acq barrier should be sufficient for this. > Mutex locking has always been careful to use an explicit _acq suffix, > but most code in /sys isn't. In a /sys tree deated ~March 30, there > are 280 lines matching atomic_cmpset but only 72 lines matching > atomic_cmpset_acq and 47 lines matching atomic_cmpset_rel. Excluding > the implementation (atomic.h), there are 153 lines matching atomic_cmpset, > 35 matching atomic_cmpset_acq and 12 matching atomic_cmpset_rel; this > gives 106 lines that are probably missing an _acq or a _rel suffix. > No one replied to my previous mails about this. I would require > explicit suffix by not supporting plain cmpset, or not support the > _rel suffix for stores since because stores are always _rel, it is hard > to tell if an atomic store without the suffix really wants non-_rel or > is sloppy. Despite the proliferation of interfaces, there is no > _acq_rel suffix to indicate that cmpset_acq is also _rel. Not all places that do atomics need memory barriers. Only if the atomic operations on an item in memory need to be ordered with respect to other memory access (e.g. with respect to the data a lock protects, or in this specific case fd_ofiles needs to be read after the cmpset to f_count). There are no atomic stores without a _rel suffix. (Well, actually, there are an absolute ton of them, but they are not encoded as atomic_*(), instead they look like 'x = y' :).) > >> The references prior to the atomic increment have no real ordering > >> requirements. Only the ones afterwards need to be strict so that we can > >> verify the results. 
>
> Most references are in a loop, so "before" and "after" are sort of the same:
>
> % 	for (;;) {
> % 		fp = fdp->fd_ofiles[fd];
> % 		if (fp == NULL)
> % 			break;
> % 		count = fp->f_count;
> % 		if (count == 0)
> % 			continue;
> % 		if (atomic_cmpset_int(&fp->f_count, count, count + 1) != 1)
> % 			continue;
>
> I think we do depend on both _acq and _rel semantics here -- the missing
> _acq to volatilize everything, and the implicit _rel just (?) to force
> the memory copy of f_count to actually be incremented, as is required
> for an atomic store to actually work.

No, you do not need the _rel for f_count. The atomic operation is always required to perform the actual "atomic operation" atomically. Memory barriers are not supposed to control ordering/timing of the atomic ops themselves. The atomic op is always synchronous, and the memory barriers are solely to order other memory accesses with respect to the atomic operation. Specifically, a _rel would only be needed to ensure that an earlier store operation completed before the f_count update. In this case there aren't any earlier stores. Also, the prior reads all must be satisfied before the atomic op can be performed since they are dependencies of reading 'count'.

> I agree.
>
> Please look at whether some of the ~106 other plain cmpset's need an
> _acq suffix or should have a _rel suffix for clarity. You should be
> able to do this much faster than me, having written some of them :-).
> E.g., the one in sio.c is for implementing a lock so it should use _acq
> (though it might work without _acq since the lock is only used once),
> but the ones in sx.h and kern_sx.c might be correct since they are
> mostly for "trylock"-type operations.

Well, even trylock operations should use _acq since you need to not read data a lock protects until you have acquired the lock. Many of the plain atomic_cmpset's are ok though such as the ones in sys/refcount.h.
I looked at (mtx, rw, sx) and found atomic_cmpset() used without memory barriers in the following places:

- unlocking a read/shared lock. Releasing an exclusive lock requires a _rel barrier to drain any writes to the locked data. However, none of the locked data should be modified under a read lock, so no barrier is needed here.

- setting contested flags. This is when a waiter sets a flag to force a "hard" unlock in the owning thread so that the waiter gets woken up. No memory barrier is needed here as the waiting thread will have to successfully complete some other atomic_cmpset_acq() before it obtains the lock and that _acq provides sufficient protection.

- upgrading a read/shared lock to a write/exclusive lock. No _acq barrier is needed in these cases since the previous read/shared lock acquisition already had an _acq barrier and a successful upgrade is fully "atomic" in that there is no window in between releasing the shared lock and acquiring the write lock where another thread could obtain a write lock and modify the data.

All the other atomic operations in those three primitives use appropriate memory barriers.
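The distinction drawn in this message (acquire when a lock or trylock succeeds, release when an exclusive lock is dropped, no barrier for a bare reference count) can be sketched in C11 <stdatomic.h>. This is an illustration only and not the mtx, rw, or sx implementation: struct xlock, xlock_try(), xlock_release() and ref_acquire() are hypothetical names, and the C11 memory orders are used as approximate counterparts of the _acq, _rel and plain variants in atomic(9).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical exclusive lock: owner == 0 means unowned. */
struct xlock {
	atomic_uint owner;
};

/*
 * Trylock.  Acquire ordering on success keeps reads of the protected
 * data from being performed before the lock is actually held.
 */
static bool
xlock_try(struct xlock *l, unsigned int tid)
{
	unsigned int expected = 0;

	return (atomic_compare_exchange_strong_explicit(&l->owner, &expected,
	    tid, memory_order_acquire, memory_order_relaxed));
}

/*
 * Exclusive unlock.  Release ordering makes earlier writes to the
 * protected data visible before the lock appears free.
 */
static void
xlock_release(struct xlock *l)
{
	atomic_store_explicit(&l->owner, 0, memory_order_release);
}

/*
 * Taking an extra reference on an object the caller already holds:
 * the update must be atomic, but it orders no other memory accesses,
 * so relaxed ordering (a "plain" atomic) is enough.
 */
static void
ref_acquire(atomic_uint *count)
{
	atomic_fetch_add_explicit(count, 1, memory_order_relaxed);
}
```

A caller would pair each successful xlock_try() with one xlock_release(); ref_acquire() stands for the kind of plain atomic described as fine in sys/refcount.h.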
-- John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Fri May 22 12:36:22 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 30D65106564A; Fri, 22 May 2009 12:36:22 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 0C50A8FC12; Fri, 22 May 2009 12:36:22 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id A33D946B0D; Fri, 22 May 2009 08:36:21 -0400 (EDT) Date: Fri, 22 May 2009 13:36:21 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: current@FreeBSD.org In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, rmacklem@FreeBSD.org Subject: Re: HEADS UP: old UMich nfs4client to be removed, replaced with new NFSv234 client/server X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 May 2009 12:36:22 -0000 On Thu, 21 May 2009, Robert Watson wrote: > This is advance warning that we'll be garbage-collecting the UMich NFSv4 > client (src/sys/nfs4client and supporting RPC code, daemons, and mount tool) > prior to 8.0 now that Rick Macklems NFSv234 client and server are in the > base tree. This removal will likely be in the next week, as the 8.0 feature > freeze is at the end of the month.
> > The new client and server provide significantly improved support for NFSv4, > and while they remain experimental, they should offer both more reliable, > more complete, and actively maintained NFSv4 support. Anyone using > nfs4client (probably not many) is encouraged to try out and provide feedback > on the new NFSv4 code as soon as possible. This has now been committed. Robert N M Watson Computer Laboratory University of Cambridge