From owner-freebsd-arch@FreeBSD.ORG Sun Oct 19 03:37:59 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 79196CA6; Sun, 19 Oct 2014 03:37:59 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [IPv6:2001:470:1f05:b76::196]) by mx1.freebsd.org (Postfix) with ESMTP id 53EC5F2C; Sun, 19 Oct 2014 03:37:59 +0000 (UTC) Received: from AlfredMacbookAir.local (162-222-103-181.webpass.net [162.222.103.181]) by elvis.mu.org (Postfix) with ESMTPSA id 6FFA7341F830; Sat, 18 Oct 2014 20:37:53 -0700 (PDT) Message-ID: <54433211.6000303@freebsd.org> Date: Sat, 18 Oct 2014 20:37:53 -0700 From: Alfred Perlstein Organization: FreeBSD User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: "Simon J. Gerraty" Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML References: <201408141640.s7EGe422096656@idle.juniper.net> <53ED57F2.5020808@mu.org> <20140815053604.9E40B580A2@chaos.jnpr.net> <53EDB0EF.6090902@mu.org> <20140815173830.93832580A2@chaos.jnpr.net> <53EEA74B.9070107@mu.org> <20140816045254.5F47E580A2@chaos.jnpr.net> In-Reply-To: <20140816045254.5F47E580A2@chaos.jnpr.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcel Moolenaar , Phil Shafer , John-Mark Gurney , arch@freebsd.org, Poul-Henning Kamp , freebsd-arch , Konstantin Belousov X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Oct 2014 03:37:59 -0000 Guys, It's now been 2 months since the last discussion about this. Where is the code from Juniper? 
Is there a project branch? I am thinking of just rolling forward the GSoC project and committing it. Let me know if that sounds alright. -Alfred On 8/15/14, 9:52 PM, Simon J. Gerraty wrote: > On Fri, 15 Aug 2014 17:35:23 -0700, Alfred Perlstein writes: >>>> How many programs have been successfully converted over to libxo at this >>>> point? >>> How is that relevant to any of this discussion? >>> >> Well, it speaks towards the vision of getting this done in a timely >> manner. As I said there is a GSoC project that has a ton of code >> already done. > Yes but as previously pointed out, the approach taken is far from ideal, > [we previously rejected the idea of trying to contribute that approach] > I think libxo will provide a much better result. > >> If this libxo is ready to go in, it should go in and we > No objection here. > There are a small number of apps that we particularly want converted, > which we would propose as examples. > > The purpose of this thread was to elicit feedback on the idea and gauge > acceptance of the proposed API - which you have to admit isn't as cosy > and comforting as printf, but is pretty palatable considering the > functionality provided. > > On that front I think we are looking good. > There has been very useful discussion on a number of points. > I don't think I have spotted any fundamental objection to the idea. > > It is probably easier for Phil to commit to our internal mirror. > We can take the next steps from there. > >> should get towards converting more utils to using it. However if we are >> going to perpetually add frameworky things, but not convert over >> userland tools to the actual framework, then that is a potential problem >> worth calling out. > Indeed. Again that's why I prefer to see this (the library at least) > done by someone who's been doing this sort of thing successfully for > ages.
> > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Sun Oct 19 09:24:42 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 74BFCF; Sun, 19 Oct 2014 09:24:42 +0000 (UTC) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1bon0130.outbound.protection.outlook.com [157.56.111.130]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4E31EE4A; Sun, 19 Oct 2014 09:24:38 +0000 (UTC) Received: from CO2PR05CA028.namprd05.prod.outlook.com (10.141.241.156) by BL2PR05MB114.namprd05.prod.outlook.com (10.255.232.24) with Microsoft SMTP Server (TLS) id 15.0.1054.13; Sun, 19 Oct 2014 03:53:19 +0000 Received: from BL2FFO11FD008.protection.gbl (2a01:111:f400:7c09::175) by CO2PR05CA028.outlook.office365.com (2a01:111:e400:1429::28) with Microsoft SMTP Server (TLS) id 15.0.1054.13 via Frontend Transport; Sun, 19 Oct 2014 03:53:18 +0000 Received: from P-EMF01-SAC.jnpr.net (66.129.239.15) by BL2FFO11FD008.mail.protection.outlook.com (10.173.161.4) with Microsoft SMTP Server (TLS) id 15.0.1049.20 via Frontend Transport; Sun, 19 Oct 2014 03:53:17 +0000 Received: from magenta.juniper.net (172.17.27.123) by P-EMF01-SAC.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Sat, 18 Oct 2014 20:53:16 -0700 Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id s9J3rFR88619; Sat, 18 Oct 2014 20:53:15 -0700 (PDT) (envelope-from sjg@juniper.net) 
Received: from chaos (localhost [127.0.0.1]) by chaos.jnpr.net (Postfix) with ESMTP id C9B11580A3; Sat, 18 Oct 2014 20:53:14 -0700 (PDT) To: Alfred Perlstein Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML In-Reply-To: <54433211.6000303@freebsd.org> References: <201408141640.s7EGe422096656@idle.juniper.net> <53ED57F2.5020808@mu.org> <20140815053604.9E40B580A2@chaos.jnpr.net> <53EDB0EF.6090902@mu.org> <20140815173830.93832580A2@chaos.jnpr.net> <53EEA74B.9070107@mu.org> <20140816045254.5F47E580A2@chaos.jnpr.net> <54433211.6000303@freebsd.org> Comments: In-reply-to: Alfred Perlstein message dated "Sat, 18 Oct 2014 20:37:53 -0700." From: "Simon J. Gerraty" X-Mailer: MH-E 8.0.3; nmh 1.3; GNU Emacs 22.3.1 Date: Sat, 18 Oct 2014 20:53:14 -0700 Message-ID: <9286.1413690794@chaos> MIME-Version: 1.0 Content-Type: text/plain X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:66.129.239.15; CTRY:US; IPV:NLI; EFV:NLI; SFV:NSPM; SFS:(10019020)(6009001)(189002)(199003)(24454002)(50466002)(48376002)(102836001)(33716001)(57986006)(93886004)(85306004)(64706001)(47776003)(20776003)(77156001)(80022003)(46102003)(15975445006)(84676001)(50226001)(4396001)(97736003)(81156004)(15202345003)(21056001)(85852003)(106466001)(87936001)(87286001)(19580405001)(19580395003)(76176999)(69596002)(44976005)(107046002)(50986999)(76482002)(88136002)(6806004)(89996001)(76506005)(31966008)(110136001)(62966002)(104166001)(105596002)(92726001)(93916002)(95666004)(117636001)(86362001)(92566001)(120916001)(99396003)(42262002)(62816006); DIR:OUT; SFP:1102; SCL:1; SRVR:BL2PR05MB114; H:P-EMF01-SAC.jnpr.net; FPR:; MLV:sfv; PTR:InfoDomainNonexistent; MX:1; A:1; LANG:en; X-Microsoft-Antispam: UriScan:; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:BL2PR05MB114; X-Exchange-Antispam-Report-Test: UriScan:; X-Forefront-PRVS: 0369E8196C Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.15 as permitted sender) 
Authentication-Results: spf=softfail (sender IP is 66.129.239.15) smtp.mailfrom=sjg@juniper.net; X-OriginatorOrg: juniper.net Cc: Marcel Moolenaar , Phil Shafer , John-Mark Gurney , arch@freebsd.org, Poul-Henning Kamp , freebsd-arch , Konstantin Belousov X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Oct 2014 09:24:42 -0000 Alfred Perlstein wrote: > Where is the code from Juniper? It's currently in github, hang on see if I can find url. See http://juniper.github.io/libxo/libxo-manual.html Marcel is planning to pull it into contrib/ - was trying to ping the GSoC folk last week. > I am thinking of just rolling forward the GSoC project and committing > it. Let me know if that sounds alright. Of course it doesn't. From owner-freebsd-arch@FreeBSD.ORG Sun Oct 19 09:40:57 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3BCE6311; Sun, 19 Oct 2014 09:40:57 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [IPv6:2001:470:1f05:b76::196]) by mx1.freebsd.org (Postfix) with ESMTP id 226CFF5C; Sun, 19 Oct 2014 09:40:57 +0000 (UTC) Received: from [10.10.32.141] (162-222-103-181.webpass.net [162.222.103.181]) by elvis.mu.org (Postfix) with ESMTPSA id 60E0F341F830; Sun, 19 Oct 2014 02:40:56 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (1.0) Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML From: Alfred Perlstein X-Mailer: iPhone Mail (12A405) In-Reply-To: <9286.1413690794@chaos> Date: Sun, 19 Oct 2014 02:40:55 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References:
<201408141640.s7EGe422096656@idle.juniper.net> <53ED57F2.5020808@mu.org> <20140815053604.9E40B580A2@chaos.jnpr.net> <53EDB0EF.6090902@mu.org> <20140815173830.93832580A2@chaos.jnpr.net> <53EEA74B.9070107@mu.org> <20140816045254.5F47E580A2@chaos.jnpr.net> <54433211.6000303@freebsd.org> <9286.1413690794@chaos> To: "Simon J. Gerraty" Cc: Marcel Moolenaar , Phil Shafer , John-Mark Gurney , Alfred Perlstein , "arch@freebsd.org" , Poul-Henning Kamp , freebsd-arch , Konstantin Belousov X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Oct 2014 09:40:57 -0000 > On Oct 18, 2014, at 8:53 PM, Simon J. Gerraty wrote: > > Alfred Perlstein wrote: >> Where is the code from Juniper? > > It's currently in github, hang on see if I can find url. > See http://juniper.github.io/libxo/libxo-manual.html > > Marcel is planning to pull it into contrib/ - was trying to ping the GSoC > folk last week. No need to ping GSoC folk if libxo is a real thing. Just get it under a project branch. > >> I am thinking of just rolling forward the GSoC project and committing >> it. Let me know if that sounds alright. > > Of course it doesn't. Neither does no indication of forward progress.
Glad things are on the move again. Please commit the project branch now. -alfred From owner-freebsd-arch@FreeBSD.ORG Sun Oct 19 17:13:11 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 625A0753 for ; Sun, 19 Oct 2014 17:13:11 +0000 (UTC) Received: from mail-wg0-x233.google.com (mail-wg0-x233.google.com [IPv6:2a00:1450:400c:c00::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id F2C5CE19 for ; Sun, 19 Oct 2014 17:13:10 +0000 (UTC) Received: by mail-wg0-f51.google.com with SMTP id b13so3925600wgh.22 for ; Sun, 19 Oct 2014 10:13:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=G5X8iGV0G5CfSN6oJqwi5XYRUfviHeyE6tV10qXiP+o=; b=DW6EweR3f+zwWf0skfQFnngZBqF3miFRwhd5vqQsyai+aEMeF5//abTHkIzbNAVD3L Jpq++OAAI0hmTnBc6Bb2u5s6Nzdp8v3hneaJiAwUhLNnHyrn6z7fC5NigP6g+wPdxEJa PLHZ5g65k/gzW1y7oApSETKlXzwDojdPCiBqit0TInACqrbqEm/U99lzNP8i+V3QSLhi er7rWtLZgd9INzKx/kKUtmMYtpeiyIPmdPSuzWEP/lKftvdac/JV2RlVkknqXOaQ2aMA dFjyo8rsoOMmq9+BbDgsB/PEVt6hk42X0+i5AG5m//UFmzOcLpYK39eC//NNzx1qrUgb S/Mg== MIME-Version: 1.0 X-Received: by 10.180.77.229 with SMTP id v5mr13677686wiw.59.1413738789261; Sun, 19 Oct 2014 10:13:09 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.216.106.136 with HTTP; Sun, 19 Oct 2014 10:13:09 -0700 (PDT) Date: Sun, 19 Oct 2014 10:13:09 -0700 X-Google-Sender-Auth: 8EimTY67j6F8z8Hx4f4zVaybNSo Message-ID: Subject: note: cpu ID type is now 'int', please help me find any last vestiges of the old ways From: Adrian Chadd To: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Oct 2014 17:13:11 -0000 Hi, I've just smacked the last obvious places where the CPU ID type was smaller than an int (signed) and NOCPU is now -1. We don't have any obvious hardware support for anything with > 254 logical CPUs just at the moment but I'm sure that during the lifecycle of 11 we're going to start seeing devices with > 254 logical CPUs. I'd like to try and find / squish any remaining code in /usr/src that uses a different type for CPU identification. I'd really appreciate any help I can get in updating things sooner rather than later. If there are any other ABI related changes I'd like to get them into -HEAD far before the release point so things can shake out. So, this is a call for help. I'm doing this (mostly) in my spare time with donated hardware, so I'm thankful for any other help people can provide before 11-REL is cut. Thanks! 
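For anyone hunting for leftover narrow CPU ID fields, the sketch below shows the failure mode to look for. It is only an illustration under stated assumptions: the typedefs, the OLD_NOCPU value of 0xff and the helper names are hypothetical and are not the kernel's actual definitions. It assumes an old 8-bit unsigned ID with 0xff reserved as the "no CPU" sentinel, and compares it with a signed int ID whose sentinel is -1.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical names for illustration only; these are not the kernel's
 * definitions. OLD_NOCPU = 0xff is an assumed 8-bit sentinel.
 */
#define	OLD_NOCPU	0xff
#define	NEW_NOCPU	(-1)

typedef uint8_t	old_cpuid_t;	/* assumed narrow type */
typedef int	new_cpuid_t;	/* type described in the message above */

/* Store a CPU number in the narrow type and read it back. */
static int
old_cpuid_roundtrip(int cpu)
{
	old_cpuid_t id = (old_cpuid_t)cpu;	/* silently reduced modulo 256 */

	return (id);
}

/* Store a CPU number in an int and read it back. */
static int
new_cpuid_roundtrip(int cpu)
{
	new_cpuid_t id = cpu;

	return (id);
}
```

With the narrow type, CPU 300 reads back as 44 and CPU 255 collides with the sentinel; with int, both values survive and neither collides with -1.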
-adrian From owner-freebsd-arch@FreeBSD.ORG Tue Oct 21 02:51:29 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 34456D6E; Tue, 21 Oct 2014 02:51:29 +0000 (UTC) Received: from na01-bl2-obe.outbound.protection.outlook.com (mail-bl2on0122.outbound.protection.outlook.com [65.55.169.122]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 60E19FFD; Tue, 21 Oct 2014 02:51:27 +0000 (UTC) Received: from BY2PR05CA030.namprd05.prod.outlook.com (10.141.250.20) by BL2PR05MB115.namprd05.prod.outlook.com (10.255.232.25) with Microsoft SMTP Server (TLS) id 15.0.1054.13; Tue, 21 Oct 2014 02:17:39 +0000 Received: from BN1BFFO11FD026.protection.gbl (2a01:111:f400:7c10::1:120) by BY2PR05CA030.outlook.office365.com (2a01:111:e400:2c5f::20) with Microsoft SMTP Server (TLS) id 15.0.1054.13 via Frontend Transport; Tue, 21 Oct 2014 02:17:39 +0000 Received: from P-EMF01-SAC.jnpr.net (66.129.239.15) by BN1BFFO11FD026.mail.protection.outlook.com (10.58.144.89) with Microsoft SMTP Server (TLS) id 15.0.1049.20 via Frontend Transport; Tue, 21 Oct 2014 02:17:38 +0000 Received: from magenta.juniper.net (172.17.27.123) by P-EMF01-SAC.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Mon, 20 Oct 2014 19:17:37 -0700 Received: from idle.juniper.net (idleski.juniper.net [172.25.4.26]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id s9L2HYR74955; Mon, 20 Oct 2014 19:17:34 -0700 (PDT) (envelope-from phil@juniper.net) Received: from idle.juniper.net (localhost [127.0.0.1]) by idle.juniper.net (8.14.4/8.14.3) with ESMTP id s9L2HECn097421; Mon, 20 Oct 2014 22:17:15 -0400 (EDT) (envelope-from phil@idle.juniper.net) Message-ID: 
<201410210217.s9L2HECn097421@idle.juniper.net> To: Alfred Perlstein Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML In-Reply-To: <54433211.6000303@freebsd.org> Date: Mon, 20 Oct 2014 22:17:14 -0400 From: Phil Shafer MIME-Version: 1.0 Content-Type: text/plain X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:66.129.239.15; CTRY:US; IPV:NLI; EFV:NLI; SFV:NSPM; SFS:(10019020)(6009001)(199003)(189002)(164054003)(87936001)(54356999)(50986999)(6806004)(68736004)(19580395003)(92726001)(47776003)(15202345003)(85306004)(20776003)(21056001)(46102003)(44976005)(80022003)(102836001)(92566001)(4396001)(103666002)(85852003)(31966008)(53416004)(50466002)(105596002)(48376002)(81156004)(99396003)(86362001)(76506005)(110136001)(15975445006)(106466001)(95666004)(120916001)(76482002)(107046002); DIR:OUT; SFP:1102; SCL:1; SRVR:BL2PR05MB115; H:P-EMF01-SAC.jnpr.net; FPR:; MLV:sfv; PTR:InfoDomainNonexistent; A:1; MX:1; LANG:en; X-Microsoft-Antispam: UriScan:; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:BL2PR05MB115; X-Exchange-Antispam-Report-Test: UriScan:; X-Forefront-PRVS: 0371762FE7 Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.15 as permitted sender) Authentication-Results: spf=softfail (sender IP is 66.129.239.15) smtp.mailfrom=phil@juniper.net; X-OriginatorOrg: juniper.net Cc: Marcel Moolenaar , John-Mark Gurney , "Simon J. Gerraty" , arch@freebsd.org, Poul-Henning Kamp , freebsd-arch , Konstantin Belousov X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Oct 2014 02:51:29 -0000 Alfred Perlstein writes: >It's now been 2 months since the last discussion about this. >Where is the code from Juniper? Code's still in github, if you want to take a look. 
It's been pretty much done since the end of August: https://github.com/Juniper/libxo/graphs/contributors I've got patches to seven bsd utilities here: https://github.com/Juniper/libxo/tree/master/patches Docs are still here: http://juniper.github.io/libxo/libxo-manual.html but I've also written man pages as well (back in Sept). At this point, I think it's completed. I was tempted on the plane ride home last Thursday to add a flag for calling humanize_number(), but I resisted. Marcel's working to get it into freebsd. Thanks, Phil From owner-freebsd-arch@FreeBSD.ORG Tue Oct 21 03:57:45 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 938AE552 for ; Tue, 21 Oct 2014 03:57:45 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 80ED5905 for ; Tue, 21 Oct 2014 03:57:45 +0000 (UTC) Received: from Alfreds-MacBook-Pro.local (64-60-248-106.static-ip.telepacific.net [64.60.248.106]) by elvis.mu.org (Postfix) with ESMTPSA id A4CEC341F83E for ; Mon, 20 Oct 2014 20:57:39 -0700 (PDT) Message-ID: <5445D9B3.4000303@mu.org> Date: Mon, 20 Oct 2014 20:57:39 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-arch@freebsd.org Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML References: <201410210217.s9L2HECn097421@idle.juniper.net> In-Reply-To: <201410210217.s9L2HECn097421@idle.juniper.net> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Tue, 21 Oct 2014 03:57:45 -0000 Thanks Phil! Any chance the patches could be applied against a forked FreeBSD tree from http://github.com/freebsd/freebsd and committed there? Or perhaps a phabricator request? -Alfred On 10/20/14 7:17 PM, Phil Shafer wrote: > Alfred Perlstein writes: >> It's now been 2 months since the last discussion about this. >> Where is the code from Juniper? > Code's still in github, if you want to take a look. It's been > pretty much done since the end of August: > > https://github.com/Juniper/libxo/graphs/contributors > > I've got patches to seven bsd utilities here: > > https://github.com/Juniper/libxo/tree/master/patches > > Docs are still here: > > http://juniper.github.io/libxo/libxo-manual.html > > but I've also written man pages as well (back in Sept). At this > point, I think it's completed. I was tempted on the plane ride > home last Thursday to add a flag for calling humanize_number(), but > I resisted. > > Marcel's working to get it into freebsd. 
> > Thanks, > Phil > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Tue Oct 21 09:46:00 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 370614F9 for ; Tue, 21 Oct 2014 09:46:00 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 76B69FB6 for ; Tue, 21 Oct 2014 09:45:59 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9L9jdbx082810 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 21 Oct 2014 12:45:39 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9L9jdbx082810 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9L9jdgl082807 for arch@freebsd.org; Tue, 21 Oct 2014 12:45:39 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 21 Oct 2014 12:45:39 +0300 From: Konstantin Belousov To: arch@freebsd.org Subject: RfC: fueword(9) and casueword(9) Message-ID: <20141021094539.GA1877@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED, FREEMAIL_FROM, NML_ADSP_CUSTOM_MED, T_FILL_THIS_FORM_SHORT autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 
3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Oct 2014 09:46:00 -0000 FreeBSD provides the fuword(9) family of functions to fetch a word from userspace. The functions return the value read, or -1 on failure (i.e. when the access faults). This KPI has a flaw: the caller cannot distinguish a -1 read from usermode from a fault. As John Baldwin pointed out, fuword(9) cannot be replaced by copyin(9), since fuword(9) is atomic for aligned data, while copyin(9) is typically implemented as a byte copy. I have wanted to fix this wart for a long time. Below is a prototype patch that adds the fueword(9) family of functions. They take the address of the variable in which to store the value read, and return 0 on success, -1 on failure. In a similar way, casueword(9) fixes casuword(9). The tricky part of the patch is the changes to kern_umtx.c, where the logic of the loops in the lock acquire routines is delicate and care must be taken not to obliterate possible errors from the suspension check or signal test on loop retry. I only implemented fueword(9) and casueword(9) for x86 and powerpc. fuword(9) and casuword(9) are reimplemented as wrappers around the e-variants. For arm, mips and sparc, where I do not know or no longer remember the assembler, I made a hack that provides a deficient fueword(9), which calls fuword(9) and thus still conflates a -1 from userspace with a fault. See NO_FUEWORD in machine/param.h; hopefully the arch maintainers will fix the remaining places. Some users of fuword(9) are still left, in particular in aio and dtrace. The patch has only been lightly tested on x86 so far. Comments and fixes are welcome.
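To make the ambiguity concrete, here is a minimal userspace model of the two calling conventions described above. It is a sketch, not kernel code and not part of the patch: the sim_* functions and the sim_fetch_fault flag are hypothetical stand-ins, and the "fault" is simulated with a flag rather than a real page fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. sim_fetch_fault stands in for a page fault on the
 * user address; the sim_* names are hypothetical, not kernel functions.
 */
static bool sim_fetch_fault;

/* fuword-style: the value and the error share the return channel. */
static long
sim_fuword(const long *uaddr)
{
	if (sim_fetch_fault || uaddr == NULL)
		return (-1);
	return (*uaddr);
}

/* fueword-style: the return value is only a status, data goes via *val. */
static int
sim_fueword(const long *uaddr, long *val)
{
	assert(val != NULL);
	if (sim_fetch_fault || uaddr == NULL)
		return (-1);
	*val = *uaddr;
	return (0);
}
```

When the word in "user memory" really is -1, sim_fuword() returns -1 both on success and on the simulated fault, so the caller cannot tell the cases apart. sim_fueword() returns 0 with *val set to -1 in the first case and -1 in the second.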
share/man/man9/Makefile | 10 +- share/man/man9/casuword.9 | 95 +++++++ share/man/man9/fetch.9 | 73 +++++- sys/amd64/amd64/support.S | 97 ++++--- sys/amd64/ia32/ia32_syscall.c | 12 +- sys/arm/include/param.h | 4 + sys/compat/freebsd32/freebsd32_misc.c | 11 +- sys/i386/i386/support.s | 42 +-- sys/i386/i386/trap.c | 11 +- sys/kern/kern_exec.c | 26 +- sys/kern/kern_umtx.c | 469 ++++++++++++++++++++++------------ sys/kern/subr_uio.c | 172 +++++++++++++ sys/kern/vfs_acl.c | 8 +- sys/mips/include/param.h | 4 + sys/net/if_spppsubr.c | 3 +- sys/powerpc/powerpc/copyinout.c | 109 +++++--- sys/sparc64/include/param.h | 4 + sys/sys/systm.h | 13 +- 18 files changed, 881 insertions(+), 282 deletions(-) diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile index bc21dc6..0464aca 100644 --- a/share/man/man9/Makefile +++ b/share/man/man9/Makefile @@ -581,6 +581,9 @@ MLINKS+=condvar.9 cv_broadcast.9 \ MLINKS+=config_intrhook.9 config_intrhook_disestablish.9 \ config_intrhook.9 config_intrhook_establish.9 MLINKS+=contigmalloc.9 contigfree.9 +MLINKS+=casuword.9 casueword.9 \ + casuword.9 casueword32.9 \ + casuword.9 casuword32.9 MLINKS+=copy.9 copyin.9 \ copy.9 copyin_nofault.9 \ copy.9 copyinstr.9 \ @@ -688,7 +691,12 @@ MLINKS+=fetch.9 fubyte.9 \ fetch.9 fuword.9 \ fetch.9 fuword16.9 \ fetch.9 fuword32.9 \ - fetch.9 fuword64.9 + fetch.9 fuword64.9 \ + fetch.9 fuebyte.9 \ + fetch.9 fueword.9 \ + fetch.9 fueword16.9 \ + fetch.9 fueword32.9 \ + fetch.9 fueword64.9 MLINKS+=firmware.9 firmware_get.9 \ firmware.9 firmware_put.9 \ firmware.9 firmware_register.9 \ diff --git a/share/man/man9/casuword.9 b/share/man/man9/casuword.9 new file mode 100644 index 0000000..34a0f1d --- /dev/null +++ b/share/man/man9/casuword.9 @@ -0,0 +1,95 @@ +.\" Copyright (c) 2014 The FreeBSD Foundation +.\" All rights reserved. +.\" +.\" Part of this documentation was written by +.\" Konstantin Belousov under sponsorship +.\" from the FreeBSD Foundation. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" $FreeBSD$ +.\" +.Dd October 21, 2014 +.Dt CASU 9 +.Os +.Sh NAME +.Nm casueword , +.Nm casueword32 , +.Nm casuword , +.Nm casuword32 +.Nd fetch, compare and store data from user-space +.Sh SYNOPSIS +.In sys/types.h +.In sys/systm.h +.Ft int +.Fn casueword "volatile u_long *base" "u_long oldval" "u_long *oldvalp" "u_long newval" +.Ft int +.Fn casueword32 "volatile uint32_t *base" "uint32_t oldval" "uint32_t *oldvalp" "uint32_t newval" +.Ft u_long +.Fn casuword "volatile u_long *base" "u_long oldval" "u_long newval" +.Ft uint32_t +.Fn casuword32 "volatile uint32_t *base" "uint32_t oldval" "uint32_t newval" +.Sh DESCRIPTION +The +.Nm +functions are designed to perform an atomic compare-and-swap operation on +the value in the usermode memory of the current process. +.Pp +The +.Nm +routines read the value from user memory at address +.Pa base , +and compare the value read with +.Pa oldval . +If the values are equal, +.Pa newval +is written to +.Pa *base . +In the case of +.Fn casueword32 +and +.Fn casueword , +the old value is stored into the (kernel-mode) variable pointed to by +.Pa oldvalp . +The userspace value must be naturally aligned. +.Pp +The callers of +.Fn casuword +and +.Fn casuword32 +functions cannot distinguish between -1 read from +userspace and function failure. +.Sh RETURN VALUES +The +.Fn casuword +and +.Fn casuword32 +functions return the data fetched or -1 on failure. +The +.Fn casueword +and +.Fn casueword32 +functions return 0 on success and -1 on failure.
+.Sh SEE ALSO +.Xr atomic 9 , +.Xr fetch 9 , +.Xr store 9 diff --git a/share/man/man9/fetch.9 b/share/man/man9/fetch.9 index ccf6866..abf2edd 100644 --- a/share/man/man9/fetch.9 +++ b/share/man/man9/fetch.9 @@ -34,7 +34,7 @@ .\" .\" $FreeBSD$ .\" -.Dd October 5, 2009 +.Dd October 21, 2014 .Dt FETCH 9 .Os .Sh NAME @@ -44,11 +44,15 @@ .Nm fuword , .Nm fuword16 , .Nm fuword32 , -.Nm fuword64 +.Nm fuword64 , +.Nm fuebyte , +.Nm fueword , +.Nm fueword16 , +.Nm fueword32 , +.Nm fueword64 .Nd fetch data from user-space .Sh SYNOPSIS .In sys/types.h -.In sys/time.h .In sys/systm.h .Ft int .Fn fubyte "const void *base" @@ -60,18 +64,31 @@ .Fn fuword32 "const void *base" .Ft int64_t .Fn fuword64 "const void *base" +.Ft int +.Fn fuebyte "const void *base" "int *val" +.Ft long +.Fn fueword "const void *base" "long *val" +.Ft int +.Fn fueword16 "void *base" "int *val" +.Ft int32_t +.Fn fueword32 "const void *base" "int32_t *val" +.Ft int64_t +.Fn fueword64 "const void *base" "int64_t *val" .In sys/resourcevar.h .Ft int .Fn fuswintr "void *base" .Sh DESCRIPTION The .Nm -functions are designed to copy small amounts of data from user-space. +functions are designed to copy small amounts of data from user-space +of the current process. +If the read is successful, it is performed atomically. +The data read must be naturally aligned. .Pp The .Nm routines provide the following functionality: -.Bl -tag -width "fuswintr()" +.Bl -tag -width "fueword32()" .It Fn fubyte Fetches a byte of data from the user-space address .Pa base . @@ -91,11 +108,55 @@ Fetches 64 bits of data from the user-space address Fetches a short word of data from the user-space address .Pa base . This function is safe to call during an interrupt context. +.It Fn fuebyte +Fetches a byte of data from the user-space address +.Pa base +and stores the result in the variable pointed to by +.Pa val .
+.It Fn fueword +Fetches a word of data from the user-space address +.Pa base +and stores the result in the variable pointed to by +.Pa val . +.It Fn fueword16 +Fetches 16 bits of data from the user-space address +.Pa base +and stores the result in the variable pointed to by +.Pa val . +.It Fn fueword32 +Fetches 32 bits of data from the user-space address +.Pa base +and stores the result in the variable pointed to by +.Pa val . +.It Fn fueword64 +Fetches 64 bits of data from the user-space address +.Pa base +and stores the result in the variable pointed to by +.Pa val . .El +.Pp +The callers of +.Fn fuw* +set of functions cannot distinguish between -1 read from +userspace and function failure. .Sh RETURN VALUES The -.Nm +.Fn fubyte , +.Fn fuword , +.Fn fuword16 , +.Fn fuword32 , +.Fn fuword64 , +and +.Fn fuswintr functions return the data fetched or -1 on failure. +The +.Fn fuebyte , +.Fn fueword , +.Fn fueword16 , +.Fn fueword32 +and +.Fn fueword64 +functions return 0 on success and -1 on failure. .Sh SEE ALSO .Xr copy 9 , .Xr store 9 diff --git a/sys/amd64/amd64/support.S b/sys/amd64/amd64/support.S index 4897367..b90c87b 100644 --- a/sys/amd64/amd64/support.S +++ b/sys/amd64/amd64/support.S @@ -312,12 +312,13 @@ copyin_fault: END(copyin) /* - * casuword32. Compare and set user integer. Returns -1 or the current value. - * dst = %rdi, old = %rsi, new = %rdx + * casueword32. Compare and set user integer. Returns -1 on fault, + * 0 if access was successful. Old value is written to *oldp. + * dst = %rdi, old = %esi, oldp = %rdx, new = %ecx */ -ENTRY(casuword32) - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) +ENTRY(casueword32) + movq PCPU(CURPCB),%r8 + movq $fusufault,PCB_ONFAULT(%r8) movq $VM_MAXUSER_ADDRESS-4,%rax cmpq %rax,%rdi /* verify address is valid */ @@ -327,26 +328,34 @@ ENTRY(casuword32) #ifdef SMP lock #endif - cmpxchgl %edx,(%rdi) /* new = %edx */ + cmpxchgl %ecx,(%rdi) /* new = %ecx */ /* * The old value is in %eax.
If the store succeeded it will be the * value we expected (old) from before the store, otherwise it will - * be the current value. + * be the current value. Save %eax into %esi to prepare the return + * value. */ + movl %eax,%esi + xorl %eax,%eax + movq %rax,PCB_ONFAULT(%r8) - movq PCPU(CURPCB),%rcx - movq $0,PCB_ONFAULT(%rcx) + /* + * Access the oldp after the pcb_onfault is cleared, to correctly + * catch a corrupted pointer. + */ + movl %esi,(%rdx) /* oldp = %rdx */ ret -END(casuword32) +END(casueword32) /* - * casuword. Compare and set user word. Returns -1 or the current value. - * dst = %rdi, old = %rsi, new = %rdx + * casueword. Compare and set user long. Returns -1 on fault, + * 0 if access was successful. Old value is written to *oldp. + * dst = %rdi, old = %rsi, oldp = %rdx, new = %rcx */ -ENTRY(casuword) - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) +ENTRY(casueword) + movq PCPU(CURPCB),%r8 + movq $fusufault,PCB_ONFAULT(%r8) movq $VM_MAXUSER_ADDRESS-4,%rax cmpq %rax,%rdi /* verify address is valid */ @@ -356,28 +365,28 @@ ENTRY(casuword) #ifdef SMP lock #endif - cmpxchgq %rdx,(%rdi) /* new = %rdx */ + cmpxchgq %rcx,(%rdi) /* new = %rcx */ /* - * The old value is in %eax. If the store succeeded it will be the + * The old value is in %rax. If the store succeeded it will be the * value we expected (old) from before the store, otherwise it will * be the current value. */ - - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) - movq $0,PCB_ONFAULT(%rcx) + movq %rax,%rsi + xorl %eax,%eax + movq %rax,PCB_ONFAULT(%r8) + movq %rsi,(%rdx) ret -END(casuword) +END(casueword) /* * Fetch (load) a 64-bit word, a 32-bit word, a 16-bit word, or an 8-bit - * byte from user memory. All these functions are MPSAFE. - * addr = %rdi + * byte from user memory.
+ * addr = %rdi, valp = %rsi */ -ALTENTRY(fuword64) -ENTRY(fuword) +ALTENTRY(fueword64) +ENTRY(fueword) movq PCPU(CURPCB),%rcx movq $fusufault,PCB_ONFAULT(%rcx) @@ -385,13 +394,15 @@ ENTRY(fuword) cmpq %rax,%rdi /* verify address is valid */ ja fusufault - movq (%rdi),%rax - movq $0,PCB_ONFAULT(%rcx) + xorl %eax,%eax + movq (%rdi),%r11 + movq %rax,PCB_ONFAULT(%rcx) + movq %r11,(%rsi) ret -END(fuword64) -END(fuword) +END(fueword64) +END(fueword) -ENTRY(fuword32) +ENTRY(fueword32) movq PCPU(CURPCB),%rcx movq $fusufault,PCB_ONFAULT(%rcx) @@ -399,10 +410,12 @@ ENTRY(fuword32) cmpq %rax,%rdi /* verify address is valid */ ja fusufault - movl (%rdi),%eax - movq $0,PCB_ONFAULT(%rcx) + xorl %eax,%eax + movl (%rdi),%r11d + movq %rax,PCB_ONFAULT(%rcx) + movl %r11d,(%rsi) ret -END(fuword32) +END(fueword32) /* * fuswintr() and suswintr() are specialized variants of fuword16() and @@ -418,7 +431,7 @@ ENTRY(fuswintr) END(suswintr) END(fuswintr) -ENTRY(fuword16) +ENTRY(fueword16) movq PCPU(CURPCB),%rcx movq $fusufault,PCB_ONFAULT(%rcx) @@ -426,12 +439,14 @@ ENTRY(fuword16) cmpq %rax,%rdi ja fusufault - movzwl (%rdi),%eax - movq $0,PCB_ONFAULT(%rcx) + xorl %eax,%eax + movzwl (%rdi),%r11d + movq %rax,PCB_ONFAULT(%rcx) + movl %r11d,(%rsi) ret -END(fuword16) +END(fueword16) -ENTRY(fubyte) +ENTRY(fuebyte) movq PCPU(CURPCB),%rcx movq $fusufault,PCB_ONFAULT(%rcx) @@ -439,10 +454,12 @@ ENTRY(fubyte) cmpq %rax,%rdi ja fusufault - movzbl (%rdi),%eax - movq $0,PCB_ONFAULT(%rcx) + xorl %eax,%eax + movzbl (%rdi),%r11d + movq %rax,PCB_ONFAULT(%rcx) + movl %r11d,(%rsi) ret -END(fubyte) +END(fuebyte) ALIGN_TEXT fusufault: diff --git a/sys/amd64/ia32/ia32_syscall.c b/sys/amd64/ia32/ia32_syscall.c index 0cdec6f..92249f9 100644 --- a/sys/amd64/ia32/ia32_syscall.c +++ b/sys/amd64/ia32/ia32_syscall.c @@ -110,7 +110,7 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) struct proc *p; struct trapframe *frame; caddr_t params; - u_int32_t args[8]; + u_int32_t args[8], tmp; int error, i; p = td->td_proc; @@ -126,7
+126,10 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) /* * Code is first argument, followed by actual args. */ - sa->code = fuword32(params); + error = fueword32(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(int); } else if (sa->code == SYS___syscall) { /* @@ -135,7 +138,10 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) * We use a 32-bit fetch in case params is not * aligned. */ - sa->code = fuword32(params); + error = fueword32(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(quad_t); } if (p->p_sysent->sv_mask) diff --git a/sys/arm/include/param.h b/sys/arm/include/param.h index 4a64607..6267154 100644 --- a/sys/arm/include/param.h +++ b/sys/arm/include/param.h @@ -149,4 +149,8 @@ #define pgtok(x) ((x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 +#endif + #endif /* !_ARM_INCLUDE_PARAM_H_ */ diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c index d909a71..98948d3 100644 --- a/sys/compat/freebsd32/freebsd32_misc.c +++ b/sys/compat/freebsd32/freebsd32_misc.c @@ -1832,16 +1832,21 @@ freebsd32_sysctl(struct thread *td, struct freebsd32_sysctl_args *uap) { int error, name[CTL_MAXNAME]; size_t j, oldlen; + uint32_t tmp; if (uap->namelen > CTL_MAXNAME || uap->namelen < 2) return (EINVAL); error = copyin(uap->name, name, uap->namelen * sizeof(int)); if (error) return (error); - if (uap->oldlenp) - oldlen = fuword32(uap->oldlenp); - else + if (uap->oldlenp) { + error = fueword32(uap->oldlenp, &tmp); + oldlen = tmp; + } else { oldlen = 0; + } + if (error != 0) + return (EFAULT); error = userland_sysctl(td, name, uap->namelen, uap->old, &oldlen, 1, uap->new, uap->newlen, &j, SCTL_MASK32); diff --git a/sys/i386/i386/support.s b/sys/i386/i386/support.s index c126f78..2a0d7a6 100644 --- a/sys/i386/i386/support.s +++ b/sys/i386/i386/support.s @@ -389,16 +389,16 @@ copyin_fault: ret /* - * 
casuword. Compare and set user word. Returns -1 or the current value. + * casueword. Compare and set user word. Returns -1 on fault, + * 0 on non-faulting access. The current value is in *oldp. */ - -ALTENTRY(casuword32) -ENTRY(casuword) +ALTENTRY(casueword32) +ENTRY(casueword) movl PCPU(CURPCB),%ecx movl $fusufault,PCB_ONFAULT(%ecx) movl 4(%esp),%edx /* dst */ movl 8(%esp),%eax /* old */ - movl 12(%esp),%ecx /* new */ + movl 16(%esp),%ecx /* new */ cmpl $VM_MAXUSER_ADDRESS-4,%edx /* verify address is valid */ ja fusufault @@ -416,17 +416,20 @@ ENTRY(casuword) movl PCPU(CURPCB),%ecx movl $0,PCB_ONFAULT(%ecx) + movl 12(%esp),%edx /* oldp */ + movl %eax,(%edx) + xorl %eax,%eax ret -END(casuword32) -END(casuword) +END(casueword32) +END(casueword) /* * Fetch (load) a 32-bit word, a 16-bit word, or an 8-bit byte from user - * memory. All these functions are MPSAFE. + * memory. */ -ALTENTRY(fuword32) -ENTRY(fuword) +ALTENTRY(fueword32) +ENTRY(fueword) movl PCPU(CURPCB),%ecx movl $fusufault,PCB_ONFAULT(%ecx) movl 4(%esp),%edx /* from */ @@ -436,9 +439,12 @@ ENTRY(fuword) movl (%edx),%eax movl $0,PCB_ONFAULT(%ecx) + movl 8(%esp),%edx + movl %eax,(%edx) + xorl %eax,%eax ret -END(fuword32) -END(fuword) +END(fueword32) +END(fueword) /* * fuswintr() and suswintr() are specialized variants of fuword16() and @@ -454,7 +460,7 @@ ENTRY(fuswintr) END(suswintr) END(fuswintr) -ENTRY(fuword16) +ENTRY(fueword16) movl PCPU(CURPCB),%ecx movl $fusufault,PCB_ONFAULT(%ecx) movl 4(%esp),%edx @@ -464,10 +470,13 @@ ENTRY(fuword16) movzwl (%edx),%eax movl $0,PCB_ONFAULT(%ecx) + movl 8(%esp),%edx + movl %eax,(%edx) + xorl %eax,%eax ret -END(fuword16) +END(fueword16) -ENTRY(fubyte) +ENTRY(fuebyte) movl PCPU(CURPCB),%ecx movl $fusufault,PCB_ONFAULT(%ecx) movl 4(%esp),%edx @@ -477,6 +486,9 @@ ENTRY(fubyte) movzbl (%edx),%eax movl $0,PCB_ONFAULT(%ecx) + movl 8(%esp),%edx + movl %eax,(%edx) + xorl %eax,%eax ret -END(fubyte) +END(fuebyte) diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c index
1d0d104..84d6ec3 100644 --- a/sys/i386/i386/trap.c +++ b/sys/i386/i386/trap.c @@ -1059,6 +1059,7 @@ cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa) struct proc *p; struct trapframe *frame; caddr_t params; + long tmp; int error; p = td->td_proc; @@ -1074,14 +1075,20 @@ cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa) /* * Code is first argument, followed by actual args. */ - sa->code = fuword(params); + error = fueword(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(int); } else if (sa->code == SYS___syscall) { /* * Like syscall, but code is a quad, so as to maintain * quad alignment for the rest of the arguments. */ - sa->code = fuword(params); + error = fueword(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(quad_t); } diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c index f2bbdaa..13a52e9 100644 --- a/sys/kern/kern_exec.c +++ b/sys/kern/kern_exec.c @@ -1091,7 +1091,7 @@ int exec_copyin_args(struct image_args *args, char *fname, enum uio_seg segflg, char **argv, char **envv) { - char *argp, *envp; + u_long argp, envp; int error; size_t length; @@ -1127,13 +1127,17 @@ exec_copyin_args(struct image_args *args, char *fname, /* * extract arguments first */ - while ((argp = (caddr_t) (intptr_t) fuword(argv++))) { - if (argp == (caddr_t) -1) { + for (;;) { + error = fueword(argv++, &argp); + if (error == -1) { error = EFAULT; goto err_exit; } - if ((error = copyinstr(argp, args->endp, - args->stringspace, &length))) { + if (argp == 0) + break; + error = copyinstr((void *)(uintptr_t)argp, args->endp, + args->stringspace, &length); + if (error != 0) { if (error == ENAMETOOLONG) error = E2BIG; goto err_exit; @@ -1149,13 +1153,17 @@ exec_copyin_args(struct image_args *args, char *fname, * extract environment strings */ if (envv) { - while ((envp = (caddr_t)(intptr_t)fuword(envv++))) { - if (envp == (caddr_t)-1) { + for (;;) { + error = fueword(envv++, 
&envp); + if (error == -1) { error = EFAULT; goto err_exit; } - if ((error = copyinstr(envp, args->endp, - args->stringspace, &length))) { + if (envp == 0) + break; + error = copyinstr((void *)(uintptr_t)envp, + args->endp, args->stringspace, &length); + if (error != 0) { if (error == ENAMETOOLONG) error = E2BIG; goto err_exit; diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c index 7cfef38..9f58e67 100644 --- a/sys/kern/kern_umtx.c +++ b/sys/kern/kern_umtx.c @@ -847,6 +847,7 @@ do_wait(struct thread *td, void *addr, u_long id, struct abs_timeout timo; struct umtx_q *uq; u_long tmp; + uint32_t tmp32; int error = 0; uq = td->td_umtxq; @@ -860,18 +861,29 @@ do_wait(struct thread *td, void *addr, u_long id, umtxq_lock(&uq->uq_key); umtxq_insert(uq); umtxq_unlock(&uq->uq_key); - if (compat32 == 0) - tmp = fuword(addr); - else - tmp = (unsigned int)fuword32(addr); + if (compat32 == 0) { + error = fueword(addr, &tmp); + if (error != 0) + error = EFAULT; + } else { + error = fueword32(addr, &tmp32); + if (error == 0) + tmp = tmp32; + else + error = EFAULT; + } umtxq_lock(&uq->uq_key); - if (tmp == id) - error = umtxq_sleep(uq, "uwait", timeout == NULL ? - NULL : &timo); - if ((uq->uq_flags & UQF_UMTXQ) == 0) - error = 0; - else + if (error == 0) { + if (tmp == id) + error = umtxq_sleep(uq, "uwait", timeout == NULL ? + NULL : &timo); + if ((uq->uq_flags & UQF_UMTXQ) == 0) + error = 0; + else + umtxq_remove(uq); + } else if ((uq->uq_flags & UQF_UMTXQ) != 0) { umtxq_remove(uq); + } umtxq_unlock(&uq->uq_key); umtx_key_release(&uq->uq_key); if (error == ERESTART) @@ -908,11 +920,11 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, struct abs_timeout timo; struct umtx_q *uq; uint32_t owner, old, id; - int error = 0; + int error, rv; id = td->td_tid; uq = td->td_umtxq; - + error = 0; if (timeout != NULL) abs_timeout_init2(&timo, timeout); @@ -921,7 +933,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, * can fault on any access. 
*/ for (;;) { - owner = fuword32(__DEVOLATILE(void *, &m->m_owner)); + rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner); + if (rv == -1) + return (EFAULT); if (mode == _UMUTEX_WAIT) { if (owner == UMUTEX_UNOWNED || owner == UMUTEX_CONTESTED) return (0); @@ -929,31 +943,31 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, /* * Try the uncontested case. This should be done in userland. */ - owner = casuword32(&m->m_owner, UMUTEX_UNOWNED, id); + rv = casueword32(&m->m_owner, UMUTEX_UNOWNED, + &owner, id); + /* The address was invalid. */ + if (rv == -1) + return (EFAULT); /* The acquire succeeded. */ if (owner == UMUTEX_UNOWNED) return (0); - /* The address was invalid. */ - if (owner == -1) - return (EFAULT); - /* If no one owns it but it is contested try to acquire it. */ if (owner == UMUTEX_CONTESTED) { - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, + id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) + return (EFAULT); if (owner == UMUTEX_CONTESTED) return (0); - /* The address was invalid. */ - if (owner == -1) - return (EFAULT); - - error = umtxq_check_susp(td); - if (error != 0) - return (error); + rv = umtxq_check_susp(td); + if (rv != 0) + return (rv); /* If this failed the lock has changed, restart. */ continue; @@ -985,10 +999,11 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, * either some one else has acquired the lock or it has been * released. */ - old = casuword32(&m->m_owner, owner, owner | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); /* The address was invalid. */ - if (old == -1) { + if (rv == -1) { umtxq_lock(&uq->uq_key); umtxq_remove(uq); umtxq_unbusy(&uq->uq_key); @@ -1033,16 +1048,16 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx.
*/ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) return (EPERM); if ((owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, UMUTEX_UNOWNED); - if (old == -1) + error = casueword32(&m->m_owner, owner, &old, UMUTEX_UNOWNED); + if (error == -1) return (EFAULT); if (old == owner) return (0); @@ -1064,14 +1079,14 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags) * there is zero or one thread only waiting for it. * Otherwise, it must be marked as contested. */ - old = casuword32(&m->m_owner, owner, + error = casueword32(&m->m_owner, owner, &old, count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); umtxq_lock(&key); umtxq_signal(&key,1); umtxq_unbusy(&key); umtxq_unlock(&key); umtx_key_release(&key); - if (old == -1) + if (error == -1) return (EFAULT); if (old != owner) return (EINVAL); @@ -1091,14 +1106,16 @@ do_wake_umutex(struct thread *td, struct umutex *m) int error; int count; - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != 0) return (0); - flags = fuword32(&m->m_flags); + error = fueword32(&m->m_flags, &flags); + if (error == -1) + return (EFAULT); /* We should only ever be in here for contested locks */ if ((error = umtx_key_get(m, TYPE_NORMAL_UMUTEX, GET_SHARE(flags), @@ -1110,16 +1127,20 @@ do_wake_umutex(struct thread *td, struct umutex *m) count = umtxq_count(&key); umtxq_unlock(&key); - if (count <= 1) - owner = casuword32(&m->m_owner, UMUTEX_CONTESTED, UMUTEX_UNOWNED); + if (count <= 1) { + error = casueword32(&m->m_owner, UMUTEX_CONTESTED, &owner, + UMUTEX_UNOWNED); + if (error == -1) + error = EFAULT; + } umtxq_lock(&key); - if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) + if 
(error == 0 && count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) umtxq_signal(&key, 1); umtxq_unbusy(&key); umtxq_unlock(&key); umtx_key_release(&key); - return (0); + return (error); } /* @@ -1162,41 +1183,49 @@ do_wake2_umutex(struct thread *td, struct umutex *m, uint32_t flags) * any memory. */ if (count > 1) { - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - while ((owner & UMUTEX_CONTESTED) ==0) { - old = casuword32(&m->m_owner, owner, - owner|UMUTEX_CONTESTED); + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), + &owner); + if (error == -1) + error = EFAULT; + while (error == 0 && (owner & UMUTEX_CONTESTED) == 0) { + error = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); + if (error == -1) { + error = EFAULT; + break; + } if (old == owner) break; owner = old; - if (old == -1) - break; error = umtxq_check_susp(td); if (error != 0) break; } } else if (count == 1) { - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - while ((owner & ~UMUTEX_CONTESTED) != 0 && + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), + &owner); + if (error == -1) + error = EFAULT; + while (error == 0 && (owner & ~UMUTEX_CONTESTED) != 0 && (owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, - owner|UMUTEX_CONTESTED); + error = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); + if (error == -1) { + error = EFAULT; + break; + } if (old == owner) break; owner = old; - if (old == -1) - break; error = umtxq_check_susp(td); if (error != 0) break; } } umtxq_lock(&key); - if (owner == -1) { - error = EFAULT; + if (error == EFAULT) { umtxq_signal(&key, INT_MAX); - } - else if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) + } else if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) umtxq_signal(&key, 1); umtxq_unbusy(&key); umtxq_unlock(&key); @@ -1576,7 +1605,7 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, struct umtx_q *uq; struct umtx_pi *pi, *new_pi; uint32_t id, owner, old; - 
int error; + int error, rv; id = td->td_tid; uq = td->td_umtxq; @@ -1619,7 +1648,12 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, /* * Try the uncontested case. This should be done in userland. */ - owner = casuword32(&m->m_owner, UMUTEX_UNOWNED, id); + rv = casueword32(&m->m_owner, UMUTEX_UNOWNED, &owner, id); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; + break; + } /* The acquire succeeded. */ if (owner == UMUTEX_UNOWNED) { @@ -1627,16 +1661,15 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; - break; - } - /* If no one owns it but it is contested try to acquire it. */ if (owner == UMUTEX_CONTESTED) { - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; + break; + } if (owner == UMUTEX_CONTESTED) { umtxq_lock(&uq->uq_key); @@ -1647,12 +1680,6 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; - break; - } - error = umtxq_check_susp(td); if (error != 0) break; @@ -1683,10 +1710,11 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, * either some one else has acquired the lock or it has been * released. */ - old = casuword32(&m->m_owner, owner, owner | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); /* The address was invalid. */ - if (old == -1) { + if (rv == -1) { umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); umtxq_unlock(&uq->uq_key); @@ -1741,8 +1769,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx. 
*/ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) @@ -1750,8 +1778,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) /* This should be done in userland */ if ((owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, UMUTEX_UNOWNED); - if (old == -1) + error = casueword32(&m->m_owner, owner, &old, UMUTEX_UNOWNED); + if (error == -1) return (EFAULT); if (old == owner) return (0); @@ -1809,14 +1837,14 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) * there is zero or one thread only waiting for it. * Otherwise, it must be marked as contested. */ - old = casuword32(&m->m_owner, owner, - count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); + error = casueword32(&m->m_owner, owner, &old, + count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); umtxq_lock(&key); umtxq_unbusy(&key); umtxq_unlock(&key); umtx_key_release(&key); - if (old == -1) + if (error == -1) return (EFAULT); if (old != owner) return (EINVAL); @@ -1835,7 +1863,7 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, struct umtx_pi *pi; uint32_t ceiling; uint32_t owner, id; - int error, pri, old_inherited_pri, su; + int error, pri, old_inherited_pri, su, rv; id = td->td_tid; uq = td->td_umtxq; @@ -1853,7 +1881,12 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, umtxq_busy(&uq->uq_key); umtxq_unlock(&uq->uq_key); - ceiling = RTP_PRIO_MAX - fuword32(&m->m_ceilings[0]); + rv = fueword32(&m->m_ceilings[0], &ceiling); + if (rv == -1) { + error = EFAULT; + goto out; + } + ceiling = RTP_PRIO_MAX - ceiling; if (ceiling > RTP_PRIO_MAX) { error = EINVAL; goto out; @@ -1874,17 +1907,16 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, } mtx_unlock_spin(&umtx_lock); - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); - - if 
(owner == UMUTEX_CONTESTED) { - error = 0; + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; + if (owner == UMUTEX_CONTESTED) { + error = 0; break; } @@ -1973,8 +2005,8 @@ do_unlock_pp(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx. */ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) @@ -2047,9 +2079,11 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling, uint32_t save_ceiling; uint32_t owner, id; uint32_t flags; - int error; + int error, rv; - flags = fuword32(&m->m_flags); + error = fueword32(&m->m_flags, &flags); + if (error == -1) + return (EFAULT); if ((flags & UMUTEX_PRIO_PROTECT) == 0) return (EINVAL); if (ceiling > RTP_PRIO_MAX) @@ -2064,10 +2098,18 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling, umtxq_busy(&uq->uq_key); umtxq_unlock(&uq->uq_key); - save_ceiling = fuword32(&m->m_ceilings[0]); + rv = fueword32(&m->m_ceilings[0], &save_ceiling); + if (rv == -1) { + error = EFAULT; + break; + } - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + if (rv == -1) { + error = EFAULT; + break; + } if (owner == UMUTEX_CONTESTED) { suword32(&m->m_ceilings[0], ceiling); @@ -2077,12 +2119,6 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling, break; } - /* The address was invalid. 
*/ - if (owner == -1) { - error = EFAULT; - break; - } - if ((owner & ~UMUTEX_CONTESTED) == id) { suword32(&m->m_ceilings[0], ceiling); error = 0; @@ -2129,8 +2165,8 @@ do_lock_umutex(struct thread *td, struct umutex *m, uint32_t flags; int error; - flags = fuword32(&m->m_flags); - if (flags == -1) + error = fueword32(&m->m_flags, &flags); + if (error == -1) return (EFAULT); switch(flags & (UMUTEX_PRIO_INHERIT | UMUTEX_PRIO_PROTECT)) { @@ -2164,9 +2200,10 @@ static int do_unlock_umutex(struct thread *td, struct umutex *m) { uint32_t flags; + int error; - flags = fuword32(&m->m_flags); - if (flags == -1) + error = fueword32(&m->m_flags, &flags); + if (error == -1) return (EFAULT); switch(flags & (UMUTEX_PRIO_INHERIT | UMUTEX_PRIO_PROTECT)) { @@ -2187,21 +2224,27 @@ do_cv_wait(struct thread *td, struct ucond *cv, struct umutex *m, { struct abs_timeout timo; struct umtx_q *uq; - uint32_t flags; - uint32_t clockid; + uint32_t flags, clockid, hasw; int error; uq = td->td_umtxq; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); if ((wflags & CVWAIT_CLOCKID) != 0) { - clockid = fuword32(&cv->c_clockid); + error = fueword32(&cv->c_clockid, &clockid); + if (error == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } if (clockid < CLOCK_REALTIME || clockid >= CLOCK_THREAD_CPUTIME_ID) { /* hmm, only HW clock id will work. */ + umtx_key_release(&uq->uq_key); return (EINVAL); } } else { @@ -2217,7 +2260,9 @@ do_cv_wait(struct thread *td, struct ucond *cv, struct umutex *m, * Set c_has_waiters to 1 before releasing user mutex, also * don't modify cache line when unnecessary. 
*/ - if (fuword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters)) == 0) + error = fueword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), + &hasw); + if (error == 0 && hasw == 0) suword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), 1); umtxq_lock(&uq->uq_key); @@ -2276,7 +2321,9 @@ do_cv_signal(struct thread *td, struct ucond *cv) int error, cnt, nwake; uint32_t flags; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &key)) != 0) return (error); umtxq_lock(&key); @@ -2287,6 +2334,8 @@ do_cv_signal(struct thread *td, struct ucond *cv) umtxq_unlock(&key); error = suword32( __DEVOLATILE(uint32_t *, &cv->c_has_waiters), 0); + if (error == -1) + error = EFAULT; umtxq_lock(&key); } umtxq_unbusy(&key); @@ -2302,7 +2351,9 @@ do_cv_broadcast(struct thread *td, struct ucond *cv) int error; uint32_t flags; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &key)) != 0) return (error); @@ -2312,6 +2363,8 @@ do_cv_broadcast(struct thread *td, struct ucond *cv) umtxq_unlock(&key); error = suword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), 0); + if (error == -1) + error = EFAULT; umtxq_lock(&key); umtxq_unbusy(&key); @@ -2329,10 +2382,12 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx uint32_t flags, wrflags; int32_t state, oldstate; int32_t blocked_readers; - int error; + int error, rv; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2345,15 +2400,22 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx wrflags |= URWLOCK_WRITE_WAITERS; for (;;) { - state = 
fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } + /* try to lock it */ while (!(state & wrflags)) { if (__predict_false(URWLOCK_READER_COUNT(state) == URWLOCK_MAX_READERS)) { umtx_key_release(&uq->uq_key); return (EAGAIN); } - oldstate = casuword32(&rwlock->rw_state, state, state + 1); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state + 1); + if (rv == -1) { umtx_key_release(&uq->uq_key); return (EFAULT); } @@ -2379,12 +2441,17 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx * re-read the state, in case it changed between the try-lock above * and the check below */ - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) + error = EFAULT; /* set read contention bit */ - while ((state & wrflags) && !(state & URWLOCK_READ_WAITERS)) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_READ_WAITERS); - if (oldstate == -1) { + while (error == 0 && (state & wrflags) && + !(state & URWLOCK_READ_WAITERS)) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_READ_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2415,7 +2482,12 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx sleep: /* contention bit is set, before sleeping, increase read waiter count */ - blocked_readers = fuword32(&rwlock->rw_blocked_readers); + rv = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (rv == -1) { + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_readers, blocked_readers+1); while (state & wrflags) { @@ -2431,18 +2503,31 @@ sleep: umtxq_unlock(&uq->uq_key); if (error) break; - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = 
fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + error = EFAULT; + break; + } } /* decrease read waiter count, and may clear read contention bit */ - blocked_readers = fuword32(&rwlock->rw_blocked_readers); + error = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (error == -1) { + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_readers, blocked_readers-1); if (blocked_readers == 1) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); - for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_READ_WAITERS); - if (oldstate == -1) { + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) + error = EFAULT; + while (error == 0) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_READ_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2450,8 +2535,6 @@ sleep: break; state = oldstate; error = umtxq_check_susp(td); - if (error != 0) - break; } } @@ -2476,10 +2559,12 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo int32_t state, oldstate; int32_t blocked_writers; int32_t blocked_readers; - int error; + int error, rv; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2489,10 +2574,16 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo blocked_readers = 0; for (;;) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } while (!(state & URWLOCK_WRITE_OWNER) && URWLOCK_READER_COUNT(state) == 0) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_WRITE_OWNER); - if (oldstate == -1) 
{ + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_WRITE_OWNER); + if (rv == -1) { umtx_key_release(&uq->uq_key); return (EFAULT); } @@ -2528,12 +2619,17 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo * re-read the state, in case it changed between the try-lock above * and the check below */ - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) + error = EFAULT; - while (((state & URWLOCK_WRITE_OWNER) || URWLOCK_READER_COUNT(state) != 0) && - (state & URWLOCK_WRITE_WAITERS) == 0) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_WRITE_WAITERS); - if (oldstate == -1) { + while (error == 0 && ((state & URWLOCK_WRITE_OWNER) || + URWLOCK_READER_COUNT(state) != 0) && + (state & URWLOCK_WRITE_WAITERS) == 0) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_WRITE_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2561,7 +2657,12 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo continue; } sleep: - blocked_writers = fuword32(&rwlock->rw_blocked_writers); + rv = fueword32(&rwlock->rw_blocked_writers, + &blocked_writers); + if (rv == -1) { + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_writers, blocked_writers+1); while ((state & URWLOCK_WRITE_OWNER) || URWLOCK_READER_COUNT(state) != 0) { @@ -2577,17 +2678,34 @@ sleep: umtxq_unlock(&uq->uq_key); if (error) break; - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + error = EFAULT; + break; + } } - blocked_writers = fuword32(&rwlock->rw_blocked_writers); + if (error != 0) + break; + rv = fueword32(&rwlock->rw_blocked_writers, + &blocked_writers); + if (rv == -1) { + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_writers, blocked_writers-1); if (blocked_writers 
== 1) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + error = EFAULT; + break; + } for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_WRITE_WAITERS); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_WRITE_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2603,7 +2721,12 @@ sleep: if (error != 0) break; } - blocked_readers = fuword32(&rwlock->rw_blocked_readers); + rv = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (rv == -1) { + error = EFAULT; + break; + } } else blocked_readers = 0; @@ -2624,20 +2747,24 @@ do_rw_unlock(struct thread *td, struct urwlock *rwlock) struct umtx_q *uq; uint32_t flags; int32_t state, oldstate; - int error, q, count; + int error, rv, q, count; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + error = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), &state); + if (error == -1) + return (EFAULT); if (state & URWLOCK_WRITE_OWNER) { for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_WRITE_OWNER); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_WRITE_OWNER); + if (rv == -1) { error = EFAULT; goto out; } @@ -2655,9 +2782,9 @@ do_rw_unlock(struct thread *td, struct urwlock *rwlock) } } else if (URWLOCK_READER_COUNT(state) != 0) { for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state - 1); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state - 1); + if (rv == -1) { error = EFAULT; goto out; } @@ -2719,7 +2846,9 @@ do_sem_wait(struct thread 
*td, struct _usem *sem, struct _umtx_time *timeout) int error; uq = td->td_umtxq; - flags = fuword32(&sem->_flags); + error = fueword32(&sem->_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2732,14 +2861,14 @@ do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout) umtxq_insert(uq); umtxq_unlock(&uq->uq_key); casuword32(__DEVOLATILE(uint32_t *, &sem->_has_waiters), 0, 1); - count = fuword32(__DEVOLATILE(uint32_t *, &sem->_count)); - if (count != 0) { + error = fueword32(__DEVOLATILE(uint32_t *, &sem->_count), &count); + if (error == -1 || count != 0) { umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); umtxq_remove(uq); umtxq_unlock(&uq->uq_key); umtx_key_release(&uq->uq_key); - return (0); + return (error == -1 ? EFAULT : 0); } umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); @@ -2770,7 +2899,9 @@ do_sem_wake(struct thread *td, struct _usem *sem) int error, cnt; uint32_t flags; - flags = fuword32(&sem->_flags); + error = fueword32(&sem->_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &key)) != 0) return (error); umtxq_lock(&key); @@ -2788,6 +2919,8 @@ do_sem_wake(struct thread *td, struct _usem *sem) error = suword32( __DEVOLATILE(uint32_t *, &sem->_has_waiters), 0); umtxq_lock(&key); + if (error == -1) + error = EFAULT; } } umtxq_unbusy(&key); diff --git a/sys/kern/subr_uio.c b/sys/kern/subr_uio.c index f2e6e32..f142930 100644 --- a/sys/kern/subr_uio.c +++ b/sys/kern/subr_uio.c @@ -7,6 +7,11 @@ * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * + * Copyright (c) 2014 The FreeBSD Foundation + * + * Portions of this software were developed by Konstantin Belousov + * under sponsorship from the FreeBSD Foundation. 
+ * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: @@ -438,3 +443,170 @@ copyout_unmap(struct thread *td, vm_offset_t addr, size_t sz) return (0); } + +#ifdef NO_FUEWORD +/* + * XXXKIB The temporal implementation of fue*() functions which do not + * handle usermode -1 properly, mixing it with the fault code. Keep + * this until MD code is written. Currently sparc64, mips and arm do + * not have proper implementation. + */ + +int +fuebyte(const void *base, int *val) +{ + int res; + + res = fubyte(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} + +int +fueword(const void *base, long *val) +{ + long res; + + res = fuword(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} + +int +fueword16(const void *base, int *val) +{ + int res; + + res = fuword16(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} + +int +fueword32(const void *base, int32_t *val) +{ + int32_t res; + + res = fuword32(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} + +#ifdef _LP64 +int +fueword64(const void *base, int64_t *val) +{ + int32_t res; + + res = fuword64(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} +#endif + +int +casueword32(volatile uint32_t *base, uint32_t oldval, uint32_t *oldvalp, + uint32_t newval) +{ + int32_t ov; + + ov = casuword32(base, oldval, newval); + if (ov == -1) + return (-1); + *oldvalp = ov; + return (0); +} + +int +casueword(volatile u_long *p, u_long oldval, u_long *oldvalp, u_long newval) +{ + u_long ov; + + ov = casuword(p, oldval, newval); + if (ov == -1) + return (-1); + *oldvalp = ov; + return (0); +} +#else /* NO_FUEWORD */ +int +fubyte(const void *addr) +{ + int val, rv; + + rv = fuebyte(addr, &val); + return (rv == -1 ? -1 : val); +} + +int +fuword16(const void *addr) +{ + int val, rv; + + rv = fueword16(addr, &val); + return (rv == -1 ? 
-1 : val); +} + +int32_t +fuword32(const void *addr) +{ + int rv; + int32_t val; + + rv = fueword32(addr, &val); + return (rv == -1 ? -1 : val); +} + +#ifdef _LP64 +int64_t +fuword64(const void *addr) +{ + int rv; + int64_t val; + + rv = fueword64(addr, &val); + return (rv == -1 ? -1 : val); +} +#endif /* _LP64 */ + +long +fuword(const void *addr) +{ + long val; + int rv; + + rv = fueword(addr, &val); + return (rv == -1 ? -1 : val); +} + +uint32_t +casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) +{ + int rv; + uint32_t val; + + rv = casueword32(addr, old, &val, new); + return (rv == -1 ? -1 : val); +} + +u_long +casuword(volatile u_long *addr, u_long old, u_long new) +{ + int rv; + u_long val; + + rv = casueword(addr, old, &val, new); + return (rv == -1 ? -1 : val); +} + +#endif /* NO_FUEWORD */ diff --git a/sys/kern/vfs_acl.c b/sys/kern/vfs_acl.c index 93626fb..e9361e5 100644 --- a/sys/kern/vfs_acl.c +++ b/sys/kern/vfs_acl.c @@ -148,6 +148,7 @@ acl_copyin(void *user_acl, struct acl *kernel_acl, acl_type_t type) static int acl_copyout(struct acl *kernel_acl, void *user_acl, acl_type_t type) { + uint32_t am; int error; struct oldacl old; @@ -162,8 +163,11 @@ acl_copyout(struct acl *kernel_acl, void *user_acl, acl_type_t type) break; default: - if (fuword32((char *)user_acl + - offsetof(struct acl, acl_maxcnt)) != ACL_MAX_ENTRIES) + error = fueword32((char *)user_acl + + offsetof(struct acl, acl_maxcnt), &am); + if (error == -1) + return (EFAULT); + if (am != ACL_MAX_ENTRIES) return (EINVAL); error = copyout(kernel_acl, user_acl, sizeof(*kernel_acl)); diff --git a/sys/mips/include/param.h b/sys/mips/include/param.h index 2d1d7f1..90f3e6f 100644 --- a/sys/mips/include/param.h +++ b/sys/mips/include/param.h @@ -178,4 +178,8 @@ #define pgtok(x) ((x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 +#endif + #endif /* !_MIPS_INCLUDE_PARAM_H_ */ diff --git a/sys/net/if_spppsubr.c b/sys/net/if_spppsubr.c index 9dc55c5..c0f8e39 100644 --- 
a/sys/net/if_spppsubr.c +++ b/sys/net/if_spppsubr.c @@ -5060,7 +5060,8 @@ sppp_params(struct sppp *sp, u_long cmd, void *data) * Check the cmd word first before attempting to fetch all the * data. */ - if ((subcmd = fuword(ifr->ifr_data)) == -1) { + rv = fueword(ifr->ifr_data, &subcmd); + if (rv == -1) { rv = EFAULT; goto quit; } diff --git a/sys/powerpc/powerpc/copyinout.c b/sys/powerpc/powerpc/copyinout.c index dcfab80..f593228 100644 --- a/sys/powerpc/powerpc/copyinout.c +++ b/sys/powerpc/powerpc/copyinout.c @@ -378,13 +378,12 @@ suword32(void *addr, int32_t word) #endif int -fubyte(const void *addr) +fuebyte(const void *addr, int *val) { struct thread *td; pmap_t pm; faultbuf env; u_char *p; - int val; td = curthread; pm = &td->td_proc->p_vmspace->vm_pmap; @@ -399,20 +398,19 @@ fubyte(const void *addr) return (-1); } - val = *p; + *val = *p; td->td_pcb->pcb_onfault = NULL; - return (val); + return (0); } -#ifdef __powerpc64__ -int32_t -fuword32(const void *addr) +int +fueword16(const void *addr, int *val) { struct thread *td; pmap_t pm; faultbuf env; - int32_t *p, val; + int16_t *p; td = curthread; pm = &td->td_proc->p_vmspace->vm_pmap; @@ -427,20 +425,19 @@ fuword32(const void *addr) return (-1); } - val = *p; + *val = *p; td->td_pcb->pcb_onfault = NULL; - return (val); + return (0); } -#endif -long -fuword(const void *addr) +int +fueword32(const void *addr, int32_t *val) { struct thread *td; pmap_t pm; faultbuf env; - long *p, val; + int32_t *p; td = curthread; pm = &td->td_proc->p_vmspace->vm_pmap; @@ -455,22 +452,71 @@ fuword(const void *addr) return (-1); } - val = *p; + *val = *p; td->td_pcb->pcb_onfault = NULL; - return (val); + return (0); } -#ifndef __powerpc64__ -int32_t -fuword32(const void *addr) +#ifdef __powerpc64__ +int +fueword64(const void *addr, int64_t *val) { - return ((int32_t)fuword(addr)); + struct thread *td; + pmap_t pm; + faultbuf env; + int64_t *p; + + td = curthread; + pm = &td->td_proc->p_vmspace->vm_pmap; + + if (setfault(env)) { + 
td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + if (map_user_ptr(pm, addr, (void **)&p, sizeof(*p), NULL)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + *val = *p; + + td->td_pcb->pcb_onfault = NULL; + return (0); } #endif -uint32_t -casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) +int +fueword(const void *addr, long *val) +{ + struct thread *td; + pmap_t pm; + faultbuf env; + long *p; + + td = curthread; + pm = &td->td_proc->p_vmspace->vm_pmap; + + if (setfault(env)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + if (map_user_ptr(pm, addr, (void **)&p, sizeof(*p), NULL)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + *val = *p; + + td->td_pcb->pcb_onfault = NULL; + return (0); +} + +int +casueword32(volatile uint32_t *addr, uint32_t old, uint32_t *oldvalp, + uint32_t new) { struct thread *td; pmap_t pm; @@ -507,18 +553,21 @@ casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) td->td_pcb->pcb_onfault = NULL; - return (val); + *oldvalp = val; + return (0); } #ifndef __powerpc64__ -u_long -casuword(volatile u_long *addr, u_long old, u_long new) +int +casueword(volatile u_long *addr, u_long old, u_long *oldvalp, u_long new) { - return (casuword32((volatile uint32_t *)addr, old, new)); + + return (casueword32((volatile uint32_t *)addr, old, + (uint32_t *)oldvalp, new)); } #else -u_long -casuword(volatile u_long *addr, u_long old, u_long new) +int +casueword(volatile u_long *addr, u_long old, u_long *oldvalp, u_long new) { struct thread *td; pmap_t pm; @@ -555,7 +604,7 @@ casuword(volatile u_long *addr, u_long old, u_long new) td->td_pcb->pcb_onfault = NULL; - return (val); + *oldvalp = val; + return (0); } #endif - diff --git a/sys/sparc64/include/param.h b/sys/sparc64/include/param.h index e59f2c4..46bacae 100644 --- a/sys/sparc64/include/param.h +++ b/sys/sparc64/include/param.h @@ -146,4 +146,8 @@ #define pgtok(x) ((unsigned long)(x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 
+#endif + #endif /* !_SPARC64_INCLUDE_PARAM_H_ */ diff --git a/sys/sys/systm.h b/sys/sys/systm.h index f4eae57..72fa245 100644 --- a/sys/sys/systm.h +++ b/sys/sys/systm.h @@ -254,16 +254,25 @@ int copyout_nofault(const void * __restrict kaddr, void * __restrict udaddr, int fubyte(const void *base); long fuword(const void *base); -int fuword16(void *base); +int fuword16(const void *base); int32_t fuword32(const void *base); int64_t fuword64(const void *base); +int fuebyte(const void *base, int *val); +int fueword(const void *base, long *val); +int fueword16(const void *base, int *val); +int fueword32(const void *base, int32_t *val); +int fueword64(const void *base, int64_t *val); int subyte(void *base, int byte); int suword(void *base, long word); int suword16(void *base, int word); int suword32(void *base, int32_t word); int suword64(void *base, int64_t word); uint32_t casuword32(volatile uint32_t *base, uint32_t oldval, uint32_t newval); -u_long casuword(volatile u_long *p, u_long oldval, u_long newval); +u_long casuword(volatile u_long *p, u_long oldval, u_long newval); +int casueword32(volatile uint32_t *base, uint32_t oldval, uint32_t *oldvalp, + uint32_t newval); +int casueword(volatile u_long *p, u_long oldval, u_long *oldvalp, + u_long newval); void realitexpire(void *); From owner-freebsd-arch@FreeBSD.ORG Tue Oct 21 11:50:16 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EEC5AAE7 for ; Tue, 21 Oct 2014 11:50:16 +0000 (UTC) Received: from smtp.vangyzen.net (hotblack.vangyzen.net [IPv6:2607:fc50:1000:7400:216:3eff:fe72:314f]) by mx1.freebsd.org (Postfix) with ESMTP id D2DBAF85 for ; Tue, 21 Oct 2014 11:50:16 +0000 (UTC) Received: from marvin.lab.vangyzen.net (c-24-125-214-90.hsd1.va.comcast.net [24.125.214.90]) by smtp.vangyzen.net (Postfix) 
	with ESMTPSA id 005F756467; Tue, 21 Oct 2014 06:50:15 -0500 (CDT)
Message-ID: <54464876.80905@vangyzen.net>
Date: Tue, 21 Oct 2014 07:50:14 -0400
From: Eric van Gyzen
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: Konstantin Belousov
Subject: Re: RfC: fueword(9) and casueword(9)
References: <20141021094539.GA1877@kib.kiev.ua>
In-Reply-To: <20141021094539.GA1877@kib.kiev.ua>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 21 Oct 2014 11:50:17 -0000

On 10/21/2014 05:45, Konstantin Belousov wrote:
> @@ -921,7 +933,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
>      * can fault on any access.
>      */
>     for (;;) {
> -       owner = fuword32(__DEVOLATILE(void *, &m->m_owner));
> +       rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner);
> +       if (error == -1)
> +           return (EFAULT);

rv is assigned, but error is tested.

I only reviewed kern_umtx.c, and very quickly. I'll try to review more
thoroughly later today.
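
The mismatch noted above can be reproduced outside the kernel. The following is only a userland sketch, not code from the patch: mock_fueword32(), read_owner_buggy() and read_owner_fixed() are hypothetical names, a NULL pointer stands in for a faulting user address, and error is initialized to 0 so that the sketch is well defined (in the quoted hunk its value comes from earlier code).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userland stand-in for fueword32(9).
 * Assumption: a NULL base models an address that faults.
 * Returns 0 and stores the word in *val, or -1 on "fault".
 */
static int
mock_fueword32(const void *base, int32_t *val)
{
	if (base == NULL)
		return (-1);
	*val = *(const int32_t *)base;
	return (0);
}

/* Same shape as the quoted hunk: rv gets the result, error is tested. */
static int
read_owner_buggy(const void *uaddr, int32_t *owner)
{
	int error, rv;

	error = 0;		/* assumed initial value, see lead-in */
	rv = mock_fueword32(uaddr, owner);
	(void)rv;		/* the result is never looked at */
	if (error == -1)	/* cannot be true here */
		return (EFAULT);
	return (0);
}

/* Test the variable that actually holds the return value. */
static int
read_owner_fixed(const void *uaddr, int32_t *owner)
{
	int rv;

	rv = mock_fueword32(uaddr, owner);
	if (rv == -1)
		return (EFAULT);
	return (0);
}
```

With these definitions the buggy variant reports success for the simulated fault, while the fixed variant returns EFAULT and still accepts a stored value of -1 as valid data.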
Eric From owner-freebsd-arch@FreeBSD.ORG Tue Oct 21 14:41:21 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C030F550 for ; Tue, 21 Oct 2014 14:41:21 +0000 (UTC) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 81BE47FE for ; Tue, 21 Oct 2014 14:41:20 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 974201046D55; Wed, 22 Oct 2014 01:41:13 +1100 (AEDT) Date: Wed, 22 Oct 2014 01:41:12 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov Subject: Re: RfC: fueword(9) and casueword(9) In-Reply-To: <20141021094539.GA1877@kib.kiev.ua> Message-ID: <20141022002825.H2080@besplex.bde.org> References: <20141021094539.GA1877@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=BdjhjNd2 c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=MLaLIZIVHZiPqwcd0r8A:9 a=CjuIK1q_8ugA:10 Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Oct 2014 14:41:21 -0000 On Tue, 21 Oct 2014, Konstantin Belousov wrote: > FreeBSD provides the fuword(9) family of functions to fetch a word from > the userspace. Functions return the value read, or -1 on failure (i.e. > when faulted on access). This KPI has flaw, which makes it impossible > to distinguish -1 read from usermode vs. the fault. 
As John Baldwin > pointed out, fuword(9) cannot be replaced by copyin(9), since fuword(9) > is atomic for aligned data, while copyin(9) is typically implemented as > byte copy. copyin() is usually implemented as register-sized copy for aligned data, but may copy in any order, or more than once, or differently to reach alignment points. It is thus unsuitable for copying volatile data. It is also slow for small data. > I wanted to fix this wart for long time, below is the prototyped patch, > which adds fueword(9) family of functions. They take the address of > variable where to put the value read, and return 0 on success, -1 on > failure. In similar way, casueword(9) fixes casuword(9). This API is larger, slower, and harder to use. No better fix is evident, except for fubyte() and fuword16(). These never had a problem on any supported arch, since bytes are only 8 bits on all supported arches, and 16-bit ints are not supported on any arch, so -1 is always out of band. Not touching them is a better fix. You didn't change any of their callers, but pessimized their implementation to a wrapper around fue*(). (BTW, fuword16() and fuswintr() are misdocumented as taking a non-const void * arg.). Profiling counters use fuswintr(). You didn't touch that. In FreeBSD it is unimplemented for most or all arches. It returns -1 to indicated unimplemented. NetBSD implemented it long ago. It shouldn't be touched since it is 16 bits. However, it is a bug that profiling counters are only 16 bits. 16-bit counters overflow in 8 seconds at the default stathz of 8192. > The tricky part of the patch are the changes to kern_umtx.c, where the > logic of the loops in the lock acquire routines is delicate and care > must be taken to not obliterate possible errors from the suspension > check or signal test on loop retry. > ... > diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c > index 7cfef38..9f58e67 100644 > --- a/sys/kern/kern_umtx.c > +++ b/sys/kern/kern_umtx.c > ... 
> @@ -921,7 +933,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
>      * can fault on any access.
>      */
>     for (;;) {
> -       owner = fuword32(__DEVOLATILE(void *, &m->m_owner));
> +       rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner);
> +       if (error == -1)
> +           return (EFAULT);
>         if (mode == _UMUTEX_WAIT) {
>             if (owner == UMUTEX_UNOWNED || owner == UMUTEX_CONTESTED)
>                 return (0);

A new API should try to fix these __DEVOLATILE() abominations. I think it
is safe, and even correct, to declare the pointers as volatile const void *,
since the functions really can handle volatile data, unlike copyin().

Atomic op functions are declared as taking pointers to volatile for
similar reasons. Often they are applied to non-volatile data, but
adding a qualifier is type-safe and doesn't cost efficiency since the
pointer access is not known to the compiler. (The last point is not
so clear -- the compiler can see things in the functions since they are
inline asm. fueword() isn't inline so its (in)efficiency is not changed.)

The atomic read functions are not declared as taking pointers to const.
The __DECONST() abomination might be used to work around this bug.

umtx is indeed complicated.
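
The qualifier suggestion above can be illustrated with a small userland sketch. It is not the proposed KPI: struct mock_umutex, mock_fueword32v() and owner_of() are hypothetical names, and a NULL pointer is assumed to model a faulting address. The point shown is only that a parameter of type const volatile void * accepts the address of a volatile field without any __DEVOLATILE()-style cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a user-visible lock structure. */
struct mock_umutex {
	volatile uint32_t m_owner;
};

/*
 * Hypothetical fetch routine whose pointer parameter carries both
 * qualifiers. Assumption: NULL models a faulting address.
 * Returns 0 and stores the word in *val, or -1 on "fault".
 */
static int
mock_fueword32v(const volatile void *base, int32_t *val)
{
	if (base == NULL)
		return (-1);
	*val = *(const volatile int32_t *)base;
	return (0);
}

static int
owner_of(struct mock_umutex *m, int32_t *owner)
{
	/*
	 * &m->m_owner has type volatile uint32_t *. Adding const and
	 * converting to a void pointer is implicit, so no cast is needed.
	 */
	return (mock_fueword32v(&m->m_owner, owner));
}
```

Because the status and the value travel separately, a stored word of 0xffffffff comes back as the value -1 with a return of 0, and only the simulated fault returns -1.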
Bruce From owner-freebsd-arch@FreeBSD.ORG Tue Oct 21 16:23:23 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 57EF3D26 for ; Tue, 21 Oct 2014 16:23:23 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 97E5432D for ; Tue, 21 Oct 2014 16:23:22 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9LGN79s071856 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 21 Oct 2014 19:23:07 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9LGN79s071856 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9LGN6Hn071855; Tue, 21 Oct 2014 19:23:06 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 21 Oct 2014 19:23:06 +0300 From: Konstantin Belousov To: Bruce Evans Subject: Re: RfC: fueword(9) and casueword(9) Message-ID: <20141021162306.GE1877@kib.kiev.ua> References: <20141021094539.GA1877@kib.kiev.ua> <20141022002825.H2080@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141022002825.H2080@besplex.bde.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED, FREEMAIL_FROM, NML_ADSP_CUSTOM_MED, T_FILL_THIS_FORM_SHORT autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 
Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Oct 2014 16:23:23 -0000 On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote: > On Tue, 21 Oct 2014, Konstantin Belousov wrote: > This API is larger, slower, and harder to use. No better fix is evident, > except for fubyte() and fuword16(). These never had a problem on any > supported arch, since bytes are only 8 bits on all supported arches, > and 16-bit ints are not supported on any arch, so -1 is always out of > band. Not touching them is a better fix. You didn't change any of > their callers, but pessimized their implementation to a wrapper around > fue*(). (BTW, fuword16() and fuswintr() are misdocumented as taking a > non-const void * arg.). I reverted the addition of fuebyte(9) and fueword16(9). First I thought that it would be nicer to provide fully complement KPI with regard to fuX/fueX, but it seems that lack of fuebyte() and fueword16() will cause the right questions. > > @@ -921,7 +933,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, > > * can fault on any access. > > */ > > for (;;) { > > - owner = fuword32(__DEVOLATILE(void *, &m->m_owner)); > > + rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner); > > + if (error == -1) > > + return (EFAULT); > > if (mode == _UMUTEX_WAIT) { > > if (owner == UMUTEX_UNOWNED || owner == UMUTEX_CONTESTED) > > return (0); > > A new API should try to fix these __DEVOLATILE() abominations. I think it > is safe, and even correct, to declare the pointers as volatile const void *, > since the functions really can handle volatile data, unlike copyin(). > > Atomic op functions are declared as taking pointers to volatile for > similar reasons. Often they are applied to non-volatile data, but > adding a qualifier is type-safe and doesn't cost efficiency since the > pointer access is is not known to the compiler. 
(The last point is not > so clear -- the compiler can see things in the functions since they are > inline asm. fueword() isn't inline so its (in)efficiency is not changed.) > > The atomic read functions are not declared as taking pointers to const. > The __DECONST() abomination might be used to work around this bug. I prefer to not complicate the fetch(9) KPI due to the mistakes in the umtx structures definitions. I think that it is bug to mark the lock words with volatile. I want the fueword(9) interface to be as much similar to fuword(9), in particular, volatile seems to be not needed. Below is the updated patch, together with the bug fix noted by Eric. diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile index bc21dc6..0464aca 100644 --- a/share/man/man9/Makefile +++ b/share/man/man9/Makefile @@ -581,6 +581,9 @@ MLINKS+=condvar.9 cv_broadcast.9 \ MLINKS+=config_intrhook.9 config_intrhook_disestablish.9 \ config_intrhook.9 config_intrhook_establish.9 MLINKS+=contigmalloc.9 contigfree.9 +MLINKS+=casuword.9 casueword.9 \ + casuword.9 casueword32.9 \ + casuword.9 casuword32.9 MLINKS+=copy.9 copyin.9 \ copy.9 copyin_nofault.9 \ copy.9 copyinstr.9 \ @@ -688,7 +691,12 @@ MLINKS+=fetch.9 fubyte.9 \ fetch.9 fuword.9 \ fetch.9 fuword16.9 \ fetch.9 fuword32.9 \ - fetch.9 fuword64.9 + fetch.9 fuword64.9 \ + fetch.9 fuebyte.9 \ + fetch.9 fueword.9 \ + fetch.9 fueword16.9 \ + fetch.9 fueword32.9 \ + fetch.9 fueword64.9 MLINKS+=firmware.9 firmware_get.9 \ firmware.9 firmware_put.9 \ firmware.9 firmware_register.9 \ diff --git a/share/man/man9/casuword.9 b/share/man/man9/casuword.9 new file mode 100644 index 0000000..34a0f1d --- /dev/null +++ b/share/man/man9/casuword.9 @@ -0,0 +1,95 @@ +.\" Copyright (c) 2014 The FreeBSD Foundation +.\" All rights reserved. +.\" +.\" Part of this documentation was written by +.\" Konstantin Belousov under sponsorship +.\" from the FreeBSD Foundation. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" $FreeBSD$ +.\" +.Dd October 21, 2014 +.Dt CASU 9 +.Os +.Sh NAME +.Nm casueword , +.Nm casueword32 , +.Nm casuword , +.Nm casuword32 +.Nd fetch, compare and store data from user-space +.Sh SYNOPSIS +.In sys/types.h +.In sys/systm.h +.Ft int +.Fn casueword "volatile u_long *base" "u_long oldval" "u_long *oldvalp" "u_long newval" +.Ft int +.Fn casueword32 "volatile uint32_t *base" "uint32_t oldval" "uint32_t *oldvalp" "uint32_t newval" +.Ft u_long +.Fn casuword "volatile u_long *base" "u_long oldval" "u_long newval" +.Ft uint32_t +.Fn casuword32 "volatile uint32_t *base" "uint32_t oldval" "uint32_t newval" +.Sh DESCRIPTION +The +.Nm +functions are designed to perform atomic compare-and-swap operation on +the value in the usermode memory of the current process. +.Pp +The +.Nm +routines reads the value from user memory with address +.Pa base , +and compare the value read with +.Pa oldval . +If the values are equal, +.Pa newval +is written to the +.Pa *base . +In case of +.Fn casueword32 +and +.Fn casueword , +old value is stored into the (kernel-mode) variable pointed by +.Pa *oldvalp . +The userspace value must be naturally aligned. +.Pp +The callers of +.Fn casuword +and +.Fn casuword32 +functions cannot distinguish between -1 read from +userspace and function failure. +.Sh RETURN VALUES +The +.Fn casuword +and +.Fn casuword32 +functions return the data fetched or -1 on failure. +The +.Fn casueword +and +.Fn casueword32 +functions return 0 on success and -1 on failure. 
+.Sh SEE ALSO +.Xr atomic 9 , +.Xr fetch 9 , +.Xr store 9 diff --git a/share/man/man9/fetch.9 b/share/man/man9/fetch.9 index ccf6866..e8ff100 100644 --- a/share/man/man9/fetch.9 +++ b/share/man/man9/fetch.9 @@ -34,7 +34,7 @@ .\" .\" $FreeBSD$ .\" -.Dd October 5, 2009 +.Dd October 21, 2014 .Dt FETCH 9 .Os .Sh NAME @@ -44,11 +44,13 @@ .Nm fuword , .Nm fuword16 , .Nm fuword32 , -.Nm fuword64 +.Nm fuword64 , +.Nm fueword , +.Nm fueword32 , +.Nm fueword64 .Nd fetch data from user-space .Sh SYNOPSIS .In sys/types.h -.In sys/time.h .In sys/systm.h .Ft int .Fn fubyte "const void *base" @@ -60,18 +62,27 @@ .Fn fuword32 "const void *base" .Ft int64_t .Fn fuword64 "const void *base" +.Ft long +.Fn fueword "const void *base" "long *val" +.Ft int32_t +.Fn fueword32 "const void *base" "int32_t *val" +.Ft int64_t +.Fn fueword64 "const void *base" "int64_t *val" .In sys/resourcevar.h .Ft int .Fn fuswintr "void *base" .Sh DESCRIPTION The .Nm -functions are designed to copy small amounts of data from user-space. +functions are designed to copy small amounts of data from user-space +of the current process. +If read is successfull, it is performed atomically. +The data read must be naturally aligned. .Pp The .Nm routines provide the following functionality: -.Bl -tag -width "fuswintr()" +.Bl -tag -width "fueword32()" .It Fn fubyte Fetches a byte of data from the user-space address .Pa base . @@ -91,11 +102,46 @@ Fetches 64 bits of data from the user-space address Fetches a short word of data from the user-space address .Pa base . This function is safe to call during an interrupt context. +.It Fn fueword +Fetches a word of data from the user-space address +.Pa base +and stores the result in the variable pointed by +.Pa val . +.It Fn fueword32 +Fetches 32 bits of data from the user-space address +.Pa base +and stores the result in the variable pointed by +.Pa val . 
+.It Fn fueword64 +Fetches 64 bits of data from the user-space address +.Pa base +and stores the result in the variable pointed by +.Pa val . .El +.Pp +The callers of +.Fn fuword , +.Fn fuword32 +and +.Fn fuword64 +functions cannot distinguish between -1 read from +userspace and function failure. .Sh RETURN VALUES The -.Nm +.Fn fubyte , +.Fn fuword , +.Fn fuword16 , +.Fn fuword32 , +.Fn fuword64 , +and +.Fn fuswintr functions return the data fetched or -1 on failure. +The +.Fn fueword , +.Fn fueword32 +and +.Fn fueword64 +functions return 0 on success and -1 on failure. .Sh SEE ALSO .Xr copy 9 , .Xr store 9 diff --git a/sys/amd64/amd64/support.S b/sys/amd64/amd64/support.S index 4897367..f9a3165 100644 --- a/sys/amd64/amd64/support.S +++ b/sys/amd64/amd64/support.S @@ -312,12 +312,13 @@ copyin_fault: END(copyin) /* - * casuword32. Compare and set user integer. Returns -1 or the current value. - * dst = %rdi, old = %rsi, new = %rdx + * casueword32. Compare and set user integer. Returns -1 on fault, + * 0 if access was successfull. Old value is written to *oldp. + * dst = %rdi, old = %esi, oldp = %rdx, new = %ecx */ -ENTRY(casuword32) - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) +ENTRY(casueword32) + movq PCPU(CURPCB),%r8 + movq $fusufault,PCB_ONFAULT(%r8) movq $VM_MAXUSER_ADDRESS-4,%rax cmpq %rax,%rdi /* verify address is valid */ @@ -327,26 +328,34 @@ ENTRY(casuword32) #ifdef SMP lock #endif - cmpxchgl %edx,(%rdi) /* new = %edx */ + cmpxchgl %ecx,(%rdi) /* new = %ecx */ /* * The old value is in %eax. If the store succeeded it will be the * value we expected (old) from before the store, otherwise it will - * be the current value. + * be the current value. Save %eax into %esi to prepare the return + * value. */ + movl %eax,%esi + xorl %eax,%eax + movq %rax,PCB_ONFAULT(%r8) - movq PCPU(CURPCB),%rcx - movq $0,PCB_ONFAULT(%rcx) + /* + * Access the oldp after the pcb_onfault is cleared, to correctly + * catch corrupted pointer. 
+ */ + movl %esi,(%rdx) /* oldp = %rdx */ ret -END(casuword32) +END(casueword32) /* - * casuword. Compare and set user word. Returns -1 or the current value. - * dst = %rdi, old = %rsi, new = %rdx + * casueword. Compare and set user long. Returns -1 on fault, + * 0 if access was successful. Old value is written to *oldp. + * dst = %rdi, old = %rsi, oldp = %rdx, new = %rcx */ -ENTRY(casuword) - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) +ENTRY(casueword) + movq PCPU(CURPCB),%r8 + movq $fusufault,PCB_ONFAULT(%r8) movq $VM_MAXUSER_ADDRESS-4,%rax cmpq %rax,%rdi /* verify address is valid */ @@ -356,28 +365,28 @@ ENTRY(casuword) #ifdef SMP lock #endif - cmpxchgq %rdx,(%rdi) /* new = %rdx */ + cmpxchgq %rcx,(%rdi) /* new = %rcx */ /* - * The old value is in %eax. If the store succeeded it will be the + * The old value is in %rax. If the store succeeded it will be the * value we expected (old) from before the store, otherwise it will * be the current value. */ - - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) - movq $0,PCB_ONFAULT(%rcx) + movq %rax,%rsi + xorl %eax,%eax + movq %rax,PCB_ONFAULT(%r8) + movq %rsi,(%rdx) ret -END(casuword) +END(casueword) /* * Fetch (load) a 64-bit word, a 32-bit word, a 16-bit word, or an 8-bit - * byte from user memory. All these functions are MPSAFE. - * addr = %rdi + * byte from user memory.
+ * addr = %rdi, valp = %rsi */ -ALTENTRY(fuword64) -ENTRY(fuword) +ALTENTRY(fueword64) +ENTRY(fueword) movq PCPU(CURPCB),%rcx movq $fusufault,PCB_ONFAULT(%rcx) @@ -385,13 +394,15 @@ ENTRY(fuword) cmpq %rax,%rdi /* verify address is valid */ ja fusufault - movq (%rdi),%rax - movq $0,PCB_ONFAULT(%rcx) + xorl %eax,%eax + movq (%rdi),%r11 + movq %rax,PCB_ONFAULT(%rcx) + movq %r11,(%rsi) ret -END(fuword64) -END(fuword) +END(fueword64) +END(fueword) -ENTRY(fuword32) +ENTRY(fueword32) movq PCPU(CURPCB),%rcx movq $fusufault,PCB_ONFAULT(%rcx) @@ -399,10 +410,12 @@ ENTRY(fuword32) cmpq %rax,%rdi /* verify address is valid */ ja fusufault - movl (%rdi),%eax - movq $0,PCB_ONFAULT(%rcx) + xorl %eax,%eax + movl (%rdi),%r11d + movq %rax,PCB_ONFAULT(%rcx) + movl %r11d,(%rsi) ret -END(fuword32) +END(fueword32) /* * fuswintr() and suswintr() are specialized variants of fuword16() and diff --git a/sys/amd64/ia32/ia32_syscall.c b/sys/amd64/ia32/ia32_syscall.c index 0cdec6f..92249f9 100644 --- a/sys/amd64/ia32/ia32_syscall.c +++ b/sys/amd64/ia32/ia32_syscall.c @@ -110,7 +110,7 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) struct proc *p; struct trapframe *frame; caddr_t params; - u_int32_t args[8]; + u_int32_t args[8], tmp; int error, i; p = td->td_proc; @@ -126,7 +126,10 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) /* * Code is first argument, followed by actual args. */ - sa->code = fuword32(params); + error = fueword32(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(int); } else if (sa->code == SYS___syscall) { /* @@ -135,7 +138,10 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) * We use a 32-bit fetch in case params is not * aligned.
*/ - sa->code = fuword32(params); + error = fueword32(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(quad_t); } if (p->p_sysent->sv_mask) diff --git a/sys/arm/include/param.h b/sys/arm/include/param.h index 4a64607..6267154 100644 --- a/sys/arm/include/param.h +++ b/sys/arm/include/param.h @@ -149,4 +149,8 @@ #define pgtok(x) ((x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 +#endif + #endif /* !_ARM_INCLUDE_PARAM_H_ */ diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c index d909a71..98948d3 100644 --- a/sys/compat/freebsd32/freebsd32_misc.c +++ b/sys/compat/freebsd32/freebsd32_misc.c @@ -1832,16 +1832,21 @@ freebsd32_sysctl(struct thread *td, struct freebsd32_sysctl_args *uap) { int error, name[CTL_MAXNAME]; size_t j, oldlen; + uint32_t tmp; if (uap->namelen > CTL_MAXNAME || uap->namelen < 2) return (EINVAL); error = copyin(uap->name, name, uap->namelen * sizeof(int)); if (error) return (error); - if (uap->oldlenp) - oldlen = fuword32(uap->oldlenp); - else + if (uap->oldlenp) { + error = fueword32(uap->oldlenp, &tmp); + oldlen = tmp; + } else { oldlen = 0; + } + if (error != 0) + return (EFAULT); error = userland_sysctl(td, name, uap->namelen, uap->old, &oldlen, 1, uap->new, uap->newlen, &j, SCTL_MASK32); diff --git a/sys/i386/i386/support.s b/sys/i386/i386/support.s index c126f78..0a08012 100644 --- a/sys/i386/i386/support.s +++ b/sys/i386/i386/support.s @@ -389,16 +389,16 @@ copyin_fault: ret /* - * casuword. Compare and set user word. Returns -1 or the current value. + * casueword. Compare and set user word. Returns -1 on fault, + * 0 on non-faulting access. The current value is in *oldp. 
*/ - -ALTENTRY(casuword32) -ENTRY(casuword) +ALTENTRY(casueword32) +ENTRY(casueword) movl PCPU(CURPCB),%ecx movl $fusufault,PCB_ONFAULT(%ecx) movl 4(%esp),%edx /* dst */ movl 8(%esp),%eax /* old */ - movl 12(%esp),%ecx /* new */ + movl 16(%esp),%ecx /* new */ cmpl $VM_MAXUSER_ADDRESS-4,%edx /* verify address is valid */ ja fusufault @@ -416,17 +416,20 @@ ENTRY(casuword) movl PCPU(CURPCB),%ecx movl $0,PCB_ONFAULT(%ecx) + movl 12(%esp),%edx /* oldp */ + movl %eax,(%edx) + xorl %eax,%eax ret -END(casuword32) -END(casuword) +END(casueword32) +END(casueword) /* * Fetch (load) a 32-bit word, a 16-bit word, or an 8-bit byte from user - * memory. All these functions are MPSAFE. + * memory. */ -ALTENTRY(fuword32) -ENTRY(fuword) +ALTENTRY(fueword32) +ENTRY(fueword) movl PCPU(CURPCB),%ecx movl $fusufault,PCB_ONFAULT(%ecx) movl 4(%esp),%edx /* from */ @@ -436,9 +439,12 @@ ENTRY(fuword) movl (%edx),%eax movl $0,PCB_ONFAULT(%ecx) + movl 8(%esp),%edx + movl %eax,(%edx) + xorl %eax,%eax ret -END(fuword32) -END(fuword) +END(fueword32) +END(fueword) /* * fuswintr() and suswintr() are specialized variants of fuword16() and diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c index 1d0d104..84d6ec3 100644 --- a/sys/i386/i386/trap.c +++ b/sys/i386/i386/trap.c @@ -1059,6 +1059,7 @@ cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa) struct proc *p; struct trapframe *frame; caddr_t params; + long tmp; int error; p = td->td_proc; @@ -1074,14 +1075,20 @@ cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa) /* * Code is first argument, followed by actual args. */ - sa->code = fuword(params); + error = fueword(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(int); } else if (sa->code == SYS___syscall) { /* * Like syscall, but code is a quad, so as to maintain * quad alignment for the rest of the arguments. 
*/ - sa->code = fuword(params); + error = fueword(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(quad_t); } diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c index f2bbdaa..13a52e9 100644 --- a/sys/kern/kern_exec.c +++ b/sys/kern/kern_exec.c @@ -1091,7 +1091,7 @@ int exec_copyin_args(struct image_args *args, char *fname, enum uio_seg segflg, char **argv, char **envv) { - char *argp, *envp; + u_long argp, envp; int error; size_t length; @@ -1127,13 +1127,17 @@ exec_copyin_args(struct image_args *args, char *fname, /* * extract arguments first */ - while ((argp = (caddr_t) (intptr_t) fuword(argv++))) { - if (argp == (caddr_t) -1) { + for (;;) { + error = fueword(argv++, &argp); + if (error == -1) { error = EFAULT; goto err_exit; } - if ((error = copyinstr(argp, args->endp, - args->stringspace, &length))) { + if (argp == 0) + break; + error = copyinstr((void *)(uintptr_t)argp, args->endp, + args->stringspace, &length); + if (error != 0) { if (error == ENAMETOOLONG) error = E2BIG; goto err_exit; @@ -1149,13 +1153,17 @@ exec_copyin_args(struct image_args *args, char *fname, * extract environment strings */ if (envv) { - while ((envp = (caddr_t)(intptr_t)fuword(envv++))) { - if (envp == (caddr_t)-1) { + for (;;) { + error = fueword(envv++, &envp); + if (error == -1) { error = EFAULT; goto err_exit; } - if ((error = copyinstr(envp, args->endp, - args->stringspace, &length))) { + if (envp == 0) + break; + error = copyinstr((void *)(uintptr_t)envp, + args->endp, args->stringspace, &length); + if (error != 0) { if (error == ENAMETOOLONG) error = E2BIG; goto err_exit; diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c index 7cfef38..f5c4cc2 100644 --- a/sys/kern/kern_umtx.c +++ b/sys/kern/kern_umtx.c @@ -847,6 +847,7 @@ do_wait(struct thread *td, void *addr, u_long id, struct abs_timeout timo; struct umtx_q *uq; u_long tmp; + uint32_t tmp32; int error = 0; uq = td->td_umtxq; @@ -860,18 +861,29 @@ do_wait(struct thread *td, 
void *addr, u_long id, umtxq_lock(&uq->uq_key); umtxq_insert(uq); umtxq_unlock(&uq->uq_key); - if (compat32 == 0) - tmp = fuword(addr); - else - tmp = (unsigned int)fuword32(addr); + if (compat32 == 0) { + error = fueword(addr, &tmp); + if (error != 0) + error = EFAULT; + } else { + error = fueword32(addr, &tmp32); + if (error == 0) + tmp = tmp32; + else + error = EFAULT; + } umtxq_lock(&uq->uq_key); - if (tmp == id) - error = umtxq_sleep(uq, "uwait", timeout == NULL ? - NULL : &timo); - if ((uq->uq_flags & UQF_UMTXQ) == 0) - error = 0; - else + if (error == 0) { + if (tmp == id) + error = umtxq_sleep(uq, "uwait", timeout == NULL ? + NULL : &timo); + if ((uq->uq_flags & UQF_UMTXQ) == 0) + error = 0; + else + umtxq_remove(uq); + } else if ((uq->uq_flags & UQF_UMTXQ) != 0) { umtxq_remove(uq); + } umtxq_unlock(&uq->uq_key); umtx_key_release(&uq->uq_key); if (error == ERESTART) @@ -908,11 +920,11 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, struct abs_timeout timo; struct umtx_q *uq; uint32_t owner, old, id; - int error = 0; + int error, rv; id = td->td_tid; uq = td->td_umtxq; - + error = 0; if (timeout != NULL) abs_timeout_init2(&timo, timeout); @@ -921,7 +933,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, * can fault on any access. */ for (;;) { - owner = fuword32(__DEVOLATILE(void *, &m->m_owner)); + rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner); + if (rv == -1) + return (EFAULT); if (mode == _UMUTEX_WAIT) { if (owner == UMUTEX_UNOWNED || owner == UMUTEX_CONTESTED) return (0); @@ -929,31 +943,31 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, /* * Try the uncontested case. This should be done in userland. */ - owner = casuword32(&m->m_owner, UMUTEX_UNOWNED, id); + rv = casueword32(&m->m_owner, UMUTEX_UNOWNED, + &owner, id); + /* The address was invalid. */ + if (rv == -1) + return (EFAULT); /* The acquire succeeded. 
*/ if (owner == UMUTEX_UNOWNED) return (0); - /* The address was invalid. */ - if (owner == -1) - return (EFAULT); - /* If no one owns it but it is contested try to acquire it. */ if (owner == UMUTEX_CONTESTED) { - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, + id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) + return (EFAULT); if (owner == UMUTEX_CONTESTED) return (0); - /* The address was invalid. */ - if (owner == -1) - return (EFAULT); - - error = umtxq_check_susp(td); - if (error != 0) - return (error); + rv = umtxq_check_susp(td); + if (rv != 0) + return (rv); /* If this failed the lock has changed, restart. */ continue; @@ -985,10 +999,11 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, * either some one else has acquired the lock or it has been * released. */ - old = casuword32(&m->m_owner, owner, owner | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); /* The address was invalid. */ - if (old == -1) { + if (rv == -1) { umtxq_lock(&uq->uq_key); umtxq_remove(uq); umtxq_unbusy(&uq->uq_key); @@ -1033,16 +1048,16 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx. */ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) return (EPERM); if ((owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, UMUTEX_UNOWNED); - if (old == -1) + error = casueword32(&m->m_owner, owner, &old, UMUTEX_UNOWNED); + if (error == -1) return (EFAULT); if (old == owner) return (0); @@ -1064,14 +1079,14 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags) * there is zero or one thread only waiting for it. * Otherwise, it must be marked as contested. 
*/ - old = casuword32(&m->m_owner, owner, + error = casueword32(&m->m_owner, owner, &old, count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); umtxq_lock(&key); umtxq_signal(&key,1); umtxq_unbusy(&key); umtxq_unlock(&key); umtx_key_release(&key); - if (old == -1) + if (error == -1) return (EFAULT); if (old != owner) return (EINVAL); @@ -1091,14 +1106,16 @@ do_wake_umutex(struct thread *td, struct umutex *m) int error; int count; - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != 0) return (0); - flags = fuword32(&m->m_flags); + error = fueword32(&m->m_flags, &flags); + if (error == -1) + return (EFAULT); /* We should only ever be in here for contested locks */ if ((error = umtx_key_get(m, TYPE_NORMAL_UMUTEX, GET_SHARE(flags), @@ -1110,16 +1127,20 @@ do_wake_umutex(struct thread *td, struct umutex *m) count = umtxq_count(&key); umtxq_unlock(&key); - if (count <= 1) - owner = casuword32(&m->m_owner, UMUTEX_CONTESTED, UMUTEX_UNOWNED); + if (count <= 1) { + error = casueword32(&m->m_owner, UMUTEX_CONTESTED, &owner, + UMUTEX_UNOWNED); + if (error == -1) + error = EFAULT; + } umtxq_lock(&key); - if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) + if (error == 0 && count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) umtxq_signal(&key, 1); umtxq_unbusy(&key); umtxq_unlock(&key); umtx_key_release(&key); - return (0); + return (error); } /* @@ -1162,41 +1183,49 @@ do_wake2_umutex(struct thread *td, struct umutex *m, uint32_t flags) * any memory. 
*/ if (count > 1) { - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - while ((owner & UMUTEX_CONTESTED) ==0) { - old = casuword32(&m->m_owner, owner, - owner|UMUTEX_CONTESTED); + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), + &owner); + if (error == -1) + error = EFAULT; + while (error == 0 && (owner & UMUTEX_CONTESTED) == 0) { + error = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); + if (error == -1) { + error = EFAULT; + break; + } if (old == owner) break; owner = old; - if (old == -1) - break; error = umtxq_check_susp(td); if (error != 0) break; } } else if (count == 1) { - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - while ((owner & ~UMUTEX_CONTESTED) != 0 && + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), + &owner); + if (error == -1) + error = EFAULT; + while (error == 0 && (owner & ~UMUTEX_CONTESTED) != 0 && (owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, - owner|UMUTEX_CONTESTED); + error = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); + if (error == -1) { + error = EFAULT; + break; + } if (old == owner) break; owner = old; - if (old == -1) - break; error = umtxq_check_susp(td); if (error != 0) break; } } umtxq_lock(&key); - if (owner == -1) { - error = EFAULT; + if (error == EFAULT) { umtxq_signal(&key, INT_MAX); - } - else if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) + } else if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) umtxq_signal(&key, 1); umtxq_unbusy(&key); umtxq_unlock(&key); @@ -1576,7 +1605,7 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, struct umtx_q *uq; struct umtx_pi *pi, *new_pi; uint32_t id, owner, old; - int error; + int error, rv; id = td->td_tid; uq = td->td_umtxq; @@ -1619,7 +1648,12 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, /* * Try the uncontested case. This should be done in userland. 
*/ - owner = casuword32(&m->m_owner, UMUTEX_UNOWNED, id); + rv = casueword32(&m->m_owner, UMUTEX_UNOWNED, &owner, id); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; + break; + } /* The acquire succeeded. */ if (owner == UMUTEX_UNOWNED) { @@ -1627,16 +1661,15 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; - break; - } - /* If no one owns it but it is contested try to acquire it. */ if (owner == UMUTEX_CONTESTED) { - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; + break; + } if (owner == UMUTEX_CONTESTED) { umtxq_lock(&uq->uq_key); @@ -1647,12 +1680,6 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; - break; - } - error = umtxq_check_susp(td); if (error != 0) break; @@ -1683,10 +1710,11 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, * either some one else has acquired the lock or it has been * released. */ - old = casuword32(&m->m_owner, owner, owner | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); /* The address was invalid. */ - if (old == -1) { + if (rv == -1) { umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); umtxq_unlock(&uq->uq_key); @@ -1741,8 +1769,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx. 
*/ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) @@ -1750,8 +1778,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) /* This should be done in userland */ if ((owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, UMUTEX_UNOWNED); - if (old == -1) + error = casueword32(&m->m_owner, owner, &old, UMUTEX_UNOWNED); + if (error == -1) return (EFAULT); if (old == owner) return (0); @@ -1809,14 +1837,14 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) * there is zero or one thread only waiting for it. * Otherwise, it must be marked as contested. */ - old = casuword32(&m->m_owner, owner, - count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); + error = casueword32(&m->m_owner, owner, &old, + count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); umtxq_lock(&key); umtxq_unbusy(&key); umtxq_unlock(&key); umtx_key_release(&key); - if (old == -1) + if (error == -1) return (EFAULT); if (old != owner) return (EINVAL); @@ -1835,7 +1863,7 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, struct umtx_pi *pi; uint32_t ceiling; uint32_t owner, id; - int error, pri, old_inherited_pri, su; + int error, pri, old_inherited_pri, su, rv; id = td->td_tid; uq = td->td_umtxq; @@ -1853,7 +1881,12 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, umtxq_busy(&uq->uq_key); umtxq_unlock(&uq->uq_key); - ceiling = RTP_PRIO_MAX - fuword32(&m->m_ceilings[0]); + rv = fueword32(&m->m_ceilings[0], &ceiling); + if (rv == -1) { + error = EFAULT; + goto out; + } + ceiling = RTP_PRIO_MAX - ceiling; if (ceiling > RTP_PRIO_MAX) { error = EINVAL; goto out; @@ -1874,17 +1907,16 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, } mtx_unlock_spin(&umtx_lock); - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); - - if 
(owner == UMUTEX_CONTESTED) { - error = 0; + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; + if (owner == UMUTEX_CONTESTED) { + error = 0; break; } @@ -1973,8 +2005,8 @@ do_unlock_pp(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx. */ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) @@ -2047,9 +2079,11 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling, uint32_t save_ceiling; uint32_t owner, id; uint32_t flags; - int error; + int error, rv; - flags = fuword32(&m->m_flags); + error = fueword32(&m->m_flags, &flags); + if (error == -1) + return (EFAULT); if ((flags & UMUTEX_PRIO_PROTECT) == 0) return (EINVAL); if (ceiling > RTP_PRIO_MAX) @@ -2064,10 +2098,18 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling, umtxq_busy(&uq->uq_key); umtxq_unlock(&uq->uq_key); - save_ceiling = fuword32(&m->m_ceilings[0]); + rv = fueword32(&m->m_ceilings[0], &save_ceiling); + if (rv == -1) { + error = EFAULT; + break; + } - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + if (rv == -1) { + error = EFAULT; + break; + } if (owner == UMUTEX_CONTESTED) { suword32(&m->m_ceilings[0], ceiling); @@ -2077,12 +2119,6 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling, break; } - /* The address was invalid. 
*/ - if (owner == -1) { - error = EFAULT; - break; - } - if ((owner & ~UMUTEX_CONTESTED) == id) { suword32(&m->m_ceilings[0], ceiling); error = 0; @@ -2129,8 +2165,8 @@ do_lock_umutex(struct thread *td, struct umutex *m, uint32_t flags; int error; - flags = fuword32(&m->m_flags); - if (flags == -1) + error = fueword32(&m->m_flags, &flags); + if (error == -1) return (EFAULT); switch(flags & (UMUTEX_PRIO_INHERIT | UMUTEX_PRIO_PROTECT)) { @@ -2164,9 +2200,10 @@ static int do_unlock_umutex(struct thread *td, struct umutex *m) { uint32_t flags; + int error; - flags = fuword32(&m->m_flags); - if (flags == -1) + error = fueword32(&m->m_flags, &flags); + if (error == -1) return (EFAULT); switch(flags & (UMUTEX_PRIO_INHERIT | UMUTEX_PRIO_PROTECT)) { @@ -2187,21 +2224,27 @@ do_cv_wait(struct thread *td, struct ucond *cv, struct umutex *m, { struct abs_timeout timo; struct umtx_q *uq; - uint32_t flags; - uint32_t clockid; + uint32_t flags, clockid, hasw; int error; uq = td->td_umtxq; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); if ((wflags & CVWAIT_CLOCKID) != 0) { - clockid = fuword32(&cv->c_clockid); + error = fueword32(&cv->c_clockid, &clockid); + if (error == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } if (clockid < CLOCK_REALTIME || clockid >= CLOCK_THREAD_CPUTIME_ID) { /* hmm, only HW clock id will work. */ + umtx_key_release(&uq->uq_key); return (EINVAL); } } else { @@ -2217,7 +2260,9 @@ do_cv_wait(struct thread *td, struct ucond *cv, struct umutex *m, * Set c_has_waiters to 1 before releasing user mutex, also * don't modify cache line when unnecessary. 
*/ - if (fuword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters)) == 0) + error = fueword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), + &hasw); + if (error == 0 && hasw == 0) suword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), 1); umtxq_lock(&uq->uq_key); @@ -2276,7 +2321,9 @@ do_cv_signal(struct thread *td, struct ucond *cv) int error, cnt, nwake; uint32_t flags; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &key)) != 0) return (error); umtxq_lock(&key); @@ -2287,6 +2334,8 @@ do_cv_signal(struct thread *td, struct ucond *cv) umtxq_unlock(&key); error = suword32( __DEVOLATILE(uint32_t *, &cv->c_has_waiters), 0); + if (error == -1) + error = EFAULT; umtxq_lock(&key); } umtxq_unbusy(&key); @@ -2302,7 +2351,9 @@ do_cv_broadcast(struct thread *td, struct ucond *cv) int error; uint32_t flags; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &key)) != 0) return (error); @@ -2312,6 +2363,8 @@ do_cv_broadcast(struct thread *td, struct ucond *cv) umtxq_unlock(&key); error = suword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), 0); + if (error == -1) + error = EFAULT; umtxq_lock(&key); umtxq_unbusy(&key); @@ -2329,10 +2382,12 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx uint32_t flags, wrflags; int32_t state, oldstate; int32_t blocked_readers; - int error; + int error, rv; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2345,15 +2400,22 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx wrflags |= URWLOCK_WRITE_WAITERS; for (;;) { - state = 
fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } + /* try to lock it */ while (!(state & wrflags)) { if (__predict_false(URWLOCK_READER_COUNT(state) == URWLOCK_MAX_READERS)) { umtx_key_release(&uq->uq_key); return (EAGAIN); } - oldstate = casuword32(&rwlock->rw_state, state, state + 1); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state + 1); + if (rv == -1) { umtx_key_release(&uq->uq_key); return (EFAULT); } @@ -2379,12 +2441,17 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx * re-read the state, in case it changed between the try-lock above * and the check below */ - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) + error = EFAULT; /* set read contention bit */ - while ((state & wrflags) && !(state & URWLOCK_READ_WAITERS)) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_READ_WAITERS); - if (oldstate == -1) { + while (error == 0 && (state & wrflags) && + !(state & URWLOCK_READ_WAITERS)) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_READ_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2415,7 +2482,12 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx sleep: /* contention bit is set, before sleeping, increase read waiter count */ - blocked_readers = fuword32(&rwlock->rw_blocked_readers); + rv = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (rv == -1) { + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_readers, blocked_readers+1); while (state & wrflags) { @@ -2431,18 +2503,31 @@ sleep: umtxq_unlock(&uq->uq_key); if (error) break; - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = 
fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + error = EFAULT; + break; + } } /* decrease read waiter count, and may clear read contention bit */ - blocked_readers = fuword32(&rwlock->rw_blocked_readers); + error = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (error == -1) { + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_readers, blocked_readers-1); if (blocked_readers == 1) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); - for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_READ_WAITERS); - if (oldstate == -1) { + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) + error = EFAULT; + while (error == 0) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_READ_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2450,8 +2535,6 @@ sleep: break; state = oldstate; error = umtxq_check_susp(td); - if (error != 0) - break; } } @@ -2476,10 +2559,12 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo int32_t state, oldstate; int32_t blocked_writers; int32_t blocked_readers; - int error; + int error, rv; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2489,10 +2574,16 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo blocked_readers = 0; for (;;) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } while (!(state & URWLOCK_WRITE_OWNER) && URWLOCK_READER_COUNT(state) == 0) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_WRITE_OWNER); - if (oldstate == -1) 
{ + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_WRITE_OWNER); + if (rv == -1) { umtx_key_release(&uq->uq_key); return (EFAULT); } @@ -2528,12 +2619,17 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo * re-read the state, in case it changed between the try-lock above * and the check below */ - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) + error = EFAULT; - while (((state & URWLOCK_WRITE_OWNER) || URWLOCK_READER_COUNT(state) != 0) && - (state & URWLOCK_WRITE_WAITERS) == 0) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_WRITE_WAITERS); - if (oldstate == -1) { + while (error == 0 && ((state & URWLOCK_WRITE_OWNER) || + URWLOCK_READER_COUNT(state) != 0) && + (state & URWLOCK_WRITE_WAITERS) == 0) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_WRITE_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2561,7 +2657,12 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo continue; } sleep: - blocked_writers = fuword32(&rwlock->rw_blocked_writers); + rv = fueword32(&rwlock->rw_blocked_writers, + &blocked_writers); + if (rv == -1) { + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_writers, blocked_writers+1); while ((state & URWLOCK_WRITE_OWNER) || URWLOCK_READER_COUNT(state) != 0) { @@ -2577,17 +2678,34 @@ sleep: umtxq_unlock(&uq->uq_key); if (error) break; - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + error = EFAULT; + break; + } } - blocked_writers = fuword32(&rwlock->rw_blocked_writers); + if (error != 0) + break; + rv = fueword32(&rwlock->rw_blocked_writers, + &blocked_writers); + if (rv == -1) { + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_writers, blocked_writers-1); if (blocked_writers 
== 1) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + error = EFAULT; + break; + } for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_WRITE_WAITERS); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_WRITE_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2603,7 +2721,12 @@ sleep: if (error != 0) break; } - blocked_readers = fuword32(&rwlock->rw_blocked_readers); + rv = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (rv == -1) { + error = EFAULT; + break; + } } else blocked_readers = 0; @@ -2624,20 +2747,24 @@ do_rw_unlock(struct thread *td, struct urwlock *rwlock) struct umtx_q *uq; uint32_t flags; int32_t state, oldstate; - int error, q, count; + int error, rv, q, count; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + error = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), &state); + if (error == -1) + return (EFAULT); if (state & URWLOCK_WRITE_OWNER) { for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_WRITE_OWNER); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_WRITE_OWNER); + if (rv == -1) { error = EFAULT; goto out; } @@ -2655,9 +2782,9 @@ do_rw_unlock(struct thread *td, struct urwlock *rwlock) } } else if (URWLOCK_READER_COUNT(state) != 0) { for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state - 1); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state - 1); + if (rv == -1) { error = EFAULT; goto out; } @@ -2719,7 +2846,9 @@ do_sem_wait(struct thread 
*td, struct _usem *sem, struct _umtx_time *timeout) int error; uq = td->td_umtxq; - flags = fuword32(&sem->_flags); + error = fueword32(&sem->_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2732,14 +2861,14 @@ do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout) umtxq_insert(uq); umtxq_unlock(&uq->uq_key); casuword32(__DEVOLATILE(uint32_t *, &sem->_has_waiters), 0, 1); - count = fuword32(__DEVOLATILE(uint32_t *, &sem->_count)); - if (count != 0) { + error = fueword32(__DEVOLATILE(uint32_t *, &sem->_count), &count); + if (error == -1 || count != 0) { umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); umtxq_remove(uq); umtxq_unlock(&uq->uq_key); umtx_key_release(&uq->uq_key); - return (0); + return (error == -1 ? EFAULT : 0); } umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); @@ -2770,7 +2899,9 @@ do_sem_wake(struct thread *td, struct _usem *sem) int error, cnt; uint32_t flags; - flags = fuword32(&sem->_flags); + error = fueword32(&sem->_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &key)) != 0) return (error); umtxq_lock(&key); @@ -2788,6 +2919,8 @@ do_sem_wake(struct thread *td, struct _usem *sem) error = suword32( __DEVOLATILE(uint32_t *, &sem->_has_waiters), 0); umtxq_lock(&key); + if (error == -1) + error = EFAULT; } } umtxq_unbusy(&key); diff --git a/sys/kern/subr_uio.c b/sys/kern/subr_uio.c index f2e6e32..f2bbb0c 100644 --- a/sys/kern/subr_uio.c +++ b/sys/kern/subr_uio.c @@ -7,6 +7,11 @@ * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * + * Copyright (c) 2014 The FreeBSD Foundation + * + * Portions of this software were developed by Konstantin Belousov + * under sponsorship from the FreeBSD Foundation. 
+ * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: @@ -438,3 +443,128 @@ copyout_unmap(struct thread *td, vm_offset_t addr, size_t sz) return (0); } + +#ifdef NO_FUEWORD +/* + * XXXKIB The temporary implementation of fue*() functions which do not + * handle usermode -1 properly, mixing it with the fault code. Keep + * this until MD code is written. Currently sparc64, mips and arm do + * not have proper implementation. + */ + +int +fueword(const void *base, long *val) +{ + long res; + + res = fuword(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} + +int +fueword32(const void *base, int32_t *val) +{ + int32_t res; + + res = fuword32(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} + +#ifdef _LP64 +int +fueword64(const void *base, int64_t *val) +{ + int64_t res; + + res = fuword64(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} +#endif + +int +casueword32(volatile uint32_t *base, uint32_t oldval, uint32_t *oldvalp, + uint32_t newval) +{ + int32_t ov; + + ov = casuword32(base, oldval, newval); + if (ov == -1) + return (-1); + *oldvalp = ov; + return (0); +} + +int +casueword(volatile u_long *p, u_long oldval, u_long *oldvalp, u_long newval) +{ + u_long ov; + + ov = casuword(p, oldval, newval); + if (ov == -1) + return (-1); + *oldvalp = ov; + return (0); +} +#else /* NO_FUEWORD */ +int32_t +fuword32(const void *addr) +{ + int rv; + int32_t val; + + rv = fueword32(addr, &val); + return (rv == -1 ? -1 : val); +} + +#ifdef _LP64 +int64_t +fuword64(const void *addr) +{ + int rv; + int64_t val; + + rv = fueword64(addr, &val); + return (rv == -1 ? -1 : val); +} +#endif /* _LP64 */ + +long +fuword(const void *addr) +{ + long val; + int rv; + + rv = fueword(addr, &val); + return (rv == -1 ? 
-1 : val); +} + +uint32_t +casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) +{ + int rv; + uint32_t val; + + rv = casueword32(addr, old, &val, new); + return (rv == -1 ? -1 : val); +} + +u_long +casuword(volatile u_long *addr, u_long old, u_long new) +{ + int rv; + u_long val; + + rv = casueword(addr, old, &val, new); + return (rv == -1 ? -1 : val); +} + +#endif /* NO_FUEWORD */ diff --git a/sys/kern/vfs_acl.c b/sys/kern/vfs_acl.c index 93626fb..e9361e5 100644 --- a/sys/kern/vfs_acl.c +++ b/sys/kern/vfs_acl.c @@ -148,6 +148,7 @@ acl_copyin(void *user_acl, struct acl *kernel_acl, acl_type_t type) static int acl_copyout(struct acl *kernel_acl, void *user_acl, acl_type_t type) { + uint32_t am; int error; struct oldacl old; @@ -162,8 +163,11 @@ acl_copyout(struct acl *kernel_acl, void *user_acl, acl_type_t type) break; default: - if (fuword32((char *)user_acl + - offsetof(struct acl, acl_maxcnt)) != ACL_MAX_ENTRIES) + error = fueword32((char *)user_acl + + offsetof(struct acl, acl_maxcnt), &am); + if (error == -1) + return (EFAULT); + if (am != ACL_MAX_ENTRIES) return (EINVAL); error = copyout(kernel_acl, user_acl, sizeof(*kernel_acl)); diff --git a/sys/mips/include/param.h b/sys/mips/include/param.h index 2d1d7f1..90f3e6f 100644 --- a/sys/mips/include/param.h +++ b/sys/mips/include/param.h @@ -178,4 +178,8 @@ #define pgtok(x) ((x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 +#endif + #endif /* !_MIPS_INCLUDE_PARAM_H_ */ diff --git a/sys/net/if_spppsubr.c b/sys/net/if_spppsubr.c index 9dc55c5..c0f8e39 100644 --- a/sys/net/if_spppsubr.c +++ b/sys/net/if_spppsubr.c @@ -5060,7 +5060,8 @@ sppp_params(struct sppp *sp, u_long cmd, void *data) * Check the cmd word first before attempting to fetch all the * data. 
*/ - if ((subcmd = fuword(ifr->ifr_data)) == -1) { + rv = fueword(ifr->ifr_data, &subcmd); + if (rv == -1) { rv = EFAULT; goto quit; } diff --git a/sys/powerpc/powerpc/copyinout.c b/sys/powerpc/powerpc/copyinout.c index dcfab80..a337c8b 100644 --- a/sys/powerpc/powerpc/copyinout.c +++ b/sys/powerpc/powerpc/copyinout.c @@ -405,14 +405,13 @@ fubyte(const void *addr) return (val); } -#ifdef __powerpc64__ -int32_t -fuword32(const void *addr) +int +fuword16(const void *addr) { struct thread *td; pmap_t pm; faultbuf env; - int32_t *p, val; + uint16_t *p, val; td = curthread; pm = &td->td_proc->p_vmspace->vm_pmap; @@ -432,15 +431,14 @@ fuword32(const void *addr) td->td_pcb->pcb_onfault = NULL; return (val); } -#endif -long -fuword(const void *addr) +int +fueword32(const void *addr, int32_t *val) { struct thread *td; pmap_t pm; faultbuf env; - long *p, val; + int32_t *p; td = curthread; pm = &td->td_proc->p_vmspace->vm_pmap; @@ -455,22 +453,71 @@ fuword(const void *addr) return (-1); } - val = *p; + *val = *p; td->td_pcb->pcb_onfault = NULL; - return (val); + return (0); } -#ifndef __powerpc64__ -int32_t -fuword32(const void *addr) +#ifdef __powerpc64__ +int +fueword64(const void *addr, int64_t *val) { - return ((int32_t)fuword(addr)); + struct thread *td; + pmap_t pm; + faultbuf env; + int64_t *p; + + td = curthread; + pm = &td->td_proc->p_vmspace->vm_pmap; + + if (setfault(env)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + if (map_user_ptr(pm, addr, (void **)&p, sizeof(*p), NULL)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + *val = *p; + + td->td_pcb->pcb_onfault = NULL; + return (0); } #endif -uint32_t -casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) +int +fueword(const void *addr, long *val) +{ + struct thread *td; + pmap_t pm; + faultbuf env; + long *p; + + td = curthread; + pm = &td->td_proc->p_vmspace->vm_pmap; + + if (setfault(env)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + if (map_user_ptr(pm, addr, 
(void **)&p, sizeof(*p), NULL)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + *val = *p; + + td->td_pcb->pcb_onfault = NULL; + return (0); +} + +int +casueword32(volatile uint32_t *addr, uint32_t old, uint32_t *oldvalp, + uint32_t new) { struct thread *td; pmap_t pm; @@ -507,18 +554,21 @@ casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) td->td_pcb->pcb_onfault = NULL; - return (val); + *oldvalp = val; + return (0); } #ifndef __powerpc64__ -u_long -casuword(volatile u_long *addr, u_long old, u_long new) +int +casueword(volatile u_long *addr, u_long old, u_long *oldvalp, u_long new) { - return (casuword32((volatile uint32_t *)addr, old, new)); + + return (casueword32((volatile uint32_t *)addr, old, + (uint32_t *)oldvalp, new)); } #else -u_long -casuword(volatile u_long *addr, u_long old, u_long new) +int +casueword(volatile u_long *addr, u_long old, u_long *oldvalp, u_long new) { struct thread *td; pmap_t pm; @@ -555,7 +605,7 @@ casuword(volatile u_long *addr, u_long old, u_long new) td->td_pcb->pcb_onfault = NULL; - return (val); + *oldvalp = val; + return (0); } #endif - diff --git a/sys/sparc64/include/param.h b/sys/sparc64/include/param.h index e59f2c4..46bacae 100644 --- a/sys/sparc64/include/param.h +++ b/sys/sparc64/include/param.h @@ -146,4 +146,8 @@ #define pgtok(x) ((unsigned long)(x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 +#endif + #endif /* !_SPARC64_INCLUDE_PARAM_H_ */ diff --git a/sys/sys/systm.h b/sys/sys/systm.h index f4eae57..6e5ee61 100644 --- a/sys/sys/systm.h +++ b/sys/sys/systm.h @@ -254,16 +254,23 @@ int copyout_nofault(const void * __restrict kaddr, void * __restrict udaddr, int fubyte(const void *base); long fuword(const void *base); -int fuword16(void *base); +int fuword16(const void *base); int32_t fuword32(const void *base); int64_t fuword64(const void *base); +int fueword(const void *base, long *val); +int fueword32(const void *base, int32_t *val); +int fueword64(const void *base, int64_t 
*val); int subyte(void *base, int byte); int suword(void *base, long word); int suword16(void *base, int word); int suword32(void *base, int32_t word); int suword64(void *base, int64_t word); uint32_t casuword32(volatile uint32_t *base, uint32_t oldval, uint32_t newval); -u_long casuword(volatile u_long *p, u_long oldval, u_long newval); +u_long casuword(volatile u_long *p, u_long oldval, u_long newval); +int casueword32(volatile uint32_t *base, uint32_t oldval, uint32_t *oldvalp, + uint32_t newval); +int casueword(volatile u_long *p, u_long oldval, u_long *oldvalp, + u_long newval); void realitexpire(void *); From owner-freebsd-arch@FreeBSD.ORG Wed Oct 22 08:29:53 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 24D309E8 for ; Wed, 22 Oct 2014 08:29:53 +0000 (UTC) Received: from mail-wg0-x234.google.com (mail-wg0-x234.google.com [IPv6:2a00:1450:400c:c00::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B59EFA11 for ; Wed, 22 Oct 2014 08:29:52 +0000 (UTC) Received: by mail-wg0-f52.google.com with SMTP id a1so3156451wgh.35 for ; Wed, 22 Oct 2014 01:29:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=Plp3RrFmhUNXmHlP00SGLxL0bPSeJtWDLUwEBoOs6Co=; b=kugPpkogs1Vis3pLHaqAvkTwF/JwkvWi0slMU8mlwlIwFRiJJ/vS31P9pnSHIKbkmH G3q/cmjzyOQcMZNUCvjiknCAPDSdFbycVwlecJzlJNKBfN54G10B/OdX/Pxpkii8bok2 SNZXGTTNg2XyiKAKf7q+OcGrQVseQQlM2WP098dkWxQcx7uzQx2YeOcBpkeFs8oKhQOc ktJXwud5FSDf4EZBXd9w0dpMnGhazNffe882m8/+1lfRaG+fGpDUgG8E7TR3EsfmKJlP tEQwZQRV48zJVUTE9GkHcSJ9c2zme+x+YUKxUDFuOoaEjH6w4KnLkoYBoQcN4eyJgt6L V9jQ== MIME-Version: 1.0 
X-Received: by 10.180.39.65 with SMTP id n1mr35707252wik.13.1413966590940; Wed, 22 Oct 2014 01:29:50 -0700 (PDT) Received: by 10.194.44.67 with HTTP; Wed, 22 Oct 2014 01:29:50 -0700 (PDT) Date: Wed, 22 Oct 2014 12:29:50 +0400 Message-ID: Subject: PowerPC G5 missed system cache From: =?UTF-8?B?0JTQvNC40YLRgNC40Lkg0JzQvtGB0YLQvtCy0YnQuNC60L7Qsg==?= To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2014 08:29:53 -0000 Hi, I am running FreeBSD-10.0-RELEASE on the powerpc64 architecture with two G5 CPUs and ZFS as the root file system (in case that matters). The server runs a Zabbix server with a PostgreSQL database and gathers metrics from about twenty servers. I am not very familiar with this architecture, and I have run into the following issue: I don't see any system cache. Neither the top utility nor sysctl reports a cache size. top doesn't show the field at all, and sysctl shows vm.stats.vm.v_cache_count=0. The kernel and world were built with the default configuration because I couldn't find binaries on the official sites. My knowledge of PPC is weak, so perhaps the missing cache is a feature of this architecture. Any attempt to open the Zabbix web interface leads to a large number of read operations, which is why I am worried about memory caching. Can anyone explain the issue (the missing cache) or give a link where I can get information about it?
Thank you in advance. -- With best regards, Dmitriy Mostovschikov Email: dmadm2008 at gmail dot com From owner-freebsd-arch@FreeBSD.ORG Wed Oct 22 18:11:13 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C9CFB195; Wed, 22 Oct 2014 18:11:13 +0000 (UTC) Received: from mail-la0-x22b.google.com (mail-la0-x22b.google.com [IPv6:2a00:1450:4010:c03::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BE85AD9E; Wed, 22 Oct 2014 18:11:12 +0000 (UTC) Received: by mail-la0-f43.google.com with SMTP id mc6so3490520lab.30 for ; Wed, 22 Oct 2014 11:11:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=O43OQbEzRDZv+i1juIAVN8ETcl45IPmlcC0CX0zRZyQ=; b=DaQ9YzTFVna8WqLb6//MAMr3JG1uNwMF8MHJzH7qVAHprERlEe02BkUbR92AZ2xcZn OjDjYRSy84foyLYBWYKp02QJ9NOgy8X4O6RhCkHacZ5pxHCmR3wWYTRjjZOqNetnuVmt sgFLZvAUsiKc4s0Bv7R1kSJj9quRUQL66pGn3vSvTaElIuHBBSK8m7da3ti6plQBsAXJ VtPNNN4aX055q+RB9aGyLD+K9Hd6EwdEy3+haoVB6tR6t4FigOAJs9QKsJTYF5CaUd6c /Ltqc3FzIolF03KmpDsK3WE5nb3EufQiTfntypXGts5Vh2IS9iy/f/ejfeh55j7JibwO P8fQ== MIME-Version: 1.0 X-Received: by 10.152.198.166 with SMTP id jd6mr25146295lac.81.1414001470597; Wed, 22 Oct 2014 11:11:10 -0700 (PDT) Sender: crodr001@gmail.com Received: by 10.112.84.197 with HTTP; Wed, 22 Oct 2014
11:11:10 -0700 (PDT) In-Reply-To: <201410210217.s9L2HECn097421@idle.juniper.net> References: <54433211.6000303@freebsd.org> <201410210217.s9L2HECn097421@idle.juniper.net> Date: Wed, 22 Oct 2014 11:11:10 -0700 X-Google-Sender-Auth: OVAE3D2VXfvVdgvxr1GC74tOvs8 Message-ID: Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML From: Craig Rodrigues To: Phil Shafer Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: Marcel Moolenaar , John-Mark Gurney , Alfred Perlstein , "Simon J. Gerraty" , arch@freebsd.org, Poul-Henning Kamp , freebsd-arch , Konstantin Belousov X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2014 18:11:14 -0000 On Mon, Oct 20, 2014 at 7:17 PM, Phil Shafer wrote: > Alfred Perlstein writes: > >It's now been 2 months since the last discussion about this. > >Where is the code from Juniper? > > Code's still in github, if you want to take a look. It's been > pretty much done since the end of August: > > https://github.com/Juniper/libxo/graphs/contributors > > Since this code is on github, and you are doing releases via github at https://github.com/Juniper/libxo/releases , then all that is really required at this point is to do a vendor import to bring it into FreeBSD: https://www.freebsd.org/doc/en/articles/committers-guide/subversion-primer.html#svn-advanced-use-vendor-imports That can be done by any FreeBSD developer, not just Marcel. Putting libxo on github actually makes this a lot easier for FreeBSD developers (and even developers for Linux and other operating systems) to use and integrate. 
-- Craig From owner-freebsd-arch@FreeBSD.ORG Wed Oct 22 19:16:12 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EDBA211B for ; Wed, 22 Oct 2014 19:16:12 +0000 (UTC) Received: from mail-pa0-f46.google.com (mail-pa0-f46.google.com [209.85.220.46]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C28D96AC for ; Wed, 22 Oct 2014 19:16:12 +0000 (UTC) Received: by mail-pa0-f46.google.com with SMTP id fa1so4297826pad.33 for ; Wed, 22 Oct 2014 12:16:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:content-type :content-transfer-encoding:subject:message-id:date:to:mime-version; bh=9VEtjS76KjVma6l5z83yXTIJnQgtLkX1oi+ChJ4EnJY=; b=JAjGOGplwmSd3ZIWSz+iasUMjRGqImXVjvyqG7Wb/hfluAlGvSV7ciuruYLbCdZGaw C5j1IC42KvYBlIoULhbl2E3YEfz7Vz36JyroeigWmF5/dWRPUKidkDLYaLjo4p61kg8B AvGJjApagR2R0x6k64LDtOB+TTss20kzSJl0VhnX2Ygkw7h65w6ZTAbVaEwxoaBV3i5l daQS955HHbICzKs34CNNwGzis9RUU7UGW1kugNJbt5SZRTg33N2keGSkICrYwDvcKwLg BN4omDuE9ax5V/MWBgISFiFyAlM/HRuFu+NZbXimmfiYxxqAz1AewkHoy8uOhMLPcdT2 MA2g== X-Gm-Message-State: ALoCoQnyXWefvGnHOFriw1mGt9RS/wRDiYJ6uCIz+FuSUNNpeSEXIxwQ2JwVBZe3RXjtOBtWLKiB X-Received: by 10.70.92.68 with SMTP id ck4mr190252pdb.28.1414005371918; Wed, 22 Oct 2014 12:16:11 -0700 (PDT) Received: from macintosh-c42c033c0b73.corp.netflix.com (dc1-prod.netflix.com. 
[69.53.236.251]) by mx.google.com with ESMTPSA id vt1sm3081295pab.17.2014.10.22.12.16.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 22 Oct 2014 12:16:11 -0700 (PDT) Sender: Warner Losh From: Warner Losh X-Google-Original-From: Warner Losh Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Subject: Retiring WITH_INSTALL_AS_USER Message-Id: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> Date: Wed, 22 Oct 2014 13:16:09 -0600 To: FreeBSD Arch Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) X-Mailer: Apple Mail (2.1878.6) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2014 19:16:13 -0000 Greetings, I’d like to retire WITH_INSTALL_AS_USER. Brooks’ NO_ROOT (which I’m also converting to WITHOUT_ROOT) is much better. WITH_INSTALL_AS_USER is broken. First, it requires you to also define NO_FSCHG. Second, any Makefile that overrides the owner and also has to include src.opts.mk early to check options fails. It has been broken for some time (even before my conversion to src.opts.mk, since including bsd.own.mk earlier would have caused the same failure). This isn’t hard to fix (just tedious), but given the better solution we already have in the tree, it seems best to just kill it, since its day has passed. Comments?
Warner From owner-freebsd-arch@FreeBSD.ORG Wed Oct 22 19:41:06 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 27C005BD for ; Wed, 22 Oct 2014 19:41:06 +0000 (UTC) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id DD5319AE for ; Wed, 22 Oct 2014 19:41:05 +0000 (UTC) Received: from nine.des.no (smtp.des.no [194.63.250.102]) by smtp-int.des.no (Postfix) with ESMTP id 899CCA46F; Wed, 22 Oct 2014 19:41:04 +0000 (UTC) Received: by nine.des.no (Postfix, from userid 1001) id F3E63556F; Wed, 22 Oct 2014 21:40:52 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Warner Losh Subject: Re: Retiring WITH_INSTALL_AS_USER References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> Date: Wed, 22 Oct 2014 21:40:52 +0200 In-Reply-To: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> (Warner Losh's message of "Wed, 22 Oct 2014 13:16:09 -0600") Message-ID: <86tx2vc2pn.fsf@nine.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2014 19:41:06 -0000 Warner Losh writes: > I’d like to retire WITH_INSTALL_AS_USER. Brooks’ NO_ROOT (which I’m > also converting to WITHOUT_ROOT) is much better. WITH_INSTALL_AS_USER > is broken. [...] This isn’t hard to fix (just tedious) [...] Wow, I actually have a patch for that, but it has probably rotted by now. I think it predates your src.opts.mk work.
But if NO_ROOT works, by all means, kill WITH_INSTALL_AS_USER. DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Thu Oct 23 05:02:54 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7471C3F9 for ; Thu, 23 Oct 2014 05:02:54 +0000 (UTC) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1bon0132.outbound.protection.outlook.com [157.56.111.132]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1DD888A3 for ; Thu, 23 Oct 2014 05:02:53 +0000 (UTC) Received: from BY2PR05CA026.namprd05.prod.outlook.com (10.141.250.16) by BLUPR05MB117.namprd05.prod.outlook.com (10.255.214.11) with Microsoft SMTP Server (TLS) id 15.0.1054.13; Thu, 23 Oct 2014 04:29:25 +0000 Received: from BN1BFFO11FD023.protection.gbl (2a01:111:f400:7c10::1:125) by BY2PR05CA026.outlook.office365.com (2a01:111:e400:2c5f::16) with Microsoft SMTP Server (TLS) id 15.0.1054.13 via Frontend Transport; Thu, 23 Oct 2014 04:29:24 +0000 Received: from P-EMF01-SAC.jnpr.net (66.129.239.15) by BN1BFFO11FD023.mail.protection.outlook.com (10.58.144.86) with Microsoft SMTP Server (TLS) id 15.0.1049.20 via Frontend Transport; Thu, 23 Oct 2014 04:29:23 +0000 Received: from magenta.juniper.net (172.17.27.123) by P-EMF01-SAC.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Wed, 22 Oct 2014 21:29:20 -0700 Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id s9N4TJR66221; Wed, 22 Oct 2014 21:29:19 -0700 (PDT) (envelope-from sjg@juniper.net) Received: from chaos (localhost [127.0.0.1]) by chaos.jnpr.net (Postfix) with ESMTP id EC597580A3; Wed, 22
Oct 2014 21:29:18 -0700 (PDT) To: Warner Losh Subject: Re: Retiring WITH_INSTALL_AS_USER In-Reply-To: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> Comments: In-reply-to: Warner Losh message dated "Wed, 22 Oct 2014 13:16:09 -0600." From: "Simon J. Gerraty" X-Mailer: MH-E 8.0.3; nmh 1.3; GNU Emacs 22.3.1 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Date: Wed, 22 Oct 2014 21:29:18 -0700 Message-ID: <21044.1414038558@chaos> X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:66.129.239.15; CTRY:US; IPV:NLI; EFV:NLI; SFV:NSPM; SFS:(10019020)(6009001)(199003)(189002)(24454002)(35774003)(120916001)(99396003)(97736003)(92566001)(20776003)(76176999)(50986999)(50466002)(117636001)(76506005)(105596002)(110136001)(106466001)(81156004)(57986006)(50226001)(4396001)(64706001)(21056001)(92726001)(47776003)(88136002)(77156001)(85306004)(6806004)(19580405001)(19580395003)(69596002)(87286001)(23676002)(68736004)(44976005)(89996001)(76482002)(104166001)(86362001)(93916002)(62966002)(31966008)(80022003)(95666004)(85852003)(46102003)(87936001)(33716001)(84676001)(42262002)(62816006); DIR:OUT; SFP:1102; SCL:1; SRVR:BLUPR05MB117; H:P-EMF01-SAC.jnpr.net; FPR:; MLV:sfv; PTR:InfoDomainNonexistent; A:1; MX:1; LANG:en; X-Microsoft-Antispam: UriScan:; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:BLUPR05MB117; X-Exchange-Antispam-Report-Test: UriScan:; X-Forefront-PRVS: 0373D94D15 Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.15 as permitted sender) Authentication-Results: spf=softfail (sender IP is 66.129.239.15) smtp.mailfrom=sjg@juniper.net; X-OriginatorOrg: juniper.net Cc: FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Thu, 23 Oct 2014 05:02:54 -0000 Warner Losh wrote: > I’d like to retire WITH_INSTALL_AS_USER. Brooks’ NO_ROOT (which I’m > also converting to WITHOUT_ROOT) is much better. WITH_INSTALL_AS_USER Hmm, I don't see anything in share/mk for NO_ROOT. I see some tweaks in src/Makefile.inc1, though. Is there more somewhere? > is broken. First, it requires you also define NO_FSCHG. Second, any > Makefile that overrides the owner and also has to include src.opts.mk > early to check options fails. It has been broken for some time (even > before my conversion to src.opts.mk, since bsd.own.mk included earlier Sorry, I hadn't noticed, but I think the only place we seriously exercise it is in src/include and etc for distrib-dirs. Losing the ability to do that would be a problem. > Comments? My only concern would be if it somehow depends on top-level build, it won't work for those of us who do that or at least not using Makefile.inc1. What can I look at to better understand NO_ROOT?
From owner-freebsd-arch@FreeBSD.ORG Thu Oct 23 13:45:33 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C82C9F97 for ; Thu, 23 Oct 2014 13:45:33 +0000 (UTC) Received: from mail-ie0-f180.google.com (mail-ie0-f180.google.com [209.85.223.180]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8EA0390E for ; Thu, 23 Oct 2014 13:45:33 +0000 (UTC) Received: by mail-ie0-f180.google.com with SMTP id at20so934900iec.25 for ; Thu, 23 Oct 2014 06:45:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:content-type:mime-version:subject:from :in-reply-to:date:cc:message-id:references:to; bh=HG+RGxc/paz36afxE0hNNWscA75O7dwxPXRKQwRoo5Y=; b=UuHdWCcjLOoBlsVy0nW0ncFSex7Y1QO3yvT0OTYm5KDWmbyD/dZv4VS4TsI8WO29Tq GKzjlZLnWLn6+VRgqPLsWWnQa8wYdQ3RtP5CvWZONBxasoNAUcQF/qmYDXGbR+2CfadE Xm+9aNkBuQdR+kvjTn2qn4iAnxSpNV4vfVe6XtnDApvXs0alMcxnpZEMwpw0k8VBN0O0 ufUhNmsiJvwqY7AhvZRrZzwKO077bxjWows44tJ1F3WcPr5xyo+oPHpDAU3XxCJQJrkl MMXl5lONAjlwaH90A6GyjhREM+41wgYsdfOz9LxiIbkBbPO+94i8LbHhQVGgcHAd+3PH Xscg== X-Gm-Message-State: ALoCoQm8h5+HF4ulWkRqgToCR64YdnFVbSzKvIpnxRra0yvSKT4WYz7Eye5lRAxC5WmmHUPGSQZg X-Received: by 10.50.6.100 with SMTP id z4mr12472867igz.37.1414071932202; Thu, 23 Oct 2014 06:45:32 -0700 (PDT) Received: from netflix-mac.bsdimp.com ([50.253.99.174]) by mx.google.com with ESMTPSA id e5sm2213682igr.14.2014.10.23.06.45.31 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 23 Oct 2014 06:45:31 -0700 (PDT) Sender: Warner Losh Content-Type: multipart/signed; boundary="Apple-Mail=_E0E7882C-4F02-457D-84FD-BF2151D2CB55"; 
protocol="application/pgp-signature"; micalg=pgp-sha512 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: Retiring WITH_INSTALL_AS_USER From: Warner Losh In-Reply-To: <21044.1414038558@chaos> Date: Thu, 23 Oct 2014 07:45:30 -0600 Message-Id: References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> To: "Simon J. Gerraty" X-Mailer: Apple Mail (2.1878.6) Cc: FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2014 13:45:33 -0000 --Apple-Mail=_E0E7882C-4F02-457D-84FD-BF2151D2CB55 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 On Oct 22, 2014, at 10:29 PM, Simon J. Gerraty wrote: > Warner Losh wrote: > >> I’d like to retire WITH_INSTALL_AS_USER. Brooks’ NO_ROOT (which I’m >> also converting to WITHOUT_ROOT) is much better. WITH_INSTALL_AS_USER > > Hmm I don't see anything in share/mk for NO_ROOT. > I see some tweaks in src/Makefile.inc1 though. > Is there more somewhere? Makefile.inc1 and etc/Makefile. It will soon be in share/mk. >> is broken. First, it requires you also define NO_FSCHG. Second, any >> Makefile that overrides the owner and also has to include src.opts.mk >> early to check options fails. It has been broken for some time (even >> before my conversion to src.opts.mk, since bsd.own.mk included earlier > > Sorry I hadn't noticed - but I think the only place we seriously > exercise it is in src/include and etc for distrib-dirs. > Losing the ability to do that would be a problem. If it is in the tree, it needs to work. It is broken in about a dozen places now. Perhaps not the ones that you use. >> Comments?
> > My only concern would be if it somehow depends on top-level build it > won't work for those of us who do that or at least not using > Makefile.inc1 > > What can I look at to better understand NO_ROOT ? Makefile.inc1 is the only place it is documented right now. NO_ROOT creates a METADATA file for the attributes of the file and does simple copies instead. This lets you build entirely as an unpriv’d user, but still use makefs to get a filesystem with the proper attributes. In many ways it is what you want, and you could get what you want by specifying /dev/null for that METADATA if it were more tightly coupled. Warner --Apple-Mail=_E0E7882C-4F02-457D-84FD-BF2151D2CB55 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJUSQZ6AAoJEGwc0Sh9sBEABKUP/1P2FE5v/ThQKJL3Q5bW6sgn gLL3x5KeAuQiHgolMmDs6ZjS1LFVn5d04l2LYabF+U/J2dnhoQMsncW9xVp+jpCs 1Kzx2byFGTcChOVJlS3i/wy4h6Ajj+dSY9uCJYrAiyqB/pbSBjvWlXmCb8/ImTqk ut21kYBaPQ1kLmNC0P+7hPQfLJz3PK2tlypQTczOil2U1L5X7pGKeJdGrNZEOX94 6eM8C9O7w1lPVQ8Q3RkGP1VaKmUA6hrF1kkm9MNNEnInDQpHEnUUVKpIdvi0aLc6 QWobNuA+Sh13vyAFtnJejK0rGr2V3w1oCdVKXH8qreGozklxjhWvcN+/gpc6e/p2 l2oFfvbpyZ1PUqxKTnuL2g1Nl9XYWucIrs1uPUBy1DDzWONmjF/mA10jnU7m0yWe yLghfi932ZO0HqwJUQEg0rtHBRA0dGkLLScwJKCYzLit7RQoKZyDfAm92QXakJZc Bt/fm6nMnJ0z8dKN6wlWLXX5YttV04Ip4/VhTmjuYdS9mZliAqQe0NNE9PjQBtsW 6noJSd5WooMC6QYtzrRO2BeGGEcFiEqdHEesX5fXi3jwNQg9PMKis45n77v5I4HN fGCP37j1WBHwcpyPHBNc4jSRjhiChRWk7UwCmRsStRI1VsHlz4fPHtaSXqRD2jwx iTyoVcbTr8da2BddPe5f =tp3b -----END PGP SIGNATURE----- --Apple-Mail=_E0E7882C-4F02-457D-84FD-BF2151D2CB55-- From owner-freebsd-arch@FreeBSD.ORG Thu Oct 23 16:33:07 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with
cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9933A287 for ; Thu, 23 Oct 2014 16:33:07 +0000 (UTC) Received: from na01-by2-obe.outbound.protection.outlook.com (mail-by2on0103.outbound.protection.outlook.com [207.46.100.103]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6274CE46 for ; Thu, 23 Oct 2014 16:33:06 +0000 (UTC) Received: from BY2PR05CA045.namprd05.prod.outlook.com (10.141.250.35) by BY2PR05MB128.namprd05.prod.outlook.com (10.242.38.25) with Microsoft SMTP Server (TLS) id 15.0.1054.13; Thu, 23 Oct 2014 15:00:12 +0000 Received: from BN1BFFO11FD019.protection.gbl (2a01:111:f400:7c10::1:157) by BY2PR05CA045.outlook.office365.com (2a01:111:e400:2c5f::35) with Microsoft SMTP Server (TLS) id 15.1.6.9 via Frontend Transport; Thu, 23 Oct 2014 15:00:12 +0000 Received: from P-EMF01-SAC.jnpr.net (66.129.239.15) by BN1BFFO11FD019.mail.protection.outlook.com (10.58.144.82) with Microsoft SMTP Server (TLS) id 15.0.1049.20 via Frontend Transport; Thu, 23 Oct 2014 15:00:12 +0000 Received: from magenta.juniper.net (172.17.27.123) by P-EMF01-SAC.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Thu, 23 Oct 2014 07:59:40 -0700 Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id s9NEwtR67647; Thu, 23 Oct 2014 07:59:35 -0700 (PDT) (envelope-from sjg@juniper.net) Received: from chaos (localhost [127.0.0.1]) by chaos.jnpr.net (Postfix) with ESMTP id 75523580A3; Thu, 23 Oct 2014 07:58:55 -0700 (PDT) To: Warner Losh Subject: Re: Retiring WITH_INSTALL_AS_USER In-Reply-To: References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> Comments: In-reply-to: Warner Losh message dated "Thu, 23 Oct 2014 07:45:30 -0600." From: "Simon J. 
Gerraty" X-Mailer: MH-E 8.0.3; nmh 1.3; GNU Emacs 22.3.1 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Date: Thu, 23 Oct 2014 07:58:55 -0700 Message-ID: <9250.1414076335@chaos> X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:66.129.239.15; CTRY:US; IPV:NLI; EFV:NLI; SFV:NSPM; SFS:(10019020)(6009001)(199003)(189002)(24454002)(76506005)(93916002)(97736003)(87936001)(19580395003)(81156004)(106466001)(19580405001)(46102003)(33716001)(89996001)(6806004)(92566001)(69596002)(107046002)(102836001)(4396001)(21056001)(44976005)(68736004)(105596002)(92726001)(84676001)(50986999)(88136002)(80022003)(87286001)(76176999)(50466002)(86362001)(23676002)(77156001)(104166001)(57986006)(64706001)(85852003)(99396003)(110136001)(62966002)(47776003)(50226001)(85306004)(95666004)(31966008)(20776003)(76482002)(120916001)(117636001)(62816006)(42262002); DIR:OUT; SFP:1102; SCL:1; SRVR:BY2PR05MB128; H:P-EMF01-SAC.jnpr.net; FPR:; MLV:sfv; PTR:InfoDomainNonexistent; A:1; MX:1; LANG:en; X-Microsoft-Antispam: UriScan:; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:BY2PR05MB128; X-Exchange-Antispam-Report-Test: UriScan:; X-Forefront-PRVS: 0373D94D15 Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.15 as permitted sender) Authentication-Results: spf=softfail (sender IP is 66.129.239.15) smtp.mailfrom=sjg@juniper.net; X-OriginatorOrg: juniper.net Cc: FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2014 16:33:07 -0000 Warner Losh wrote: > If it is in the tree, it needs to work. No argument there. > It is broken in about a dozen places > now. Perhaps not the ones that you use. 
Hmm I have it permanently set in a projects/bmake tree that builds buildworld etc fine (while producing meta files) - though it's been a month or two since last sync. Internally we have it set in head trees too. I don't doubt there's something lacking - just haven't noticed, sorry. > Makefile.inc1 is the only place it is documented right now. NO_ROOT > creates a METADATA file for the attributes of the file and does simple > copies instead. This lets you build entirely as an unpriv’d user, but > still use makefs to get a filesystem with the proper attributes. In > many ways it is what you want, and you could get what you want by > specifying /dev/null for that METADATA if it were more tightly > coupled. Sounds ok. Hmm etc/Makefile looks like it lost the ability to run mtree safely in a cross-build env? The MTREE_FILTER stuff ensures that mtree doesn't choke on unknown users and such. How is that handled now? From owner-freebsd-arch@FreeBSD.ORG Thu Oct 23 22:14:36 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 42BFFFBD; Thu, 23 Oct 2014 22:14:36 +0000 (UTC) Received: from pp1.rice.edu (proofpoint1.mail.rice.edu [128.42.201.100]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 036E0A0F; Thu, 23 Oct 2014 22:14:35 +0000 (UTC) Received: from pps.filterd (pp1.rice.edu [127.0.0.1]) by pp1.rice.edu (8.14.5/8.14.5) with SMTP id s9NMD0sH022264; Thu, 23 Oct 2014 17:14:27 -0500 Received: from mh3.mail.rice.edu (mh3.mail.rice.edu [128.42.199.10]) by pp1.rice.edu with ESMTP id 1q76u685ky-1; Thu, 23 Oct 2014 17:14:26 -0500 X-Virus-Scanned: by amavis-2.7.0 at mh3.mail.rice.edu, auth channel Received: from 108-254-203-201.lightspeed.hstntx.sbcglobal.net 
(108-254-203-201.lightspeed.hstntx.sbcglobal.net [108.254.203.201]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) (Authenticated sender: alc) by mh3.mail.rice.edu (Postfix) with ESMTPSA id 489484003F; Thu, 23 Oct 2014 17:14:26 -0500 (CDT) Message-ID: <54497DC1.5070506@rice.edu> Date: Thu, 23 Oct 2014 17:14:25 -0500 From: Alan Cox User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Svatopluk Kraus Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE References: <5428AF3B.1030906@rice.edu> In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: multipart/mixed; boundary="------------070109070505060304030005" X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=11 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1402240000 definitions=main-1410230150 Cc: alc@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2014 22:14:36 -0000 This is a multi-part message in MIME format. --------------070109070505060304030005 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 10/08/2014 10:38, Svatopluk Kraus wrote: > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox wrote: > >> On 09/27/2014 03:51, Svatopluk Kraus wrote: >> >> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox wrote: >> >>> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus >>> wrote: >>> >>>> Hi, >>>> >>>> I and Michal are finishing new ARM pmap-v6 code. There is one problem >>>> we've >>>> dealt with somehow, but now we would like to do it better. It's about >>>> physical pages which are allocated before vm subsystem is initialized. 
>>>> While later on these pages could be found in vm_page_array when >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for >>>> VM_PHYSSEG_SPARSE >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model. >>>> >>>> It really would be nice to utilize vm_page_array for such preallocated >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is used. Things >>>> could be much easier then. In our case, it's about pages which are used >>>> for >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have two sets of such >>>> pages. First ones are preallocated and second ones are allocated after vm >>>> subsystem was inited. We must deal with each set differently. So code is >>>> more complex and so is debugging. >>>> >>>> Thus we need some method how to say that some part of physical memory >>>> should be included in vm_page_array, but the pages from that region >>>> should >>>> not be put to free list during initialization. We think that such >>>> possibility could be utilized in general. There could be a need for some >>>> physical space which: >>>> >>>> (1) is needed only during boot and later on it can be freed and put to vm >>>> subsystem, >>>> >>>> (2) is needed for something else and vm_page_array code could be used >>>> without some kind of its duplication. >>>> >>>> There is already some code which deals with blacklisted pages in >>>> vm_page.c >>>> file. So the easiest way how to deal with presented situation is to add >>>> some callback to this part of code which will be able to either exclude >>>> whole phys_avail[i], phys_avail[i+1] region or single pages. As the >>>> biggest >>>> phys_avail region is used for vm subsystem allocations, there should be >>>> some more coding. (However, blacklisted pages are not dealt with on that >>>> part of region.) >>>> >>>> We would like to know if there is any objection: >>>> >>>> (1) to deal with presented problem, >>>> (2) to deal with the problem presented way. >>>> Some help is very appreciated. 
Thanks >>>> >>>> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of >>> phys_avail when sizing vm_page_array. On amd64, where the same problem >>> exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is >>> probably my preferred solution. The catch being that not all architectures >>> implement dump_avail, but my recollection is that arm does. >>> >> Frankly, I would prefer this too, but there is one big open question: >> >> What is dump_avail for? >> >> >> >> dump_avail[] is solving a similar problem in the minidump code, hence, the >> prefix "dump_" in its name. In other words, the minidump code couldn't use >> phys_avail[] either because it didn't describe the full range of physical >> addresses that might be included in a minidump, so dump_avail[] was created. >> >> There is already precedent for what I'm suggesting. dump_avail[] is >> already (ab)used outside of the minidump code on x86 to solve this same >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c. >> >> >> Using it for vm_page_array initialization and segmentation means that >> phys_avail must be a subset of it. And this must be stated and be visible >> enough. Maybe it should be even checked in code. I like the idea of >> thinking about dump_avail as something what describes all memory in a >> system, but it's not how dump_avail is defined in archs now. >> >> >> >> When you say "it's not how dump_avail is defined in archs now", I'm not >> sure whether you're talking about the code or the comments. In terms of >> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any >> code that would have to change. In terms of comments, I did a grep looking >> for comments defining what dump_avail[] is, because I couldn't remember >> any. I found one ... on arm. So, I don't think it's an onerous task >> changing the definition of dump_avail[]. 
:-) >> >> Already, as things stand today with dump_avail[] being used outside of the >> minidump code, one could reasonably argue that it should be renamed to >> something like phys_exists[]. >> >> >> >> I will experiment with it on monday then. However, it's not only about how >> memory segments are created in vm_phys.c, but it's about how vm_page_array >> size is computed in vm_page.c too. >> >> >> >> Yes, and there is also a place in vm_reserv.c that needs to change. I've >> attached the patch that I developed and tested a long time ago. It still >> applies cleanly and runs ok on amd64. >> >> >> > > > Well, I've created and tested minimalistic patch which - I hope - is > committable. It runs ok on pandaboard (arm-v6) and solves presented problem. > I would really appreciate if this will be committed. Thanks. Sorry for the slow reply. I've just been swamped with work lately. I finally had some time to look at this in the last day or so. The first thing that I propose to do is commit the attached patch. This patch changes pmap_init() on amd64, armv6, and i386 so that it no longer consults phys_avail[] to determine the end of memory. Instead, it calls a new function provided by vm_phys.c to obtain the same information from vm_phys_segs[]. With this change, the new variable phys_managed in your patch wouldn't need to be a global. It could be a local variable in vm_page_startup() that we pass as a parameter to vm_phys_init() and vm_reserv_init(). More generally, the long-term vision that I have is that we would stop using phys_avail[] after vm_page_startup() had completed. It would only be used during initialization. After that we would use vm_phys_segs[] and functions provided by vm_phys.c. > > BTW, while I was inspecting all archs, I think that maybe it's time to do > what was done for busdma not long ago. There are many similar codes across > archs which deal with physical memory and could be generalized and put to > kern/subr_physmem.c for utilization. 
All work with physical memory could be > simplify to two arrays of regions. > > phys_present[] ... describes all present physical memory regions > phys_exclude[] ... describes various exclusions from phys_present[] > > Each excluded region will be labeled by flags to say what kind of exclusion > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, NOMEMRW could > be combined. This idea is taken from sys/arm/arm/physmem.c. > > All other arrays like phys_managed[], phys_avail[], dump_avail[] will be > created from these phys_present[] and phys_exclude[]. > This way bootstrap codes in archs could be simplified and unified. For > example, dealing with either hw.physmem or page with PA 0x00000000 could be > transparent. > > I'm prepared to volunteer if the thing is ripe. However, some tutor will be > looked for. I've never really looked at arm/arm/physmem.c before. Let me do that before I comment on this. --------------070109070505060304030005 Content-Type: text/plain; charset=ISO-8859-15; name="vm_phys_get_end1.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="vm_phys_get_end1.patch" SW5kZXg6IGFtZDY0L2FtZDY0L3BtYXAuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBhbWQ2NC9hbWQ2 NC9wbWFwLmMJKHJldmlzaW9uIDI3MzU1MCkKKysrIGFtZDY0L2FtZDY0L3BtYXAuYwkod29y a2luZyBjb3B5KQpAQCAtMTMwLDYgKzEzMCw3IEBAIF9fRkJTRElEKCIkRnJlZUJTRCQiKTsK ICNpbmNsdWRlIDx2bS92bV9leHRlcm4uaD4KICNpbmNsdWRlIDx2bS92bV9wYWdlb3V0Lmg+ CiAjaW5jbHVkZSA8dm0vdm1fcGFnZXIuaD4KKyNpbmNsdWRlIDx2bS92bV9waHlzLmg+CiAj aW5jbHVkZSA8dm0vdm1fcmFkaXguaD4KICNpbmNsdWRlIDx2bS92bV9yZXNlcnYuaD4KICNp bmNsdWRlIDx2bS91bWEuaD4KQEAgLTEwNjAsOCArMTA2MSw3IEBAIHBtYXBfaW5pdCh2b2lk KQogCS8qCiAJICogQ2FsY3VsYXRlIHRoZSBzaXplIG9mIHRoZSBwdiBoZWFkIHRhYmxlIGZv ciBzdXBlcnBhZ2VzLgogCSAqLwotCWZvciAoaSA9IDA7IHBoeXNfYXZhaWxbaSArIDFdOyBp ICs9IDIpOwotCXB2X25wZyA9IHJvdW5kXzJtcGFnZShwaHlzX2F2YWlsWyhpIC0gMikgKyAx 
XSkgLyBOQlBEUjsKKwlwdl9ucGcgPSByb3VuZF8ybXBhZ2Uodm1fcGh5c19nZXRfZW5kKCkp IC8gTkJQRFI7CiAKIAkvKgogCSAqIEFsbG9jYXRlIG1lbW9yeSBmb3IgdGhlIHB2IGhlYWQg dGFibGUgZm9yIHN1cGVycGFnZXMuCkluZGV4OiBhcm0vYXJtL3BtYXAtdjYuYwo9PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09Ci0tLSBhcm0vYXJtL3BtYXAtdjYuYwkocmV2aXNpb24gMjczNTUwKQorKysgYXJt L2FybS9wbWFwLXY2LmMJKHdvcmtpbmcgY29weSkKQEAgLTE3Miw2ICsxNzIsNyBAQCBfX0ZC U0RJRCgiJEZyZWVCU0QkIik7CiAjaW5jbHVkZSA8dm0vdm1fbWFwLmg+CiAjaW5jbHVkZSA8 dm0vdm1fcGFnZS5oPgogI2luY2x1ZGUgPHZtL3ZtX3BhZ2VvdXQuaD4KKyNpbmNsdWRlIDx2 bS92bV9waHlzLmg+CiAjaW5jbHVkZSA8dm0vdm1fZXh0ZXJuLmg+CiAjaW5jbHVkZSA8dm0v dm1fcmVzZXJ2Lmg+CiAKQEAgLTEzNDMsOCArMTM0NCw3IEBAIHBtYXBfaW5pdCh2b2lkKQog CS8qCiAJICogQ2FsY3VsYXRlIHRoZSBzaXplIG9mIHRoZSBwdiBoZWFkIHRhYmxlIGZvciBz dXBlcnBhZ2VzLgogCSAqLwotCWZvciAoaSA9IDA7IHBoeXNfYXZhaWxbaSArIDFdOyBpICs9 IDIpOwotCXB2X25wZyA9IHJvdW5kXzFtcGFnZShwaHlzX2F2YWlsWyhpIC0gMikgKyAxXSkg LyBOQlBEUjsKKwlwdl9ucGcgPSByb3VuZF8xbXBhZ2Uodm1fcGh5c19nZXRfZW5kKCkpIC8g TkJQRFI7CiAKIAkvKgogCSAqIEFsbG9jYXRlIG1lbW9yeSBmb3IgdGhlIHB2IGhlYWQgdGFi bGUgZm9yIHN1cGVycGFnZXMuCkluZGV4OiBpMzg2L2kzODYvcG1hcC5jCj09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT0KLS0tIGkzODYvaTM4Ni9wbWFwLmMJKHJldmlzaW9uIDI3MzU1MCkKKysrIGkzODYvaTM4 Ni9wbWFwLmMJKHdvcmtpbmcgY29weSkKQEAgLTEzMyw2ICsxMzMsNyBAQCBfX0ZCU0RJRCgi JEZyZWVCU0QkIik7CiAjaW5jbHVkZSA8dm0vdm1fZXh0ZXJuLmg+CiAjaW5jbHVkZSA8dm0v dm1fcGFnZW91dC5oPgogI2luY2x1ZGUgPHZtL3ZtX3BhZ2VyLmg+CisjaW5jbHVkZSA8dm0v dm1fcGh5cy5oPgogI2luY2x1ZGUgPHZtL3ZtX3JhZGl4Lmg+CiAjaW5jbHVkZSA8dm0vdm1f cmVzZXJ2Lmg+CiAjaW5jbHVkZSA8dm0vdW1hLmg+CkBAIC03NzksOCArNzgwLDcgQEAgcG1h cF9pbml0KHZvaWQpCiAJLyoKIAkgKiBDYWxjdWxhdGUgdGhlIHNpemUgb2YgdGhlIHB2IGhl YWQgdGFibGUgZm9yIHN1cGVycGFnZXMuCiAJICovCi0JZm9yIChpID0gMDsgcGh5c19hdmFp bFtpICsgMV07IGkgKz0gMik7Ci0JcHZfbnBnID0gcm91bmRfNG1wYWdlKHBoeXNfYXZhaWxb KGkgLSAyKSArIDFdKSAvIE5CUERSOworCXB2X25wZyA9IHJvdW5kXzRtcGFnZSh2bV9waHlz 
X2dldF9lbmQoKSkgLyBOQlBEUjsKIAogCS8qCiAJICogQWxsb2NhdGUgbWVtb3J5IGZvciB0 aGUgcHYgaGVhZCB0YWJsZSBmb3Igc3VwZXJwYWdlcy4KSW5kZXg6IHZtL3ZtX3BoeXMuYwo9 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09Ci0tLSB2bS92bV9waHlzLmMJKHJldmlzaW9uIDI3MzU1MCkKKysrIHZt L3ZtX3BoeXMuYwkod29ya2luZyBjb3B5KQpAQCAtODU2LDYgKzg1NiwxNyBAQCB2bV9waHlz X2ZyZWVfY29udGlnKHZtX3BhZ2VfdCBtLCB1X2xvbmcgbnBhZ2VzKQogfQogCiAvKgorICog UmV0dXJuIHRoZSBwaHlzaWNhbCBhZGRyZXNzIG9mIHRoZSBlbmQgb2YgbWVtb3J5LCB0aGF0 IGlzLCB0aGUgYWRkcmVzcyBvZgorICogdGhlIGxhc3QgdXNlYWJsZSBieXRlIG9mIFJBTSBw bHVzIG9uZS4KKyAqLwordm1fcGFkZHJfdAordm1fcGh5c19nZXRfZW5kKHZvaWQpCit7CisK KwlyZXR1cm4gKHZtX3BoeXNfc2Vnc1t2bV9waHlzX25zZWdzIC0gMV0uZW5kKTsKK30KKwor LyoKICAqIFNldCB0aGUgcG9vbCBmb3IgYSBjb250aWd1b3VzLCBwb3dlciBvZiB0d28tc2l6 ZWQgc2V0IG9mIHBoeXNpY2FsIHBhZ2VzLiAKICAqLwogdm9pZApJbmRleDogdm0vdm1fcGh5 cy5oCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT0KLS0tIHZtL3ZtX3BoeXMuaAkocmV2aXNpb24gMjczNTUwKQor Kysgdm0vdm1fcGh5cy5oCSh3b3JraW5nIGNvcHkpCkBAIC04MCw2ICs4MCw3IEBAIHZvaWQg dm1fcGh5c19maWN0aXRpb3VzX3VucmVnX3JhbmdlKHZtX3BhZGRyX3Qgc3RhCiB2bV9wYWdl X3Qgdm1fcGh5c19maWN0aXRpb3VzX3RvX3ZtX3BhZ2Uodm1fcGFkZHJfdCBwYSk7CiB2b2lk IHZtX3BoeXNfZnJlZV9jb250aWcodm1fcGFnZV90IG0sIHVfbG9uZyBucGFnZXMpOwogdm9p ZCB2bV9waHlzX2ZyZWVfcGFnZXModm1fcGFnZV90IG0sIGludCBvcmRlcik7Cit2bV9wYWRk cl90IHZtX3BoeXNfZ2V0X2VuZCh2b2lkKTsKIHZvaWQgdm1fcGh5c19pbml0KHZvaWQpOwog dm1fcGFnZV90IHZtX3BoeXNfcGFkZHJfdG9fdm1fcGFnZSh2bV9wYWRkcl90IHBhKTsKIHZv aWQgdm1fcGh5c19zZXRfcG9vbChpbnQgcG9vbCwgdm1fcGFnZV90IG0sIGludCBvcmRlcik7 Cg== --------------070109070505060304030005-- From owner-freebsd-arch@FreeBSD.ORG Fri Oct 24 08:59:56 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 18D14B40 for ; 
Fri, 24 Oct 2014 08:59:56 +0000 (UTC) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id D05FBE0E for ; Fri, 24 Oct 2014 08:59:55 +0000 (UTC) Received: from nine.des.no (smtp.des.no [194.63.250.102]) by smtp-int.des.no (Postfix) with ESMTP id B5389AA83; Fri, 24 Oct 2014 08:59:53 +0000 (UTC) Received: by nine.des.no (Postfix, from userid 1001) id 0C7C91036D; Fri, 24 Oct 2014 10:59:43 +0200 (CEST) From: Dag-Erling Smørgrav To: "Simon J. Gerraty" Subject: Re: Retiring WITH_INSTALL_AS_USER References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> <9250.1414076335@chaos> Date: Fri, 24 Oct 2014 10:59:42 +0200 In-Reply-To: <9250.1414076335@chaos> (Simon J. Gerraty's message of "Thu, 23 Oct 2014 07:58:55 -0700") Message-ID: <86wq7p4zcx.fsf@nine.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Oct 2014 08:59:56 -0000 "Simon J. Gerraty" writes: > Hmm I have it permanently set in a projects/bmake tree that builds > buildworld etc fine (while producing meta files) - though it's been a > month or two since last sync. WITH_INSTALL_AS_USER does not modify buildworld, so that's moot. It modifies installworld, and it does not work: % make installworld WITH_INSTALL_AS_USER=YES DESTDIR=/tmp/wiau [...] 
===> lib/libc (install) install -C -o des -g des -m 444 libc.a /tmp/wiau/usr/lib install -o des -g des -m 444 -fschg -S libc.so.7 /tmp/wiau/lib install: /tmp/wiau/lib/libc.so.7: chflags: Operation not permitted *** Error code 71 All it does is set the default *BIN / *GRP to the user name and primary group name of the current user. It does not prevent Makefiles from overriding them (or from setting chflags like libc does). DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Fri Oct 24 11:33:17 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DA091EA8; Fri, 24 Oct 2014 11:33:17 +0000 (UTC) Received: from mail-yh0-x22c.google.com (mail-yh0-x22c.google.com [IPv6:2607:f8b0:4002:c01::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8A5D93D2; Fri, 24 Oct 2014 11:33:17 +0000 (UTC) Received: by mail-yh0-f44.google.com with SMTP id i57so406783yha.31 for ; Fri, 24 Oct 2014 04:33:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=2MaDnmzdq1JKr1oAl2BoE9Avux+lNTyaJx40wtam2ds=; b=zKEvLM7v2hh3Ismqe16ybfNAjS1p0KZUuVHwiwrTacMo8/IYzg2D8zaJQSWIxpDDJY E0QwZKZIAJFhMrgajq7qvRW6CimHPhpiZQmx9uYWZcstngri/M66flXjTgceMgrKVDQ/ +dFZ84Pcuv5BD5vUvHYeverZ/WkC8i23esEyauYRDkvujDTcIQphqSz9Mm+8Dc5AUm2m ZYGFbbRZEZRBo8+95ATmiJuR442sZT+SpkE9CvG1DMHhXhmgb7T2zo0Gv3CdTCn6CqlW nv7qbXanJZjKKi6gPaS38UBTcStZV1Yd078cJIBt7v5JGulcTS39EkHxQj/gFvTSHtow rJMQ== MIME-Version: 1.0 X-Received: by 10.170.205.129 with SMTP id w123mr5886180yke.0.1414150395114; Fri, 24 Oct 2014 04:33:15 -0700 (PDT) Received: by 
10.140.23.242 with HTTP; Fri, 24 Oct 2014 04:33:15 -0700 (PDT) In-Reply-To: <54497DC1.5070506@rice.edu> References: <5428AF3B.1030906@rice.edu> <54497DC1.5070506@rice.edu> Date: Fri, 24 Oct 2014 13:33:15 +0200 Message-ID: Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE From: Svatopluk Kraus To: Alan Cox Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: alc@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Oct 2014 11:33:17 -0000 On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox wrote: > On 10/08/2014 10:38, Svatopluk Kraus wrote: > > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox wrote: > > > >> On 09/27/2014 03:51, Svatopluk Kraus wrote: > >> > >> > >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox wrote: > >> > >>> > >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus > >>> wrote: > >>> > >>>> Hi, > >>>> > >>>> I and Michal are finishing new ARM pmap-v6 code. There is one problem > >>>> we've > >>>> dealt with somehow, but now we would like to do it better. It's about > >>>> physical pages which are allocated before vm subsystem is initialized. > >>>> While later on these pages could be found in vm_page_array when > >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for > >>>> VM_PHYSSEG_SPARSE > >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model. > >>>> > >>>> It really would be nice to utilize vm_page_array for such preallocated > >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is used. > Things > >>>> could be much easier then. In our case, it's about pages which are > used > >>>> for > >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have two sets of > such > >>>> pages. First ones are preallocated and second ones are allocated > after vm > >>>> subsystem was inited. 
We must deal with each set differently. So code > is > >>>> more complex and so is debugging. > >>>> > >>>> Thus we need some method how to say that some part of physical memory > >>>> should be included in vm_page_array, but the pages from that region > >>>> should > >>>> not be put to free list during initialization. We think that such > >>>> possibility could be utilized in general. There could be a need for > some > >>>> physical space which: > >>>> > >>>> (1) is needed only during boot and later on it can be freed and put > to vm > >>>> subsystem, > >>>> > >>>> (2) is needed for something else and vm_page_array code could be used > >>>> without some kind of its duplication. > >>>> > >>>> There is already some code which deals with blacklisted pages in > >>>> vm_page.c > >>>> file. So the easiest way how to deal with presented situation is to > add > >>>> some callback to this part of code which will be able to either > exclude > >>>> whole phys_avail[i], phys_avail[i+1] region or single pages. As the > >>>> biggest > >>>> phys_avail region is used for vm subsystem allocations, there should > be > >>>> some more coding. (However, blacklisted pages are not dealt with on > that > >>>> part of region.) > >>>> > >>>> We would like to know if there is any objection: > >>>> > >>>> (1) to deal with presented problem, > >>>> (2) to deal with the problem presented way. > >>>> Some help is very appreciated. Thanks > >>>> > >>>> > >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of > >>> phys_avail when sizing vm_page_array. On amd64, where the same problem > >>> exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is > >>> probably my preferred solution. The catch being that not all > architectures > >>> implement dump_avail, but my recollection is that arm does. > >>> > >> Frankly, I would prefer this too, but there is one big open question: > >> > >> What is dump_avail for? 
> >> > >> > >> > >> dump_avail[] is solving a similar problem in the minidump code, hence, > the > >> prefix "dump_" in its name. In other words, the minidump code couldn't > use > >> phys_avail[] either because it didn't describe the full range of > physical > >> addresses that might be included in a minidump, so dump_avail[] was > created. > >> > >> There is already precedent for what I'm suggesting. dump_avail[] is > >> already (ab)used outside of the minidump code on x86 to solve this same > >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c. > >> > >> > >> Using it for vm_page_array initialization and segmentation means that > >> phys_avail must be a subset of it. And this must be stated and be > visible > >> enough. Maybe it should be even checked in code. I like the idea of > >> thinking about dump_avail as something what describes all memory in a > >> system, but it's not how dump_avail is defined in archs now. > >> > >> > >> > >> When you say "it's not how dump_avail is defined in archs now", I'm not > >> sure whether you're talking about the code or the comments. In terms of > >> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of > any > >> code that would have to change. In terms of comments, I did a grep > looking > >> for comments defining what dump_avail[] is, because I couldn't remember > >> any. I found one ... on arm. So, I don't think it's an onerous task > >> changing the definition of dump_avail[]. :-) > >> > >> Already, as things stand today with dump_avail[] being used outside of > the > >> minidump code, one could reasonably argue that it should be renamed to > >> something like phys_exists[]. > >> > >> > >> > >> I will experiment with it on monday then. However, it's not only about > how > >> memory segments are created in vm_phys.c, but it's about how > vm_page_array > >> size is computed in vm_page.c too. > >> > >> > >> > >> Yes, and there is also a place in vm_reserv.c that needs to change. 
> I've > >> attached the patch that I developed and tested a long time ago. It > still > >> applies cleanly and runs ok on amd64. > >> > >> > >> > > > > > > Well, I've created and tested minimalistic patch which - I hope - is > > committable. It runs ok on pandaboard (arm-v6) and solves presented > problem. > > I would really appreciate if this will be committed. Thanks. > > > Sorry for the slow reply. I've just been swamped with work lately. I > finally had some time to look at this in the last day or so. > > The first thing that I propose to do is commit the attached patch. This > patch changes pmap_init() on amd64, armv6, and i386 so that it no longer > consults phys_avail[] to determine the end of memory. Instead, it calls > a new function provided by vm_phys.c to obtain the same information from > vm_phys_segs[]. > > With this change, the new variable phys_managed in your patch wouldn't > need to be a global. It could be a local variable in vm_page_startup() > that we pass as a parameter to vm_phys_init() and vm_reserv_init(). > > More generally, the long-term vision that I have is that we would stop > using phys_avail[] after vm_page_startup() had completed. It would only > be used during initialization. After that we would use vm_phys_segs[] > and functions provided by vm_phys.c. > I understand. The patch and the long-term vision are fine for me. I just was not too bold to pass phys_managed as a parameter to vm_phys_init() and vm_reserv_init(). However, I certainly was thinking about it. While reading comment above vm_phys_get_end(), do we care if the last usable address is 0xFFFFFFFF? Do you think that the rest of my patch considering changes due to your patch is ok? > > > > BTW, while I was inspecting all archs, I think that maybe it's time to do > > what was done for busdma not long ago. There are many similar codes > across > > archs which deal with physical memory and could be generalized and put to > > kern/subr_physmem.c for utilization. 
All work with physical memory could > be > > simplify to two arrays of regions. > > > > phys_present[] ... describes all present physical memory regions > > phys_exclude[] ... describes various exclusions from phys_present[] > > > > Each excluded region will be labeled by flags to say what kind of > exclusion > > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, NOMEMRW could > > be combined. This idea is taken from sys/arm/arm/physmem.c. > > > > All other arrays like phys_managed[], phys_avail[], dump_avail[] will be > > created from these phys_present[] and phys_exclude[]. > > This way bootstrap codes in archs could be simplified and unified. For > > example, dealing with either hw.physmem or page with PA 0x00000000 could > be > > transparent. > > > > I'm prepared to volunteer if the thing is ripe. However, some tutor will > be > > looked for. > > > I've never really looked at arm/arm/physmem.c before. Let me do that > before I comment on this. > > No problem. This could be a long-term aim. However, I hope the VM_PHYSSEG_SPARSE problem could be dealt with in MI code at the present time. In any case, thanks for your help. 
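For readers following the vm_phys_get_end() exchange above, here is a minimal user-space sketch in C of the two ways of finding the end of memory, plus the 0xFFFFFFFF corner case raised in the reply. It is not kernel code: the type paddr_t, the tables avail[] and segs[], and every function name below are hypothetical stand-ins for vm_paddr_t, phys_avail[], vm_phys_segs[] and vm_phys_get_end(), and the memory layout is invented for illustration. The sketch also assumes the last entry of the segment table is the highest segment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space sketch only. The types, tables, and function names below are
 * hypothetical stand-ins, not the kernel's definitions.
 */
typedef uint64_t paddr_t;               /* stands in for vm_paddr_t */

struct seg {                            /* stands in for struct vm_phys_seg */
	paddr_t start;
	paddr_t end;                    /* one past the last byte */
};

/* phys_avail[]-style table: (start, end) pairs, terminated by a zero end. */
paddr_t avail[] = { 0x1000, 0x9f000, 0x100000, 0x80000000, 0, 0 };

/* vm_phys_segs[]-style table for the same (made-up) memory layout. */
struct seg segs[] = { { 0x1000, 0x9f000 }, { 0x100000, 0x80000000 } };
int nsegs = 2;

/* Old idiom from pmap_init(): skip to the terminator, step back one pair. */
paddr_t
end_from_avail(void)
{
	int i;

	for (i = 0; avail[i + 1] != 0; i += 2)
		;
	return (avail[(i - 2) + 1]);
}

/* New idiom, mirroring vm_phys_get_end(): end of the last segment. */
paddr_t
end_from_segs(void)
{
	assert(nsegs > 0);
	return (segs[nsegs - 1].end);
}

/* "Last usable byte plus one" computed in a 32-bit physical address type. */
uint32_t
end_after_last_byte32(uint32_t last_byte)
{
	return (last_byte + 1u);        /* unsigned arithmetic wraps modulo 2^32 */
}
```

Both lookups return the same end address for the sample tables. With a 32-bit address type, end_after_last_byte32(0xFFFFFFFF) wraps to 0, which is the situation the question about the comment above vm_phys_get_end() is pointing at.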
From owner-freebsd-arch@FreeBSD.ORG Fri Oct 24 15:53:27 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BE31ED03 for ; Fri, 24 Oct 2014 15:53:27 +0000 (UTC) Received: from na01-by2-obe.outbound.protection.outlook.com (mail-by2on0122.outbound.protection.outlook.com [207.46.100.122]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 690EE7B1 for ; Fri, 24 Oct 2014 15:53:26 +0000 (UTC) Received: from BLUPR05CA0069.namprd05.prod.outlook.com (10.141.20.39) by BLUPR05MB118.namprd05.prod.outlook.com (10.255.214.16) with Microsoft SMTP Server (TLS) id 15.0.1054.13; Fri, 24 Oct 2014 15:53:19 +0000 Received: from BN1AFFO11FD060.protection.gbl (2a01:111:f400:7c10::138) by BLUPR05CA0069.outlook.office365.com (2a01:111:e400:855::39) with Microsoft SMTP Server (TLS) id 15.1.6.9 via Frontend Transport; Fri, 24 Oct 2014 15:53:19 +0000 Received: from P-EMF01-SAC.jnpr.net (66.129.239.15) by BN1AFFO11FD060.mail.protection.outlook.com (10.58.53.75) with Microsoft SMTP Server (TLS) id 15.0.1049.20 via Frontend Transport; Fri, 24 Oct 2014 15:53:19 +0000 Received: from magenta.juniper.net (172.17.27.123) by P-EMF01-SAC.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Fri, 24 Oct 2014 08:53:18 -0700 Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id s9OFrHR75391; Fri, 24 Oct 2014 08:53:17 -0700 (PDT) (envelope-from sjg@juniper.net) Received: from chaos (localhost [127.0.0.1]) by chaos.jnpr.net (Postfix) with ESMTP id D2D34580A3; Fri, 24 Oct 2014 08:53:16 -0700 (PDT) To: 
=?us-ascii?Q?=3D=3Futf-8=3FQ=3FDag-Erling=5FSm=3DC3=3DB8rgrav=3F=3D?= Subject: Re: Retiring WITH_INSTALL_AS_USER In-Reply-To: <86wq7p4zcx.fsf@nine.des.no> References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> <9250.1414076335@chaos> <86wq7p4zcx.fsf@nine.des.no> Comments: In-reply-to: =?us-ascii?Q?=3D=3Futf-8=3FQ=3FDag-Erling=5FSm=3DC3?= =?us-ascii?Q?=3DB8rgrav=3F=3D?= message dated "Fri, 24 Oct 2014 10:59:42 +0200." From: "Simon J. Gerraty" X-Mailer: MH-E 8.0.3; nmh 1.3; GNU Emacs 22.3.1 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Date: Fri, 24 Oct 2014 08:53:16 -0700 Message-ID: <10072.1414165996@chaos> X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:66.129.239.15; CTRY:US; IPV:NLI; EFV:NLI; SFV:NSPM; SFS:(10019020)(6009001)(199003)(24454002)(189002)(51704005)(102836001)(76506005)(57986006)(105596002)(46102003)(77156001)(107046002)(85852003)(88136002)(87936001)(99396003)(31966008)(120916001)(89996001)(87286001)(106466001)(81156004)(76482002)(50466002)(95666004)(104166001)(69596002)(93886004)(76176999)(117636001)(50986999)(80022003)(110136001)(86362001)(93916002)(64706001)(62966002)(85306004)(97736003)(92726001)(20776003)(84676001)(47776003)(92566001)(19580395003)(6806004)(68736004)(19580405001)(33716001)(4396001)(44976005)(23756003)(21056001)(50226001)(62816006)(42262002); DIR:OUT; SFP:1102; SCL:1; SRVR:BLUPR05MB118; H:P-EMF01-SAC.jnpr.net; FPR:; MLV:sfv; PTR:InfoDomainNonexistent; A:1; MX:1; LANG:en; X-Microsoft-Antispam: UriScan:; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:BLUPR05MB118; X-Exchange-Antispam-Report-Test: UriScan:; X-Forefront-PRVS: 0374433C81 Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.15 as permitted sender) Authentication-Results: spf=softfail (sender IP is 66.129.239.15) smtp.mailfrom=sjg@juniper.net; X-OriginatorOrg: juniper.net Cc: FreeBSD Arch X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Oct 2014 15:53:27 -0000

Dag-Erling Smørgrav wrote:
> "Simon J. Gerraty" writes:
> > Hmm I have it permanently set in a projects/bmake tree that builds
> > buildworld etc fine (while producing meta files) - though it's been a
> > month or two since last sync.
>
> WITH_INSTALL_AS_USER does not modify buildworld, so that's moot. It
> modifies installworld, and it does not work:

Ah, I never tried that.
Though I guess the knob name implies it should do something useful.
Ok, that explains the "brokenness" I wasn't aware of.

> All it does is set the default *BIN / *GRP to the user name and primary
> group name of the current user. It does not prevent Makefiles from
> overriding them (or from setting chflags like libc does).

If that were desired, it would be simple enough to add them to
.MAKEOVERRIDES (treats them as though they were set on command line).
But it sounds like NO_ROOT already solves this issue?
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 02:28:14 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 432E97B7 for ; Sat, 25 Oct 2014 02:28:14 +0000 (UTC) Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com [IPv6:2a00:1450:400c:c05::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D24D3EF5 for ; Sat, 25 Oct 2014 02:28:13 +0000 (UTC) Received: by mail-wi0-f171.google.com with SMTP id em10so2529675wid.10 for ; Fri, 24 Oct 2014 19:28:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=aecj7wA5gTFAHiyNXTyeldZ2loAkPMWzcxe16D5yhfs=; b=ygIrRAD3pCf0FaRwCXR4P2B1O/J6Qp0mnxxcGIqQW12z6C5z9SKohaxTB31+fiqrSm BCsJIK9xqtcNGa4E2IhoMGqD+bd7UKYpWTNQ2xay9k5d6Je0Y4RWlXnI7C6qx2tdgC57 Ds2T1ORxmpUfJZezl40TCyZJqApTY5vIV7HjrFa1FZRdWBtkODg+FtSkMwTMW9OhV0fM kgso2HBJvMpUpD+vSrCXerxzzyo3zIfYyT6FavASQJ4I0rBjhuIKctF3z8WexWsxYFpM cN8RSuJwgb+G/cl506pUnC01lkLQ7BbKcw1mFBwWM3PYVtVSppA7s2mTl63/aHWclSKW El1Q== X-Received: by 10.180.74.142 with SMTP id t14mr7554382wiv.17.1414204092088; Fri, 24 Oct 2014 19:28:12 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id fa7sm7430018wjd.27.2014.10.24.19.28.10 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Fri, 24 Oct 2014 19:28:11 -0700 (PDT) Date: Sat, 25 Oct 2014 04:28:09 +0200 From: Mateusz Guzik To: freebsd-arch@freebsd.org Subject: syscalls from loadable modules compiled in statically into the kernel Message-ID: <20141025022808.GA14551@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 02:28:14 -0000

The kernel has the following mechanism:

int
syscall_thread_enter(struct thread *td, struct sysent *se)
{
	u_int32_t cnt, oldcnt;

	do {
		oldcnt = se->sy_thrcnt;
		if ((oldcnt & SY_THR_STATIC) != 0)
			return (0);
		if ((oldcnt & (SY_THR_DRAINING | SY_THR_ABSENT)) != 0)
			return (ENOSYS);
		cnt = oldcnt + SY_THR_INCR;
	} while (atomic_cmpset_acq_32(&se->sy_thrcnt, oldcnt, cnt) == 0);
	return (0);
}

Except it turns out that it is used even if given module (here: sysvshm) is
compiled in statically.

So my proposal is to give modules an easy way to tell whether they got
compiled in and extend syscall_register interface so that it would allow
registering static syscalls.

The latter could also be used by modules which are loadable, but don't
support unloads.

I don't have any good idea how to provide aforementioned detection
method though.

Also, please see https://reviews.freebsd.org/D1007 which moves
SY_THR_STATIC check to an inline function, saving us 2 function calls on
each syscall.
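
As a rough userland illustration of the mechanism quoted above, the same
pattern (a flag word that doubles as a reference count, with a fast path for
static entries) can be written with C11 atomics. This is only a sketch, not
the kernel code: struct entry, entry_enter(), entry_exit() and the THR_* names
are hypothetical, and the value used for THR_DRAINING is an assumption.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Low bits are flags, the rest of the word counts threads inside. */
#define THR_STATIC	0x1u	/* entry can never go away: skip counting */
#define THR_DRAINING	0x2u	/* assumed value; entry is being removed */
#define THR_ABSENT	0x4u	/* entry is not registered */
#define THR_INCR	0x8u	/* one reference */

struct entry {
	_Atomic uint32_t thrcnt;
};

/* Take a reference unless the entry is static, draining or absent. */
static int
entry_enter(struct entry *e)
{
	uint32_t old;

	old = atomic_load_explicit(&e->thrcnt, memory_order_relaxed);
	for (;;) {
		if ((old & THR_STATIC) != 0)
			return (0);		/* fast path, no atomic RMW */
		if ((old & (THR_DRAINING | THR_ABSENT)) != 0)
			return (ENOSYS);
		/* On failure the CAS reloads 'old' and we re-check the flags. */
		if (atomic_compare_exchange_weak_explicit(&e->thrcnt, &old,
		    old + THR_INCR, memory_order_acquire,
		    memory_order_relaxed))
			return (0);
	}
}

/* Drop the reference taken by a successful entry_enter(). */
static void
entry_exit(struct entry *e)
{
	uint32_t cur;

	cur = atomic_load_explicit(&e->thrcnt, memory_order_relaxed);
	if ((cur & THR_STATIC) != 0)
		return;				/* nothing was counted */
	atomic_fetch_sub_explicit(&e->thrcnt, THR_INCR,
	    memory_order_release);
}
```

The point of the static flag is visible here: a static entry costs one load
and a branch per call, while a dynamic one pays for an atomic
read-modify-write on entry and another on exit.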
-- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 09:22:40 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DE3E2398 for ; Sat, 25 Oct 2014 09:22:40 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 53494B2A for ; Sat, 25 Oct 2014 09:22:40 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9P9MYre036967 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Oct 2014 12:22:35 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9P9MYre036967 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9P9MYKD036966; Sat, 25 Oct 2014 12:22:34 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 25 Oct 2014 12:22:34 +0300 From: Konstantin Belousov To: Mateusz Guzik Subject: Re: syscalls from loadable modules compiled in statically into the kernel Message-ID: <20141025092234.GI1877@kib.kiev.ua> References: <20141025022808.GA14551@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141025022808.GA14551@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 
2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 09:22:41 -0000

On Sat, Oct 25, 2014 at 04:28:09AM +0200, Mateusz Guzik wrote:
> The kernel has the following mechanism:
>
> int
> syscall_thread_enter(struct thread *td, struct sysent *se)
> {
> 	u_int32_t cnt, oldcnt;
>
> 	do {
> 		oldcnt = se->sy_thrcnt;
> 		if ((oldcnt & SY_THR_STATIC) != 0)
> 			return (0);
> 		if ((oldcnt & (SY_THR_DRAINING | SY_THR_ABSENT)) != 0)
> 			return (ENOSYS);
> 		cnt = oldcnt + SY_THR_INCR;
> 	} while (atomic_cmpset_acq_32(&se->sy_thrcnt, oldcnt, cnt) == 0);
> 	return (0);
> }
>
> Except it turns out that it is used even if given module (here: sysvshm) is
> compiled in statically.
>
> So my proposal is to give modules an easy way to tell whether they got
> compiled in and extend syscall_register interface so that it would allow
> registering static syscalls.
>
> The latter could also be used by modules which are loadable, but don't
> support unloads.
>
> I don't have any good idea how to provide aforementioned detection
> method though.

The method would be a combination of some change to syscall_register()
and #ifdef KLD_MODULE. Look at the sys/conf.h MAKEDEV_ETERNAL_KLD
definition, which provides a similar-in-spirit optimization for
non-destructible cdevs.

> Also, please see https://reviews.freebsd.org/D1007 which moves
> SY_THR_STATIC check to an inline function, saving us 2 function calls on
> each syscall.

Did you benchmark this? I dislike the code bloat.
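
A minimal sketch of how the #ifdef KLD_MODULE suggestion above could look,
written as standalone C so it can be tried outside the kernel. Everything
here is hypothetical and only illustrates the idea: struct slot,
slot_register(), slot_deregister() and the SLOT_* names are made up, and the
sketch assumes KLD_MODULE is defined only when the file is built as a
loadable module.

```c
#include <errno.h>

#define SLOT_STATIC	0x1	/* registration can never be undone */

/*
 * Resolved at compile time: the same source asks for a static
 * registration when linked into the kernel and for a normal one when
 * built as a loadable module.
 */
#ifdef KLD_MODULE
#define SLOT_STATIC_KLD	0
#else
#define SLOT_STATIC_KLD	SLOT_STATIC
#endif

struct slot {
	int in_use;
	int flags;
};

/* Register a slot; only SLOT_STATIC is accepted as a flag. */
static int
slot_register(struct slot *s, int flags)
{
	if ((flags & ~SLOT_STATIC) != 0)
		return (EINVAL);
	if (s->in_use)
		return (EEXIST);
	s->in_use = 1;
	s->flags = flags;
	return (0);
}

/* Deregister a slot; static registrations are refused. */
static int
slot_deregister(struct slot *s)
{
	if (!s->in_use)
		return (0);
	if ((s->flags & SLOT_STATIC) != 0)
		return (EINVAL);
	s->in_use = 0;
	s->flags = 0;
	return (0);
}
```

A caller would simply pass SLOT_STATIC_KLD at registration time and would not
need any #ifdef of its own.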
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 13:20:44 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7BCF120F for ; Sat, 25 Oct 2014 13:20:44 +0000 (UTC) Received: from mail-wg0-x233.google.com (mail-wg0-x233.google.com [IPv6:2a00:1450:400c:c00::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 14136760 for ; Sat, 25 Oct 2014 13:20:43 +0000 (UTC) Received: by mail-wg0-f51.google.com with SMTP id b13so2776434wgh.22 for ; Sat, 25 Oct 2014 06:20:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=RRrYWh3AWV3ZNE09mM1Il03iZtqdhpGjMAIDhcuN1vE=; b=nVPkfZGI4yHKGp9KkB+uY5jKTw6WLQJNnL507XqJT69AD/tXDdeaKFm6UGt1dQYn6o jDzvlsKouN8fhnPKZkNWp/tqUv2xdW+bMHE3yk+I60X9Ki9jwa1QhczoiPE91P+E216c D+qvCKHw2LGv10Ri4vSzvXWutOzwFP39awPglZWMGsJq9UmUw/cPPSuK0O2340WNhZ/l qOjA3RHU+cq8JNcn/JmTUjCmm12tnU5TwMF2OlPcStlqWTpOQZOop+9ol/MtXub8BZCQ CACtakSGVVUMFcZwIe1WbAOoDYdu0zxpSlZWQkvOFrQ7B16F4ssfyPyNaUQ6c4h++1ge Mf5A== X-Received: by 10.180.212.42 with SMTP id nh10mr10296876wic.52.1414243242298; Sat, 25 Oct 2014 06:20:42 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id f7sm5121911wiz.13.2014.10.25.06.20.41 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 25 Oct 2014 06:20:41 -0700 (PDT) Date: Sat, 25 Oct 2014 15:20:39 +0200 From: Mateusz Guzik To: Konstantin Belousov Subject: Re: syscalls from loadable modules compiled in statically into the kernel Message-ID: <20141025132039.GA20599@dft-labs.eu> References: <20141025022808.GA14551@dft-labs.eu> <20141025092234.GI1877@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20141025092234.GI1877@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 13:20:44 -0000 On Sat, Oct 25, 2014 at 12:22:34PM +0300, Konstantin Belousov wrote: > On Sat, Oct 25, 2014 at 04:28:09AM +0200, Mateusz Guzik wrote: > > The kernel has the following mechanism: > > > > int > > syscall_thread_enter(struct thread *td, struct sysent *se) > > { > > u_int32_t cnt, oldcnt; > > > > do { > > oldcnt = se->sy_thrcnt; > > if ((oldcnt & SY_THR_STATIC) != 0) > > return (0); > > if ((oldcnt & (SY_THR_DRAINING | SY_THR_ABSENT)) != 0) > > return (ENOSYS); > > cnt = oldcnt + SY_THR_INCR; > > } while (atomic_cmpset_acq_32(&se->sy_thrcnt, oldcnt, cnt) == 0); > > return (0); > > } > > > > Except it turns out that it is used even if given module (here: sysvshm) is > > compiled in statically. > > > > So my proposal is to give modules an easy way to tell whether they got > > compiled in and extend syscall_register interface so that it would allow > > registering static syscalls. > > > > The latter could also be used by modules which are loadable, but don't > > support unloads. 
> > > > I don't have any good idea how to provide aforementioned detection > > method though. > The method would be a combination of some change to syscall_register() > and #ifdef KLD_MODULE. Look at the sys/conf.h MAKEDEV_ETERNAL_KLD > definition, which provides similar in spirit optimization for > non-destructable cdevs. > Ok, so I'll add sysctl_register_flags and SY_THR_STATIC_KLD + making sure SY_THR_STATIC cannot be unregistered. > > > > Also, please see https://reviews.freebsd.org/D1007 which moves > > SY_THR_STATIC check to an inline function, saving us 2 function calls on > > each syscall. > > Did you benchmarked this ? I dislike the code bloat. with syscall_timing from tools/tools ministat says +4% for getuid and +1 for pipe+close. -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 14:16:03 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4443DA17 for ; Sat, 25 Oct 2014 14:16:03 +0000 (UTC) Received: from mail-wg0-x232.google.com (mail-wg0-x232.google.com [IPv6:2a00:1450:400c:c00::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C43D5BFE for ; Sat, 25 Oct 2014 14:16:02 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id z12so302709wgg.9 for ; Sat, 25 Oct 2014 07:16:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=2cX1r8SY89mZ5Qlcy73rZDdEb7vhJHXuH2+njkv4XBk=; b=Z5vCwre8FWloMyKxZ5dV7wuV7oaMi5eHJWyUq4w6EGaAywlCstvoiOsg8AC61EbVHz a9r7z7Dok1/TMdgMXGXy0kjqHDOt9llTDE32D0fWprS4+Q36gjAymCCaV09CqHnu94+P 
btqsLkixezPmsdxQBfUgDF/rXPDZuAR3d/iW47gb6hIseBU+Alpteu2s0+ObXhhrt4Gv g0VMB16IPbGVSmWtaE22RMA5iLu9OPtVpzZnEMr/ZG9Aveg+S0D4Mk4eKV6QLttA6Mqe RjTdn6vmf/QnzuWJ5OlJmYQdYPAua9hQ7LHKWUqbCiHScsKO1igBF2rQQGZxMGOS2aJn NyOw== X-Received: by 10.180.149.208 with SMTP id uc16mr10364962wib.23.1414246560887; Sat, 25 Oct 2014 07:16:00 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id fq1sm5243041wib.12.2014.10.25.07.15.59 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 25 Oct 2014 07:16:00 -0700 (PDT) Date: Sat, 25 Oct 2014 16:15:57 +0200 From: Mateusz Guzik To: Konstantin Belousov Subject: Re: syscalls from loadable modules compiled in statically into the kernel Message-ID: <20141025141557.GB20599@dft-labs.eu> References: <20141025022808.GA14551@dft-labs.eu> <20141025092234.GI1877@kib.kiev.ua> <20141025132039.GA20599@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20141025132039.GA20599@dft-labs.eu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 14:16:03 -0000 On Sat, Oct 25, 2014 at 03:20:39PM +0200, Mateusz Guzik wrote: > On Sat, Oct 25, 2014 at 12:22:34PM +0300, Konstantin Belousov wrote: > > On Sat, Oct 25, 2014 at 04:28:09AM +0200, Mateusz Guzik wrote: > > > The kernel has the following mechanism: > > > > > > int > > > syscall_thread_enter(struct thread *td, struct sysent *se) > > > { > > > u_int32_t cnt, oldcnt; > > > > > > do { > > > oldcnt = se->sy_thrcnt; > > > if ((oldcnt & SY_THR_STATIC) != 0) > > > return (0); > > > if ((oldcnt & (SY_THR_DRAINING | SY_THR_ABSENT)) != 0) > > > return (ENOSYS); > > > cnt = oldcnt + SY_THR_INCR; > > > } while 
(atomic_cmpset_acq_32(&se->sy_thrcnt, oldcnt, cnt) == 0); > > > return (0); > > > } > > > > > > Except it turns out that it is used even if given module (here: sysvshm) is > > > compiled in statically. > > > > > > So my proposal is to give modules an easy way to tell whether they got > > > compiled in and extend syscall_register interface so that it would allow > > > registering static syscalls. > > > > > > The latter could also be used by modules which are loadable, but don't > > > support unloads. > > > > > > I don't have any good idea how to provide aforementioned detection > > > method though. > > The method would be a combination of some change to syscall_register() > > and #ifdef KLD_MODULE. Look at the sys/conf.h MAKEDEV_ETERNAL_KLD > > definition, which provides similar in spirit optimization for > > non-destructable cdevs. > > > > Ok, so I'll add sysctl_register_flags and SY_THR_STATIC_KLD + making > sure SY_THR_STATIC cannot be unregistered. > Turns out freebsd32 duplicates a lot of code and didn't receive some fixes regular syscall table support got. I decided to just patch it up without fixing that for now. 
diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c index d909a71..8fe4b83 100644 --- a/sys/compat/freebsd32/freebsd32_misc.c +++ b/sys/compat/freebsd32/freebsd32_misc.c @@ -2627,9 +2627,13 @@ freebsd32_xxx(struct thread *td, struct freebsd32_xxx_args *uap) #endif int -syscall32_register(int *offset, struct sysent *new_sysent, - struct sysent *old_sysent) +syscall32_register_flags(int *offset, struct sysent *new_sysent, + struct sysent *old_sysent, int flags) { + + if ((flags & ~SY_THR_STATIC) != 0) + return (EINVAL); + if (*offset == NO_SYSCALL) { int i; @@ -2648,15 +2652,26 @@ syscall32_register(int *offset, struct sysent *new_sysent, *old_sysent = freebsd32_sysent[*offset]; freebsd32_sysent[*offset] = *new_sysent; + atomic_store_rel_32(&freebsd32_sysent[*offset].sy_thrcnt, flags); return 0; } int +syscall32_register(int *offset, struct sysent *new_sysent, + struct sysent *old_sysent) +{ + + return (syscall32_register_flags(offset, new_sysent, old_sysent, 0)); +} + +int syscall32_deregister(int *offset, struct sysent *old_sysent) { - if (*offset) - freebsd32_sysent[*offset] = *old_sysent; + if (*offset == 0) + return (0); + + freebsd32_sysent[*offset] = *old_sysent; return 0; } @@ -2707,14 +2722,14 @@ syscall32_module_handler(struct module *mod, int what, void *arg) } int -syscall32_helper_register(struct syscall_helper_data *sd) +syscall32_helper_register_flags(struct syscall_helper_data *sd, int flags) { struct syscall_helper_data *sd1; int error; for (sd1 = sd; sd1->syscall_no != NO_SYSCALL; sd1++) { - error = syscall32_register(&sd1->syscall_no, &sd1->new_sysent, - &sd1->old_sysent); + error = syscall32_register_flags(&sd1->syscall_no, + &sd1->new_sysent, &sd1->old_sysent, flags); if (error != 0) { syscall32_helper_unregister(sd); return (error); @@ -2725,6 +2740,13 @@ syscall32_helper_register(struct syscall_helper_data *sd) } int +syscall32_helper_register(struct syscall_helper_data *sd) +{ + + return 
(syscall32_helper_register_flags(sd, 0)); +} + +int syscall32_helper_unregister(struct syscall_helper_data *sd) { struct syscall_helper_data *sd1; diff --git a/sys/compat/freebsd32/freebsd32_util.h b/sys/compat/freebsd32/freebsd32_util.h index a5945cf..e473ef6 100644 --- a/sys/compat/freebsd32/freebsd32_util.h +++ b/sys/compat/freebsd32/freebsd32_util.h @@ -97,10 +97,14 @@ SYSCALL32_MODULE(syscallname, \ .syscall_no = FREEBSD32_SYS_##syscallname \ } +int syscall32_register_flags(int *offset, struct sysent *new_sysent, + struct sysent *old_sysent, int flags); int syscall32_register(int *offset, struct sysent *new_sysent, struct sysent *old_sysent); int syscall32_deregister(int *offset, struct sysent *old_sysent); int syscall32_module_handler(struct module *mod, int what, void *arg); +int syscall32_helper_register_flags(struct syscall_helper_data *sd, + int flags); int syscall32_helper_register(struct syscall_helper_data *sd); int syscall32_helper_unregister(struct syscall_helper_data *sd); diff --git a/sys/kern/kern_syscalls.c b/sys/kern/kern_syscalls.c index 03f6088..30dd203 100644 --- a/sys/kern/kern_syscalls.c +++ b/sys/kern/kern_syscalls.c @@ -104,11 +104,14 @@ syscall_thread_exit(struct thread *td, struct sysent *se) } int -syscall_register(int *offset, struct sysent *new_sysent, - struct sysent *old_sysent) +syscall_register_flags(int *offset, struct sysent *new_sysent, + struct sysent *old_sysent, int flags) { int i; + if ((flags & ~SY_THR_STATIC) != 0) + return (EINVAL); + if (*offset == NO_SYSCALL) { for (i = 1; i < SYS_MAXSYSCALL; ++i) if (sysent[i].sy_call == (sy_call_t *)lkmnosys) @@ -127,18 +130,31 @@ syscall_register(int *offset, struct sysent *new_sysent, *old_sysent = sysent[*offset]; new_sysent->sy_thrcnt = SY_THR_ABSENT; sysent[*offset] = *new_sysent; - atomic_store_rel_32(&sysent[*offset].sy_thrcnt, 0); + atomic_store_rel_32(&sysent[*offset].sy_thrcnt, flags); return (0); } int +syscall_register(int *offset, struct sysent *new_sysent, + struct 
sysent *old_sysent) +{ + + return (syscall_register_flags(offset, new_sysent, old_sysent, 0)); +} + +int syscall_deregister(int *offset, struct sysent *old_sysent) { + struct sysent *se; - if (*offset) { - syscall_thread_drain(&sysent[*offset]); - sysent[*offset] = *old_sysent; - } + if (*offset == 0) + return (0); /* XXX? */ + + se = &sysent[*offset]; + if ((se->sy_thrcnt & SY_THR_STATIC) != 0) + return (EINVAL); + syscall_thread_drain(se); + sysent[*offset] = *old_sysent; return (0); } @@ -190,14 +206,14 @@ syscall_module_handler(struct module *mod, int what, void *arg) } int -syscall_helper_register(struct syscall_helper_data *sd) +syscall_helper_register_flags(struct syscall_helper_data *sd, int flags) { struct syscall_helper_data *sd1; int error; for (sd1 = sd; sd1->syscall_no != NO_SYSCALL; sd1++) { - error = syscall_register(&sd1->syscall_no, &sd1->new_sysent, - &sd1->old_sysent); + error = syscall_register_flags(&sd1->syscall_no, + &sd1->new_sysent, &sd1->old_sysent, flags); if (error != 0) { syscall_helper_unregister(sd); return (error); @@ -208,6 +224,13 @@ syscall_helper_register(struct syscall_helper_data *sd) } int +syscall_helper_register(struct syscall_helper_data *sd) +{ + + return (syscall_helper_register_flags(sd, 0)); +} + +int syscall_helper_unregister(struct syscall_helper_data *sd) { struct syscall_helper_data *sd1; diff --git a/sys/kern/sysv_msg.c b/sys/kern/sysv_msg.c index a572a0e..4fc04bc 100644 --- a/sys/kern/sysv_msg.c +++ b/sys/kern/sysv_msg.c @@ -252,11 +252,12 @@ msginit() } mtx_init(&msq_mtx, "msq", NULL, MTX_DEF); - error = syscall_helper_register(msg_syscalls); + error = syscall_helper_register_flags(msg_syscalls, SY_THR_STATIC_KLD); if (error != 0) return (error); #ifdef COMPAT_FREEBSD32 - error = syscall32_helper_register(msg32_syscalls); + error = syscall32_helper_register_flags(msg32_syscalls, + SY_THR_STATIC_KLD); if (error != 0) return (error); #endif diff --git a/sys/kern/sysv_sem.c b/sys/kern/sysv_sem.c index 
c632902..dc1c66a 100644 --- a/sys/kern/sysv_sem.c +++ b/sys/kern/sysv_sem.c @@ -278,11 +278,12 @@ seminit(void) semexit_tag = EVENTHANDLER_REGISTER(process_exit, semexit_myhook, NULL, EVENTHANDLER_PRI_ANY); - error = syscall_helper_register(sem_syscalls); + error = syscall_helper_register_flags(sem_syscalls, SY_THR_STATIC_KLD); if (error != 0) return (error); #ifdef COMPAT_FREEBSD32 - error = syscall32_helper_register(sem32_syscalls); + error = syscall32_helper_register_flags(sem32_syscalls, + SY_THR_STATIC_KLD); if (error != 0) return (error); #endif diff --git a/sys/kern/sysv_shm.c b/sys/kern/sysv_shm.c index 3480d11..b3132c3 100644 --- a/sys/kern/sysv_shm.c +++ b/sys/kern/sysv_shm.c @@ -910,11 +910,12 @@ shminit() shmexit_hook = &shmexit_myhook; shmfork_hook = &shmfork_myhook; - error = syscall_helper_register(shm_syscalls); + error = syscall_helper_register_flags(shm_syscalls, SY_THR_STATIC_KLD); if (error != 0) return (error); #ifdef COMPAT_FREEBSD32 - error = syscall32_helper_register(shm32_syscalls); + error = syscall32_helper_register_flags(shm32_syscalls, + SY_THR_STATIC_KLD); if (error != 0) return (error); #endif diff --git a/sys/netinet/sctp_syscalls.c b/sys/netinet/sctp_syscalls.c index 3d0f549..8170ca8 100644 --- a/sys/netinet/sctp_syscalls.c +++ b/sys/netinet/sctp_syscalls.c @@ -94,11 +94,11 @@ sctp_syscalls_init(void *unused __unused) { int error; - error = syscall_helper_register(sctp_syscalls); + error = syscall_helper_register_flags(sctp_syscalls, SY_THR_STATIC); KASSERT((error == 0), ("%s: syscall_helper_register failed for sctp syscalls", __func__)); #ifdef COMPAT_FREEBSD32 - error = syscall32_helper_register(sctp_syscalls); + error = syscall32_helper_register_flags(sctp_syscalls, SY_THR_STATIC); KASSERT((error == 0), ("%s: syscall32_helper_register failed for sctp syscalls", __func__)); diff --git a/sys/sys/sysent.h b/sys/sys/sysent.h index 0f1c256..12bc518 100644 --- a/sys/sys/sysent.h +++ b/sys/sys/sysent.h @@ -76,6 +76,12 @@ struct sysent { 
/* system call table */ #define SY_THR_ABSENT 0x4 #define SY_THR_INCR 0x8 +#ifdef KLD_MODULE +#define SY_THR_STATIC_KLD 0 +#else +#define SY_THR_STATIC_KLD 0x1 +#endif + struct image_params; struct __sigset; struct syscall_args; @@ -241,10 +247,13 @@ struct syscall_helper_data { .syscall_no = NO_SYSCALL \ } +int syscall_register_flags(int *offset, struct sysent *new_sysent, + struct sysent *old_sysent, int flags); int syscall_register(int *offset, struct sysent *new_sysent, struct sysent *old_sysent); int syscall_deregister(int *offset, struct sysent *old_sysent); int syscall_module_handler(struct module *mod, int what, void *arg); +int syscall_helper_register_flags(struct syscall_helper_data *sd, int flags); int syscall_helper_register(struct syscall_helper_data *sd); int syscall_helper_unregister(struct syscall_helper_data *sd); > > > > > > Also, please see https://reviews.freebsd.org/D1007 which moves > > > SY_THR_STATIC check to an inline function, saving us 2 function calls on > > > each syscall. > > > > Did you benchmarked this ? I dislike the code bloat. > > with syscall_timing from tools/tools ministat says +4% for getuid and +1 > for pipe+close. 
> > -- > Mateusz Guzik -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 17:51:50 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 94F108D8 for ; Sat, 25 Oct 2014 17:51:50 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E63CB30C for ; Sat, 25 Oct 2014 17:51:49 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9PHpf4a049399 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Oct 2014 20:51:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9PHpf4a049399 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9PHpfni049398; Sat, 25 Oct 2014 20:51:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 25 Oct 2014 20:51:41 +0300 From: Konstantin Belousov To: Mateusz Guzik Subject: Re: syscalls from loadable modules compiled in statically into the kernel Message-ID: <20141025175141.GJ1877@kib.kiev.ua> References: <20141025022808.GA14551@dft-labs.eu> <20141025092234.GI1877@kib.kiev.ua> <20141025132039.GA20599@dft-labs.eu> <20141025141557.GB20599@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141025141557.GB20599@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 17:51:50 -0000 On Sat, Oct 25, 2014 at 04:15:57PM +0200, Mateusz Guzik wrote: > On Sat, Oct 25, 2014 at 03:20:39PM +0200, Mateusz Guzik wrote: > > On Sat, Oct 25, 2014 at 12:22:34PM +0300, Konstantin Belousov wrote: > > > On Sat, Oct 25, 2014 at 04:28:09AM +0200, Mateusz Guzik wrote: > > > > The kernel has the following mechanism: > > > > > > > > int > > > > syscall_thread_enter(struct thread *td, struct sysent *se) > > > > { > > > > u_int32_t cnt, oldcnt; > > > > > > > > do { > > > > oldcnt = se->sy_thrcnt; > > > > if ((oldcnt & SY_THR_STATIC) != 0) > > > > return (0); > > > > if ((oldcnt & (SY_THR_DRAINING | SY_THR_ABSENT)) != 0) > > > > return (ENOSYS); > > > > cnt = oldcnt + SY_THR_INCR; > > > > } while (atomic_cmpset_acq_32(&se->sy_thrcnt, oldcnt, cnt) == 0); > > > > return (0); > > > > } > > > > > > > > Except it turns out that it is used even if given module (here: sysvshm) is > > > > compiled in statically. > > > > > > > > So my proposal is to give modules an easy way to tell whether they got > > > > compiled in and extend syscall_register interface so that it would allow > > > > registering static syscalls. > > > > > > > > The latter could also be used by modules which are loadable, but don't > > > > support unloads. > > > > > > > > I don't have any good idea how to provide aforementioned detection > > > > method though. > > > The method would be a combination of some change to syscall_register() > > > and #ifdef KLD_MODULE. Look at the sys/conf.h MAKEDEV_ETERNAL_KLD > > > definition, which provides similar in spirit optimization for > > > non-destructable cdevs. 
> > > > > > > Ok, so I'll add sysctl_register_flags and SY_THR_STATIC_KLD + making > > sure SY_THR_STATIC cannot be unregistered. > > > > Turns out freebsd32 duplicates a lot of code and didn't receive some > fixes regular syscall table support got. I decided to just patch it up > without fixing that for now. > > diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c > index d909a71..8fe4b83 100644 > --- a/sys/compat/freebsd32/freebsd32_misc.c > +++ b/sys/compat/freebsd32/freebsd32_misc.c > @@ -2627,9 +2627,13 @@ freebsd32_xxx(struct thread *td, struct freebsd32_xxx_args *uap) > #endif > > int > -syscall32_register(int *offset, struct sysent *new_sysent, > - struct sysent *old_sysent) > +syscall32_register_flags(int *offset, struct sysent *new_sysent, > + struct sysent *old_sysent, int flags) > { > + > + if ((flags & ~SY_THR_STATIC) != 0) > + return (EINVAL); > + > if (*offset == NO_SYSCALL) { > int i; > > @@ -2648,15 +2652,26 @@ syscall32_register(int *offset, struct sysent *new_sysent, > > *old_sysent = freebsd32_sysent[*offset]; > freebsd32_sysent[*offset] = *new_sysent; > + atomic_store_rel_32(&freebsd32_sysent[*offset].sy_thrcnt, flags); > return 0; Fix style while you are there; the line should be return _(_0_)_; > } > > int > +syscall32_register(int *offset, struct sysent *new_sysent, > + struct sysent *old_sysent) > +{ > + > + return (syscall32_register_flags(offset, new_sysent, old_sysent, 0)); > +} > + > +int > syscall32_deregister(int *offset, struct sysent *old_sysent) > { > > - if (*offset) > - freebsd32_sysent[*offset] = *old_sysent; > + if (*offset == 0) > + return (0); > + > + freebsd32_sysent[*offset] = *old_sysent; > return 0; > } > > @@ -2707,14 +2722,14 @@ syscall32_module_handler(struct module *mod, int what, void *arg) > } > > int > -syscall32_helper_register(struct syscall_helper_data *sd) > +syscall32_helper_register_flags(struct syscall_helper_data *sd, int flags) > { > struct syscall_helper_data *sd1; > 
int error; > > for (sd1 = sd; sd1->syscall_no != NO_SYSCALL; sd1++) { > - error = syscall32_register(&sd1->syscall_no, &sd1->new_sysent, > - &sd1->old_sysent); > + error = syscall32_register_flags(&sd1->syscall_no, > + &sd1->new_sysent, &sd1->old_sysent, flags); > if (error != 0) { > syscall32_helper_unregister(sd); > return (error); > @@ -2725,6 +2740,13 @@ syscall32_helper_register(struct syscall_helper_data *sd) > } > > int > +syscall32_helper_register(struct syscall_helper_data *sd) > +{ > + > + return (syscall32_helper_register_flags(sd, 0)); > +} > + > +int > syscall32_helper_unregister(struct syscall_helper_data *sd) > { > struct syscall_helper_data *sd1; > diff --git a/sys/compat/freebsd32/freebsd32_util.h b/sys/compat/freebsd32/freebsd32_util.h > index a5945cf..e473ef6 100644 > --- a/sys/compat/freebsd32/freebsd32_util.h > +++ b/sys/compat/freebsd32/freebsd32_util.h > @@ -97,10 +97,14 @@ SYSCALL32_MODULE(syscallname, \ > .syscall_no = FREEBSD32_SYS_##syscallname \ > } > > +int syscall32_register_flags(int *offset, struct sysent *new_sysent, > + struct sysent *old_sysent, int flags); > int syscall32_register(int *offset, struct sysent *new_sysent, > struct sysent *old_sysent); > int syscall32_deregister(int *offset, struct sysent *old_sysent); > int syscall32_module_handler(struct module *mod, int what, void *arg); > +int syscall32_helper_register_flags(struct syscall_helper_data *sd, > + int flags); > int syscall32_helper_register(struct syscall_helper_data *sd); > int syscall32_helper_unregister(struct syscall_helper_data *sd); > > diff --git a/sys/kern/kern_syscalls.c b/sys/kern/kern_syscalls.c > index 03f6088..30dd203 100644 > --- a/sys/kern/kern_syscalls.c > +++ b/sys/kern/kern_syscalls.c > @@ -104,11 +104,14 @@ syscall_thread_exit(struct thread *td, struct sysent *se) > } > > int > -syscall_register(int *offset, struct sysent *new_sysent, > - struct sysent *old_sysent) > +syscall_register_flags(int *offset, struct sysent *new_sysent, > + struct sysent 
*old_sysent, int flags) > { > int i; > > + if ((flags & ~SY_THR_STATIC) != 0) > + return (EINVAL); > + > if (*offset == NO_SYSCALL) { > for (i = 1; i < SYS_MAXSYSCALL; ++i) > if (sysent[i].sy_call == (sy_call_t *)lkmnosys) > @@ -127,18 +130,31 @@ syscall_register(int *offset, struct sysent *new_sysent, > *old_sysent = sysent[*offset]; > new_sysent->sy_thrcnt = SY_THR_ABSENT; > sysent[*offset] = *new_sysent; > - atomic_store_rel_32(&sysent[*offset].sy_thrcnt, 0); > + atomic_store_rel_32(&sysent[*offset].sy_thrcnt, flags); > return (0); > } > > int > +syscall_register(int *offset, struct sysent *new_sysent, > + struct sysent *old_sysent) > +{ > + > + return (syscall_register_flags(offset, new_sysent, old_sysent, 0)); > +} > + > +int > syscall_deregister(int *offset, struct sysent *old_sysent) > { > + struct sysent *se; > > - if (*offset) { > - syscall_thread_drain(&sysent[*offset]); > - sysent[*offset] = *old_sysent; > - } > + if (*offset == 0) > + return (0); /* XXX? */ > + > + se = &sysent[*offset]; > + if ((se->sy_thrcnt & SY_THR_STATIC) != 0) > + return (EINVAL); > + syscall_thread_drain(se); > + sysent[*offset] = *old_sysent; > return (0); > } > > @@ -190,14 +206,14 @@ syscall_module_handler(struct module *mod, int what, void *arg) > } > > int > -syscall_helper_register(struct syscall_helper_data *sd) > +syscall_helper_register_flags(struct syscall_helper_data *sd, int flags) > { > struct syscall_helper_data *sd1; > int error; > > for (sd1 = sd; sd1->syscall_no != NO_SYSCALL; sd1++) { > - error = syscall_register(&sd1->syscall_no, &sd1->new_sysent, > - &sd1->old_sysent); > + error = syscall_register_flags(&sd1->syscall_no, > + &sd1->new_sysent, &sd1->old_sysent, flags); > if (error != 0) { > syscall_helper_unregister(sd); > return (error); > @@ -208,6 +224,13 @@ syscall_helper_register(struct syscall_helper_data *sd) > } > > int > +syscall_helper_register(struct syscall_helper_data *sd) > +{ > + > + return (syscall_helper_register_flags(sd, 0)); > +} > + > +int 
> syscall_helper_unregister(struct syscall_helper_data *sd) > { > struct syscall_helper_data *sd1; > diff --git a/sys/kern/sysv_msg.c b/sys/kern/sysv_msg.c > index a572a0e..4fc04bc 100644 > --- a/sys/kern/sysv_msg.c > +++ b/sys/kern/sysv_msg.c > @@ -252,11 +252,12 @@ msginit() > } > mtx_init(&msq_mtx, "msq", NULL, MTX_DEF); > > - error = syscall_helper_register(msg_syscalls); > + error = syscall_helper_register_flags(msg_syscalls, SY_THR_STATIC_KLD); > if (error != 0) > return (error); > #ifdef COMPAT_FREEBSD32 > - error = syscall32_helper_register(msg32_syscalls); > + error = syscall32_helper_register_flags(msg32_syscalls, > + SY_THR_STATIC_KLD); > if (error != 0) > return (error); > #endif > diff --git a/sys/kern/sysv_sem.c b/sys/kern/sysv_sem.c > index c632902..dc1c66a 100644 > --- a/sys/kern/sysv_sem.c > +++ b/sys/kern/sysv_sem.c > @@ -278,11 +278,12 @@ seminit(void) > semexit_tag = EVENTHANDLER_REGISTER(process_exit, semexit_myhook, NULL, > EVENTHANDLER_PRI_ANY); > > - error = syscall_helper_register(sem_syscalls); > + error = syscall_helper_register_flags(sem_syscalls, SY_THR_STATIC_KLD); > if (error != 0) > return (error); > #ifdef COMPAT_FREEBSD32 > - error = syscall32_helper_register(sem32_syscalls); > + error = syscall32_helper_register_flags(sem32_syscalls, > + SY_THR_STATIC_KLD); > if (error != 0) > return (error); > #endif > diff --git a/sys/kern/sysv_shm.c b/sys/kern/sysv_shm.c > index 3480d11..b3132c3 100644 > --- a/sys/kern/sysv_shm.c > +++ b/sys/kern/sysv_shm.c > @@ -910,11 +910,12 @@ shminit() > shmexit_hook = &shmexit_myhook; > shmfork_hook = &shmfork_myhook; > > - error = syscall_helper_register(shm_syscalls); > + error = syscall_helper_register_flags(shm_syscalls, SY_THR_STATIC_KLD); > if (error != 0) > return (error); > #ifdef COMPAT_FREEBSD32 > - error = syscall32_helper_register(shm32_syscalls); > + error = syscall32_helper_register_flags(shm32_syscalls, > + SY_THR_STATIC_KLD); > if (error != 0) > return (error); > #endif > diff --git 
a/sys/netinet/sctp_syscalls.c b/sys/netinet/sctp_syscalls.c > index 3d0f549..8170ca8 100644 > --- a/sys/netinet/sctp_syscalls.c > +++ b/sys/netinet/sctp_syscalls.c > @@ -94,11 +94,11 @@ sctp_syscalls_init(void *unused __unused) > { > int error; > > - error = syscall_helper_register(sctp_syscalls); > + error = syscall_helper_register_flags(sctp_syscalls, SY_THR_STATIC); Why sctp does this at all ? It cannot be loaded. Seems to be just a mishandling of conditional compilation ? > KASSERT((error == 0), > ("%s: syscall_helper_register failed for sctp syscalls", __func__)); > #ifdef COMPAT_FREEBSD32 > - error = syscall32_helper_register(sctp_syscalls); > + error = syscall32_helper_register_flags(sctp_syscalls, SY_THR_STATIC); > KASSERT((error == 0), > ("%s: syscall32_helper_register failed for sctp syscalls", > __func__)); > diff --git a/sys/sys/sysent.h b/sys/sys/sysent.h > index 0f1c256..12bc518 100644 > --- a/sys/sys/sysent.h > +++ b/sys/sys/sysent.h > @@ -76,6 +76,12 @@ struct sysent { /* system call table */ > #define SY_THR_ABSENT 0x4 > #define SY_THR_INCR 0x8 > > +#ifdef KLD_MODULE > +#define SY_THR_STATIC_KLD 0 > +#else > +#define SY_THR_STATIC_KLD 0x1 > +#endif This is crude. AFAIU, it should be SY_THR_STATIC instead of 0x1. 
> + > struct image_params; > struct __sigset; > struct syscall_args; > @@ -241,10 +247,13 @@ struct syscall_helper_data { > .syscall_no = NO_SYSCALL \ > } > > +int syscall_register_flags(int *offset, struct sysent *new_sysent, > + struct sysent *old_sysent, int flags); > int syscall_register(int *offset, struct sysent *new_sysent, > struct sysent *old_sysent); > int syscall_deregister(int *offset, struct sysent *old_sysent); > int syscall_module_handler(struct module *mod, int what, void *arg); > +int syscall_helper_register_flags(struct syscall_helper_data *sd, int flags); > int syscall_helper_register(struct syscall_helper_data *sd); > int syscall_helper_unregister(struct syscall_helper_data *sd); I was told many times to not bloat the KPI. What you do is fine for merge to stable/10, while for HEAD, add flags argument to syscall_register() and do not provide wrapper function. > > > > > > > > > > > Also, please see https://reviews.freebsd.org/D1007 which moves > > > > SY_THR_STATIC check to an inline function, saving us 2 function calls on > > > > each syscall. > > > > > > Did you benchmarked this ? I dislike the code bloat. > > > > with syscall_timing from tools/tools ministat says +4% for getuid and +1 > > for pipe+close. 
> > > > -- > > Mateusz Guzik > > -- > Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 18:44:54 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0C0F7412 for ; Sat, 25 Oct 2014 18:44:54 +0000 (UTC) Received: from mail-wg0-x229.google.com (mail-wg0-x229.google.com [IPv6:2a00:1450:400c:c00::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9133EA14 for ; Sat, 25 Oct 2014 18:44:53 +0000 (UTC) Received: by mail-wg0-f41.google.com with SMTP id b13so3121507wgh.24 for ; Sat, 25 Oct 2014 11:44:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=IcJA+YTYxPUo+6RkvNm0+aWEVSd+1ylUVGPVDPSEl1w=; b=Lq2Wo7ZT6/ZlBocvRvvwrw18mwkfR92XyEez/m+ZqZtRHdCMbN2SQ7fdns5yakBbmr 86EvBNqwnspsPda/ByJUXka4rSBVqYgzhgqTrC1z7+Bp+mCSYi/YaDyehGuPl9zbW2A7 vj5a9SNt9wVIpeU2iI5lWiI6rxJ8ZH1bQTaGHTRQbSYY6GXhL+4pnrvjaJ7X6P9HxU/W kdqubfvqbKYsbJpAutfUYpF4V2W1YldAn6yNsRe2JetIpXbS5cV09++QKwbguWH/mOQi UH9N1XiKagFsugwloNN+j5yOpYa58UGpjbG5gsjA9Q+lxaBB3aJhMKeQfNrrn58tubw4 IS8w== X-Received: by 10.194.172.234 with SMTP id bf10mr12818503wjc.81.1414262691846; Sat, 25 Oct 2014 11:44:51 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id pc8sm9526535wjb.36.2014.10.25.11.44.50 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 25 Oct 2014 11:44:51 -0700 (PDT) Date: Sat, 25 Oct 2014 20:44:48 +0200 From: Mateusz Guzik To: freebsd-arch@freebsd.org Subject: refcount_release_take_##lock Message-ID: <20141025184448.GA19066@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 18:44:54 -0000 The following idiom is used here and there: int old; old = obj->ref; if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1)) return; lock(&something); if (refcount_release(&obj->ref) == 0) { unlock(&something); return; } free up unlock(&something); ========== I decided to implement it as a common function. We have only refcount.h and I didn't want to bloat all including code with additional definitions and as such I came up with a macro that has to be used in .c file and that will define appropriate inline func. I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_ macro, assuming it has to stay. Comments? diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c index f8ae0e6..cf16344 100644 --- a/sys/kern/kern_jail.c +++ b/sys/kern/kern_jail.c @@ -81,6 +81,8 @@ __FBSDID("$FreeBSD$"); MALLOC_DEFINE(M_PRISON, "prison", "Prison structures"); static MALLOC_DEFINE(M_PRISON_RACCT, "prison_racct", "Prison racct structures"); +REFCOUNT_RELEASE_TAKE_USE_SX; + /* Keep struct prison prison0 and some code in kern_jail_set() readable. 
*/ #ifdef INET #ifdef INET6 @@ -4466,15 +4468,12 @@ prison_racct_free_locked(struct prison_racct *prr) void prison_racct_free(struct prison_racct *prr) { - int old; sx_assert(&allprison_lock, SA_UNLOCKED); - old = prr->prr_refcount; - if (old > 1 && atomic_cmpset_int(&prr->prr_refcount, old, old - 1)) + if (refcount_release_take_sx(&prr->prr_refcount, &allprison_lock) == 0) return; - sx_xlock(&allprison_lock); prison_racct_free_locked(prr); sx_xunlock(&allprison_lock); } diff --git a/sys/kern/kern_loginclass.c b/sys/kern/kern_loginclass.c index b20f60b..8cc4b40 100644 --- a/sys/kern/kern_loginclass.c +++ b/sys/kern/kern_loginclass.c @@ -70,6 +70,7 @@ LIST_HEAD(, loginclass) loginclasses; */ static struct mtx loginclasses_lock; MTX_SYSINIT(loginclasses_init, &loginclasses_lock, "loginclasses lock", MTX_DEF); +REFCOUNT_RELEASE_TAKE_USE_MTX; void loginclass_hold(struct loginclass *lc) @@ -81,22 +82,14 @@ loginclass_hold(struct loginclass *lc) void loginclass_free(struct loginclass *lc) { - int old; - old = lc->lc_refcount; - if (old > 1 && atomic_cmpset_int(&lc->lc_refcount, old, old - 1)) + if (refcount_release_take_mtx(&lc->lc_refcount, &loginclasses_lock) == 0) return; - mtx_lock(&loginclasses_lock); - if (refcount_release(&lc->lc_refcount)) { - racct_destroy(&lc->lc_racct); - LIST_REMOVE(lc, lc_next); - mtx_unlock(&loginclasses_lock); - free(lc, M_LOGINCLASS); - - return; - } + racct_destroy(&lc->lc_racct); + LIST_REMOVE(lc, lc_next); mtx_unlock(&loginclasses_lock); + free(lc, M_LOGINCLASS); } /* diff --git a/sys/kern/kern_resource.c b/sys/kern/kern_resource.c index f4914fa..efbaa87 100644 --- a/sys/kern/kern_resource.c +++ b/sys/kern/kern_resource.c @@ -71,6 +71,7 @@ static MALLOC_DEFINE(M_PLIMIT, "plimit", "plimit structures"); static MALLOC_DEFINE(M_UIDINFO, "uidinfo", "uidinfo structures"); #define UIHASH(uid) (&uihashtbl[(uid) & uihash]) static struct rwlock uihashtbl_lock; +REFCOUNT_RELEASE_TAKE_USE_RWLOCK; static LIST_HEAD(uihashhead, uidinfo) *uihashtbl; 
static u_long uihash; /* size of hash table - 1 */ @@ -1327,37 +1328,24 @@ void uifree(uip) struct uidinfo *uip; { - int old; - /* Prepare for optimal case. */ - old = uip->ui_ref; - if (old > 1 && atomic_cmpset_int(&uip->ui_ref, old, old - 1)) + if (refcount_release_take_rwlock(&uip->ui_ref, &uihashtbl_lock) == 0) return; - /* Prepare for suboptimal case. */ - rw_wlock(&uihashtbl_lock); - if (refcount_release(&uip->ui_ref)) { - racct_destroy(&uip->ui_racct); - LIST_REMOVE(uip, ui_hash); - rw_wunlock(&uihashtbl_lock); - if (uip->ui_sbsize != 0) - printf("freeing uidinfo: uid = %d, sbsize = %ld\n", - uip->ui_uid, uip->ui_sbsize); - if (uip->ui_proccnt != 0) - printf("freeing uidinfo: uid = %d, proccnt = %ld\n", - uip->ui_uid, uip->ui_proccnt); - if (uip->ui_vmsize != 0) - printf("freeing uidinfo: uid = %d, swapuse = %lld\n", - uip->ui_uid, (unsigned long long)uip->ui_vmsize); - mtx_destroy(&uip->ui_vmsize_mtx); - free(uip, M_UIDINFO); - return; - } - /* - * Someone added a reference between atomic_cmpset_int() and - * rw_wlock(&uihashtbl_lock). 
- */ + racct_destroy(&uip->ui_racct); + LIST_REMOVE(uip, ui_hash); rw_wunlock(&uihashtbl_lock); + if (uip->ui_sbsize != 0) + printf("freeing uidinfo: uid = %d, sbsize = %ld\n", + uip->ui_uid, uip->ui_sbsize); + if (uip->ui_proccnt != 0) + printf("freeing uidinfo: uid = %d, proccnt = %ld\n", + uip->ui_uid, uip->ui_proccnt); + if (uip->ui_vmsize != 0) + printf("freeing uidinfo: uid = %d, swapuse = %lld\n", + uip->ui_uid, (unsigned long long)uip->ui_vmsize); + mtx_destroy(&uip->ui_vmsize_mtx); + free(uip, M_UIDINFO); } void diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h index 4611664..9dff576 100644 --- a/sys/sys/refcount.h +++ b/sys/sys/refcount.h @@ -64,4 +64,29 @@ refcount_release(volatile u_int *count) return (old == 1); } +#define REFCOUNT_RELEASE_DEFINE(NAME, TYPE, LOCK, UNLOCK) \ +static __inline int \ +refcount_release_take_##NAME(volatile u_int *count, TYPE *v) \ +{ \ + u_int old; \ + \ + old = *count; \ + if (old > 1 && atomic_cmpset_int(count, old, old - 1)) \ + return (0); \ + LOCK(v); \ + if (refcount_release(count)) \ + return (1); \ + UNLOCK(v); \ + return (0); \ +} + +#define REFCOUNT_RELEASE_TAKE_USE_MTX \ + REFCOUNT_RELEASE_DEFINE(mtx, struct mtx, mtx_lock, mtx_unlock); +#define REFCOUNT_RELEASE_TAKE_USE_RWLOCK \ + REFCOUNT_RELEASE_DEFINE(rwlock, struct rwlock, rw_wlock, rw_wunlock); +#define REFCOUNT_RELEASE_TAKE_USE_RMLOCK \ + REFCOUNT_RELEASE_DEFINE(rmlock, struct rmlock, rm_wlock, rm_wunlock); +#define REFCOUNT_RELEASE_TAKE_USE_SX \ + REFCOUNT_RELEASE_DEFINE(sx, struct sx, sx_xlock, sx_xunlock); + #endif /* ! 
__SYS_REFCOUNT_H__ */ From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 18:52:36 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D037A72F for ; Sat, 25 Oct 2014 18:52:36 +0000 (UTC) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id 91F7AAE4 for ; Sat, 25 Oct 2014 18:52:36 +0000 (UTC) Received: from nine.des.no (smtp.des.no [194.63.250.102]) by smtp-int.des.no (Postfix) with ESMTP id C3B22AB23; Sat, 25 Oct 2014 18:52:35 +0000 (UTC) Received: by nine.des.no (Postfix, from userid 1001) id D085B10479; Sat, 25 Oct 2014 20:52:25 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: "Simon J. Gerraty" Subject: Re: Retiring WITH_INSTALL_AS_USER References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> <9250.1414076335@chaos> <86wq7p4zcx.fsf@nine.des.no> <10072.1414165996@chaos> Date: Sat, 25 Oct 2014 20:52:25 +0200 In-Reply-To: <10072.1414165996@chaos> (Simon J. Gerraty's message of "Fri, 24 Oct 2014 08:53:16 -0700") Message-ID: <86k33o7ziu.fsf@nine.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 18:52:36 -0000 "Simon J. Gerraty" writes: > But it sounds like NO_ROOT already solves this issue? 
NO_ROOT solves it in a much better fashion, by modifying install(1)'s behavior so that instead of performing the chown / chgrp / chmod, it records it in a file which can then be used to generate a package manifest or something like that. DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 19:04:10 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2274D9C1 for ; Sat, 25 Oct 2014 19:04:10 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D3EB9BE5 for ; Sat, 25 Oct 2014 19:04:09 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9PJ47XI030946 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Oct 2014 12:04:08 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9PJ47s9030945; Sat, 25 Oct 2014 12:04:07 -0700 (PDT) (envelope-from jmg) Date: Sat, 25 Oct 2014 12:04:07 -0700 From: John-Mark Gurney To: Mateusz Guzik Subject: Re: refcount_release_take_##lock Message-ID: <20141025190407.GU82214@funkthat.com> Mail-Followup-To: Mateusz Guzik , freebsd-arch@freebsd.org References: <20141025184448.GA19066@dft-labs.eu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141025184448.GA19066@dft-labs.eu> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL:
http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Sat, 25 Oct 2014 12:04:08 -0700 (PDT) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 19:04:10 -0000 Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200: > The following idiom is used here and there: > > int old; > old = obj->ref; > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1)) > return; > lock(&something); > if (refcount_release(&obj->ref) == 0) { > unlock(&something); > return; > } > free up > unlock(&something); > > ========== Couldn't this be better written as: if (__predict_false(refcount_release(&obj->ref) == 0)) { lock(&something); if (__predict_true(!obj->ref)) { free up } unlock(&something); } The reason I'm asking is that I changed how IPsec SA ref counting was handled, and used something similar... My code gets rid of a branch, and is better in that it uses refcount API properly, instead of using atomic_cmpset_int... > I decided to implement it as a common function. > > We have only refcount.h and I didn't want to bloat all including code > with additional definitions and as such I came up with a macro that has > to be used in .c file and that will define appropriate inline func. > > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_ > macro, assuming it has to stay. You could shorten it to REFCNT_REL_TAKE_ > Comments? Will you update the refcount(9) man page w/ documentation before committing? 
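For readers following the thread, the idiom quoted at the top of this message can be sketched in userland C. This is only an analogue under stated assumptions, not the posted patch: it uses C11 atomics and a pthread mutex in place of atomic_cmpset_int() and the kernel lock primitives, and the names sketch_refcount_release() and sketch_release_take_mtx() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; a userland analogue, not kernel code. */

/* Drop one reference; returns true if it was the last one. */
static bool
sketch_refcount_release(atomic_uint *count)
{
	unsigned int old;

	old = atomic_fetch_sub(count, 1);
	assert(old > 0);	/* releasing with no references held is a bug */
	return (old == 1);
}

/*
 * Returns 1 with *lock held if the caller dropped the last reference,
 * or 0 with *lock not held otherwise.
 */
static int
sketch_release_take_mtx(atomic_uint *count, pthread_mutex_t *lock)
{
	unsigned int old;

	old = atomic_load(count);
	/* Fast path: not the last reference, so no lock is needed. */
	if (old > 1 && atomic_compare_exchange_strong(count, &old, old - 1))
		return (0);
	/* Slow path: the 1 -> 0 transition only happens under the lock. */
	pthread_mutex_lock(lock);
	if (sketch_refcount_release(count))
		return (1);
	pthread_mutex_unlock(lock);
	return (0);
}
```

A caller that gets 1 back would unlink and free the object and then drop the lock itself, much as the converted loginclass_free() does in the patch.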
-- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 19:26:37 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C52B6EC3 for ; Sat, 25 Oct 2014 19:26:37 +0000 (UTC) Received: from mail-wg0-x22b.google.com (mail-wg0-x22b.google.com [IPv6:2a00:1450:400c:c00::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5DFE6E2A for ; Sat, 25 Oct 2014 19:26:37 +0000 (UTC) Received: by mail-wg0-f43.google.com with SMTP id n12so3072886wgh.26 for ; Sat, 25 Oct 2014 12:26:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=RdolTdLegQqgq2jDw7DW0z/AQpTGlRin2yNLiXwg4lU=; b=UM05/IO/AeRROvlNuaqGxoKUwTC4BucOvRymOBiVMyIh7fybREcPu2fEvNyhx9aMeE spS5m3flVC6HNEbzssESSJbU7AN6xdZYxK0TNII4rPmPwDVG31cSDo0eYn8y4uXRPpe4 WXiWeVqm89jHq0ETWYCrkV/BADEoa5ivdpJmEZ7tLfmjD+FcWQ2hHYwF+tY1phPAeGw9 sk98RZ3cZHcOPu7oyiP3DULzu+6cgkUcWoqeLKYv6gFRlCREjEk3pMGnFoGfFGAHbEU1 7Vfjnpe0c9uVPpBTu879qXpCUjvurFXK9b1GzNtbGWniVwLdlY4bv+o5ucKNWNoTHUUj IkUA== X-Received: by 10.180.211.70 with SMTP id na6mr11619901wic.3.1414265195651; Sat, 25 Oct 2014 12:26:35 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id bl9sm5951081wib.24.2014.10.25.12.26.34 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 25 Oct 2014 12:26:34 -0700 (PDT) Date: Sat, 25 Oct 2014 21:26:32 +0200 From: Mateusz Guzik To: freebsd-arch@freebsd.org Subject: Re: refcount_release_take_##lock Message-ID: <20141025192632.GB19066@dft-labs.eu> References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20141025190407.GU82214@funkthat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 19:26:38 -0000 On Sat, Oct 25, 2014 at 12:04:07PM -0700, John-Mark Gurney wrote: > Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200: > > The following idiom is used here and there: > > > > int old; > > old = obj->ref; > > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1)) > > return; > > lock(&something); > > if (refcount_release(&obj->ref) == 0) { > > unlock(&something); > > return; > > } > > free up > > unlock(&something); > > > > ========== > > Couldn't this be better written as: > if (__predict_false(refcount_release(&obj->ref) == 0)) { > lock(&something); > if (__predict_true(!obj->ref)) { > free up > } > unlock(&something); > } > > The reason I'm asking is that I changed how IPsec SA ref counting was > handled, and used something similar... > > My code gets rid of a branch, and is better in that it uses refcount > API properly, instead of using atomic_cmpset_int... > This is used when given obj is kept on a list and code which traverses it (locked) expects found objects to be valid to ref. 
If we were to reach count of 0 and then lock, it would be possible that another thread refed + unrefed the object and is now trying to lock as well. That could be remedied for type-stable objects by having a generation counter, but I doubt it's worth it. Not to mention objects we lock here are freeable :) > > I decided to implement it as a common function. > > > > We have only refcount.h and I didn't want to bloat all including code > > with additional definitions and as such I came up with a macro that has > > to be used in .c file and that will define appropriate inline func. > > > > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_ > > macro, assuming it has to stay. > > You could shorten it to REFCNT_REL_TAKE_ > All functions use the full 'refcount_release' and the like, so that would be inconsistent. Losing 'take' may be an option, I don't know. > > Comments? > > Will you update the refcount(9) man page w/ documentation before > committing? > Sure. -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 19:42:20 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1E2564CB for ; Sat, 25 Oct 2014 19:42:20 +0000 (UTC) Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D4291FBC for ; Sat, 25 Oct 2014 19:42:19 +0000 (UTC) Received: from [73.34.117.227] (helo=ilsoft.org) by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.72) (envelope-from ) id 1Xi6wP-000JMj-1Z; Sat, 25 Oct 2014 19:23:57 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by ilsoft.org (8.14.9/8.14.9) with ESMTP id s9PJNtVS072720; Sat, 25 Oct 2014
13:23:55 -0600 (MDT) (envelope-from ian@FreeBSD.org) X-Mail-Handler: Dyn Standard SMTP by Dyn X-Originating-IP: 73.34.117.227 X-Report-Abuse-To: abuse@dyndns.com (see http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse reporting information) X-MHO-User: U2FsdGVkX19cxQyYTC9Uz84e/ruQTnuK X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan [172.22.42.240] claimed to be [172.22.42.240] Subject: Re: refcount_release_take_##lock From: Ian Lepore To: John-Mark Gurney In-Reply-To: <20141025190407.GU82214@funkthat.com> References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> Content-Type: text/plain; charset="us-ascii" Date: Sat, 25 Oct 2014 13:23:55 -0600 Message-ID: <1414265035.12052.646.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: Mateusz Guzik , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 19:42:20 -0000 On Sat, 2014-10-25 at 12:04 -0700, John-Mark Gurney wrote: > Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200: > > The following idiom is used here and there: > > > > int old; > > old = obj->ref; > > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1)) > > return; > > lock(&something); > > if (refcount_release(&obj->ref) == 0) { > > unlock(&something); > > return; > > } > > free up > > unlock(&something); > > > > ========== > > Couldn't this be better written as: > if (__predict_false(refcount_release(&obj->ref) == 0)) { Could you not get preempted at this point, whereupon another thread acquires then releases obj, deletes it because it keeps running through this point, then eventually your original thread wakes up, gets the lock, and dereferences the now-defunct 
obj pointer? (Also, I think that should be != 0, above?) -- Ian > lock(&something); > if (__predict_true(!obj->ref)) { > free up > } > unlock(&something); > } > > The reason I'm asking is that I changed how IPsec SA ref counting was > handled, and used something similar... > > My code gets rid of a branch, and is better in that it uses refcount > API properly, instead of using atomic_cmpset_int... > > > I decided to implement it as a common function. > > > > We have only refcount.h and I didn't want to bloat all including code > > with additional definitions and as such I came up with a macro that has > > to be used in .c file and that will define appropriate inline func. > > > > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_ > > macro, assuming it has to stay. > > You could shorten it to REFCNT_REL_TAKE_ > > > Comments? > > Will you update the refcount(9) man page w/ documentation before > committing? > From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 19:43:48 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 014CD584 for ; Sat, 25 Oct 2014 19:43:47 +0000 (UTC) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailhost.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id BA715FCB for ; Sat, 25 Oct 2014 19:43:47 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id 8D2C7358C54; Sat, 25 Oct 2014 21:43:44 +0200 (CEST) Received: by snail.stack.nl (Postfix, from userid 1677) id 7211928494; Sat, 25 Oct 2014 21:43:44 +0200 (CEST) Date: Sat, 25 Oct 2014 21:43:44 +0200 From: Jilles Tjoelker To: Konstantin Belousov Subject: 
Re: RfC: fueword(9) and casueword(9) Message-ID: <20141025194344.GA11568@stack.nl> References: <20141021094539.GA1877@kib.kiev.ua> <20141022002825.H2080@besplex.bde.org> <20141021162306.GE1877@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141021162306.GE1877@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 19:43:48 -0000 On Tue, Oct 21, 2014 at 07:23:06PM +0300, Konstantin Belousov wrote: > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote: > > On Tue, 21 Oct 2014, Konstantin Belousov wrote: > > This API is larger, slower, and harder to use. No better fix is evident, > > except for fubyte() and fuword16(). These never had a problem on any > > supported arch, since bytes are only 8 bits on all supported arches, > > and 16-bit ints are not supported on any arch, so -1 is always out of > > band. Not touching them is a better fix. You didn't change any of > > their callers, but pessimized their implementation to a wrapper around > > fue*(). (BTW, fuword16() and fuswintr() are misdocumented as taking a > > non-const void * arg.). > I reverted the addition of fuebyte(9) and fueword16(9). First I thought > that it would be nicer to provide fully complement KPI with regard to > fuX/fueX, but it seems that lack of fuebyte() and fueword16() will > cause the right questions. > > > @@ -921,7 +933,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, > > > * can fault on any access. 
> > > */ > > > for (;;) { > > > - owner = fuword32(__DEVOLATILE(void *, &m->m_owner)); > > > + rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner); > > > + if (error == -1) > > > + return (EFAULT); > > > if (mode == _UMUTEX_WAIT) { > > > if (owner == UMUTEX_UNOWNED || owner == UMUTEX_CONTESTED) > > > return (0); > > A new API should try to fix these __DEVOLATILE() abominations. I think it > > is safe, and even correct, to declare the pointers as volatile const void *, > > since the functions really can handle volatile data, unlike copyin(). > > Atomic op functions are declared as taking pointers to volatile for > > similar reasons. Often they are applied to non-volatile data, but > > adding a qualifier is type-safe and doesn't cost efficiency since the > > pointer access is is not known to the compiler. (The last point is not > > so clear -- the compiler can see things in the functions since they are > > inline asm. fueword() isn't inline so its (in)efficiency is not changed.) > > The atomic read functions are not declared as taking pointers to const. > > The __DECONST() abomination might be used to work around this bug. > I prefer to not complicate the fetch(9) KPI due to the mistakes in the > umtx structures definitions. I think that it is bug to mark the lock > words with volatile. I want the fueword(9) interface to be as much > similar to fuword(9), in particular, volatile seems to be not needed. > Below is the updated patch, together with the bug fix noted by Eric. Hmm, consider returning an error number (that is, EFAULT) instead of -1 on failure. This is somewhat like priv_check() which returns EPERM on failure, and probably reduces confusion if the return value is assigned to a variable named "error". In share/man/man9/Makefile, MLINKS are still created for fuebyte.9 and fueword16.9. "successful" is consistently misspelled "successfull". 
-- Jilles Tjoelker From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 19:47:00 2014 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8FABB659; Sat, 25 Oct 2014 19:47:00 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 654DDFEA; Sat, 25 Oct 2014 19:47:00 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9PJkw8Q031397 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Oct 2014 12:46:59 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9PJkwJD031396; Sat, 25 Oct 2014 12:46:58 -0700 (PDT) (envelope-from jmg) Date: Sat, 25 Oct 2014 12:46:58 -0700 From: John-Mark Gurney To: Ian Lepore Subject: Re: refcount_release_take_##lock Message-ID: <20141025194658.GV82214@funkthat.com> Mail-Followup-To: Ian Lepore , Mateusz Guzik , freebsd-arch@FreeBSD.org References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> <1414265035.12052.646.camel@revolution.hippie.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1414265035.12052.646.camel@revolution.hippie.lan> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Sat, 25 Oct 2014 12:46:59 -0700 (PDT) Cc: Mateusz Guzik , freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 19:47:00 -0000 Ian Lepore wrote this message on Sat, Oct 25, 2014 at 13:23 -0600: > On Sat, 2014-10-25 at 12:04 -0700, John-Mark Gurney wrote: > > Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200: > > > The following idiom is used here and there: > > > > > > int old; > > > old = obj->ref; > > > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1)) > > > return; > > > lock(&something); > > > if (refcount_release(&obj->ref) == 0) { > > > unlock(&something); > > > return; > > > } > > > free up > > > unlock(&something); > > > > > > ========== > > > > Couldn't this be better written as: > > if (__predict_false(refcount_release(&obj->ref) == 0)) { > > Could you not get preempted at this point, whereupon another thread > acquires then releases obj, deletes it because it keeps running through > this point, then eventually your original thread wakes up, gets the > lock, and dereferences the now-defunct obj pointer? Depends upon how you handle reference counts... If you allow someone to create a reference when they don't have one (by definition since the object has 0 references), then yes, that would be a problem... But by definition, if the current thread transitions an object from 1 to 0 count, you are the only one w/ a reference, and are safe from another thread getting a reference and doing what you said... Now if you're talking about a data structure that keeps a reference so that others can create references to the object, then shouldn't there be one more reference for the data structure? 
And that case is different, and you are correct, if the above code is used, a race will be introduced... > (Also, I think that should be != 0, above?) Yes, or just drop the comparison... I didn't read the refcount_release man page to check the return value... > > lock(&something); > > if (__predict_true(!obj->ref)) { > > free up > > } > > unlock(&something); > > } > > > > The reason I'm asking is that I changed how IPsec SA ref counting was > > handled, and used something similar... > > > > My code gets rid of a branch, and is better in that it uses refcount > > API properly, instead of using atomic_cmpset_int... > > > > > I decided to implement it as a common function. > > > > > > We have only refcount.h and I didn't want to bloat all including code > > > with additional definitions and as such I came up with a macro that has > > > to be used in .c file and that will define appropriate inline func. > > > > > > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_ > > > macro, assuming it has to stay. > > > > You could shorten it to REFCNT_REL_TAKE_ > > > > > Comments? > > > > Will you update the refcount(9) man page w/ documentation before > > committing? -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
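For readers of the archive, the idiom Mateusz quotes at the top of this thread can be sketched in userspace C11 roughly as below. This is only a sketch under stated assumptions, not the proposed refcount.h macro: struct obj, list_lock, obj_release_take_lock() and obj_drop() are hypothetical names, a C11 atomic_flag spinlock stands in for the kernel lock, and C11 atomics stand in for atomic_cmpset_int() and refcount_release(). What it tries to show is that the 1 -> 0 transition only ever happens with the lock held, so a locked list traversal should never find an object whose count has already reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object kept on a lock-protected list. */
struct obj {
	atomic_uint ref;	/* references held by users; the list holds none */
	bool on_list;		/* stands in for real list linkage */
};

/* Assumption: a trivial spinlock stands in for the kernel lock. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

static void
list_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&list_lock))
		;	/* spin until the flag was previously clear */
}

static void
list_lock_release(void)
{
	atomic_flag_clear(&list_lock);
}

/*
 * Drop one reference.  Returns true with list_lock held if the caller
 * dropped the last reference and must unlink and free the object;
 * returns false with the lock not held otherwise.
 */
static bool
obj_release_take_lock(struct obj *o)
{
	unsigned int old;

	/* Fast path: only decrement unlocked if the count stays above 0. */
	old = atomic_load(&o->ref);
	if (old > 1 && atomic_compare_exchange_strong(&o->ref, &old, old - 1))
		return (false);

	/* Slow path: a possible 1 -> 0 transition is done under the lock. */
	list_lock_acquire();
	if (atomic_fetch_sub(&o->ref, 1) != 1) {
		/* Someone else still holds a reference. */
		list_lock_release();
		return (false);
	}
	return (true);
}

/* Give up a reference, freeing the object if it was the last one. */
static void
obj_drop(struct obj *o)
{
	if (obj_release_take_lock(o)) {
		o->on_list = false;	/* unlink while still locked */
		list_lock_release();
		free(o);
	}
}
```

A caller would use obj_drop(o) wherever it gives up a reference. As far as I can tell, the preemption window Ian asks about does not open here, because no thread touches the object after an unlocked decrement that could have reached zero.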
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 19:53:35 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C94D38A4 for ; Sat, 25 Oct 2014 19:53:35 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id A8B51145 for ; Sat, 25 Oct 2014 19:53:35 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9PJrYL7031480 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Oct 2014 12:53:35 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9PJrYSI031479; Sat, 25 Oct 2014 12:53:34 -0700 (PDT) (envelope-from jmg) Date: Sat, 25 Oct 2014 12:53:34 -0700 From: John-Mark Gurney To: Mateusz Guzik Subject: Re: refcount_release_take_##lock Message-ID: <20141025195334.GW82214@funkthat.com> Mail-Followup-To: Mateusz Guzik , freebsd-arch@freebsd.org References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> <20141025192632.GB19066@dft-labs.eu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141025192632.GB19066@dft-labs.eu> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Sat, 25 Oct 2014 12:53:35 -0700 (PDT) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 19:53:36 -0000 Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 21:26 +0200: > On Sat, Oct 25, 2014 at 12:04:07PM -0700, John-Mark Gurney wrote: > > Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200: > > > The following idiom is used here and there: > > > > > > int old; > > > old = obj->ref; > > > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1)) > > > return; > > > lock(&something); > > > if (refcount_release(&obj->ref) == 0) { > > > unlock(&something); > > > return; > > > } > > > free up > > > unlock(&something); > > > > > > ========== > > > > Couldn't this be better written as: > > if (__predict_false(refcount_release(&obj->ref) == 0)) { > > lock(&something); > > if (__predict_true(!obj->ref)) { > > free up > > } > > unlock(&something); > > } > > > > The reason I'm asking is that I changed how IPsec SA ref counting was > > handled, and used something similar... > > > > My code gets rid of a branch, and is better in that it uses refcount > > API properly, instead of using atomic_cmpset_int... > > This is used when given obj is kept on a list and code which traverses > it (locked) expects found objects to be valid to ref. > > If we were to reach count of 0 and then lock, it would be possible that > other thread refed + unrefed the object and is now trying to lock as > well. Per the email I wrote to Ian, this "assumption" needs to be well documented that though the "list" has a reference, and that this reference is not accounted for in the ref count... 
And I personally think that it's a bug for the list to not hold its own reference... Yes, then you need to compare for when the ref count hits one, and do the lock/dec/free/unlock, but that keeps the refcount sane... > That could be remedied for type stable object by having a generation > counter, but I doubt it's worth it. Not to mention objects we lock here > are freeable :) That's too heavyweight... > > > I decided to implement it as a common function. > > > > > > We have only refcount.h and I didn't want to bloat all including code > > > with additional definitions and as such I came up with a macro that has > > > to be used in .c file and that will define appropriate inline func. > > > > > > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_ > > > macro, assuming it has to stay. > > > > You could shorten it to REFCNT_REL_TAKE_ > > > > All function use full 'refcount_release' and the like, so that would be > inconsistent. > > Losing 'take' may be an option, I don't know. Yeh, the only advantage is that it only appears once per file used, so it's not THAT long... > > > Comments? > > > > > Will you update the refcount(9) man page w/ documentation before > > committing? > > Sure. Thanks. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
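For readers of the archive, one possible reading of the alternative suggested above (the list or table holds its own counted reference, and the release path watches for the count reaching one) is sketched below in userspace C11. Everything here is an assumption made for illustration: struct entry, table_lock, entry_create(), entry_ref() and entry_release() are hypothetical names, and an atomic_flag spinlock stands in for the kernel lock. The sketch keeps the "lock, then decrement" order from the message, so the count drops to one only while the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical entry kept in a lock-protected table. */
struct entry {
	atomic_uint ref;	/* user references plus one owned by the table */
	bool linked;		/* stands in for real table linkage */
};

/* Assumption: a trivial spinlock stands in for the kernel lock. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void
table_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&table_lock))
		;	/* spin until the flag was previously clear */
}

static void
table_lock_release(void)
{
	atomic_flag_clear(&table_lock);
}

/* Create a linked entry whose only reference is the table's own. */
static struct entry *
entry_create(void)
{
	struct entry *e;

	e = malloc(sizeof(*e));
	if (e == NULL)
		return (NULL);
	atomic_init(&e->ref, 1);
	e->linked = true;
	return (e);
}

/*
 * Take a user reference.  A real lookup would find the entry by walking
 * the table under the lock; the table's reference keeps it valid to ref.
 */
static void
entry_ref(struct entry *e)
{
	table_lock_acquire();
	atomic_fetch_add(&e->ref, 1);
	table_lock_release();
}

/*
 * Drop a user reference (the caller must hold one).  Returns true if
 * this was the last user reference and the entry was unlinked and freed.
 */
static bool
entry_release(struct entry *e)
{
	unsigned int old;

	/* Fast path: other users remain besides the table. */
	old = atomic_load(&e->ref);
	if (old > 2 && atomic_compare_exchange_strong(&e->ref, &old, old - 1))
		return (false);

	/* Slow path: the count may reach one only with the lock held. */
	table_lock_acquire();
	if (atomic_fetch_sub(&e->ref, 1) != 2) {
		table_lock_release();
		return (false);
	}
	/* Only the table's reference is left: unlink, drop it, free. */
	e->linked = false;
	atomic_store(&e->ref, 0);
	table_lock_release();
	free(e);
	return (true);
}
```

The count then has a plain meaning at every point (zero only when the entry is about to be freed), at the cost of comparing against two instead of one in the release path.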
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 20:12:47 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0F282BF4 for ; Sat, 25 Oct 2014 20:12:47 +0000 (UTC) Received: from mail-wi0-x232.google.com (mail-wi0-x232.google.com [IPv6:2a00:1450:400c:c05::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 90D82335 for ; Sat, 25 Oct 2014 20:12:46 +0000 (UTC) Received: by mail-wi0-f178.google.com with SMTP id q5so3608716wiv.17 for ; Sat, 25 Oct 2014 13:12:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=f/j9gSaBbZGgnuOjuxdiHxX4aS7Sbr2qIWBFsJkI65w=; b=RstkupRo505BGqBzI6woiM+ZSGLmyHx0LQ2BCEXdlg4jcCqnSO/UtinM5i1eb4h5HG tkdxg2PZFgNmZl3FDdcSLOXLDf+3RyA2RMHnrTizGHV6SJvn9f2N2Undf5T0NROwXgTh wbGKFF5Q9NoaPmSuG14T9woAQqDam1AveTYyZ0ASqsB1jERURC8muDtyYjx8ywP0cTWD Ze9ZBbn5h/UefcEiHi9H2G+7X9kcmq1tfs6Bwkx2AbRXWnquYTgYe7R4pTgrI5QBGFY1 NXkyetSmTohppaRRimQSNizNf9NhOXv5oTyBrv66gsClDeoqYuLsP7o6sx64O7wqSPPl +qKQ== X-Received: by 10.194.58.8 with SMTP id m8mr13260118wjq.43.1414267964341; Sat, 25 Oct 2014 13:12:44 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id j8sm6071270wib.10.2014.10.25.13.12.43 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 25 Oct 2014 13:12:43 -0700 (PDT) Date: Sat, 25 Oct 2014 22:12:40 +0200 From: Mateusz Guzik To: freebsd-arch@freebsd.org Subject: Re: refcount_release_take_##lock Message-ID: <20141025201240.GC19066@dft-labs.eu> References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> <20141025192632.GB19066@dft-labs.eu> <20141025195334.GW82214@funkthat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20141025195334.GW82214@funkthat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 20:12:47 -0000 On Sat, Oct 25, 2014 at 12:53:34PM -0700, John-Mark Gurney wrote: > Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 21:26 +0200: > > On Sat, Oct 25, 2014 at 12:04:07PM -0700, John-Mark Gurney wrote: > > > Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200: > > > > The following idiom is used here and there: > > > > > > > > int old; > > > > old = obj->ref; > > > > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1)) > > > > return; > > > > lock(&something); > > > > if (refcount_release(&obj->ref) == 0) { > > > > unlock(&something); > > > > return; > > > > } > > > > free up > > > > unlock(&something); > > > > > > > > ========== > > > > > > Couldn't this be better written as: > > > if (__predict_false(refcount_release(&obj->ref) == 0)) { > > > lock(&something); > > > if (__predict_true(!obj->ref)) { > > > free up > > > } > > > unlock(&something); > > > } > > > > > > The reason I'm asking is that I changed how IPsec SA ref counting was > > > handled, and used something similar... 
> > > > > > My code gets rid of a branch, and is better in that it uses refcount > > > API properly, instead of using atomic_cmpset_int... > > > > This is used when given obj is kept on a list and code which traverses > > it (locked) expects found objects to be valid to ref. > > > > If we were to reach count of 0 and then lock, it would be possible that > > other thread refed + unrefed the object and is now trying to lock as > > well. > > Per the email I wrote to Ian, this "assumption" needs to be well > documented that though the "list" has a reference, and that this > reference is not accounted for in the ref count... > > And I personally think that it's a bug for the list to not hold it's > own reference... Yes, then you need to compare for when the ref count > hits one, and do the lock/dec/free/unlock, but that keeps the refcount > sane... > Well, this is for stuff which cleans up after itself. Example usage is with per-uid stats for resource limits. These automatically free themselves with the last cred with given uid. This has its own problems (like constant creation and destruction of stuff for the same cred), but seems ok enough for some cases. Otherwise we would have to actively free these structs somehow. > > That could be remedied for type stable object by having a generation > > counter, but I doubt it's worth it. Not to mention objects we lock here > > are freeable :) > > That's too heavy weight... > > > > > I decided to implement it as a common function. > > > > > > > > We have only refcount.h and I didn't want to bloat all including code > > > > with additional definitions and as such I came up with a macro that has > > > > to be used in .c file and that will define appropriate inline func. > > > > > > > > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_ > > > > macro, assuming it has to stay. 
> > > > > > You could shorten it to REFCNT_REL_TAKE_ > > > > > > > All function use full 'refcount_release' and the like, so that would be > > inconsistent. > > > > Losing 'take' may be an option, I don't know. > > Yeh, the only advantage is that it only appears once per file used, > so it's not THAT long... > > > > > Comments? > > > > > > > > Will you update the refcount(9) man page w/ documentation before > > > committing? > > > > Sure. > > Thanks. > > -- > John-Mark Gurney Voice: +1 415 225 5579 > > "All that I will do, has been done, All that I have, has not." -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 20:25:12 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3237BF77 for ; Sat, 25 Oct 2014 20:25:12 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 02DF8634 for ; Sat, 25 Oct 2014 20:25:11 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9PKP9R7031825 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Oct 2014 13:25:10 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9PKP9bP031824; Sat, 25 Oct 2014 13:25:09 -0700 (PDT) (envelope-from jmg) Date: Sat, 25 Oct 2014 13:25:09 -0700 From: John-Mark Gurney To: Konstantin Belousov Subject: Re: RfC: fueword(9) and casueword(9) Message-ID: <20141025202509.GX82214@funkthat.com> Mail-Followup-To: Konstantin Belousov , arch@freebsd.org References: <20141021094539.GA1877@kib.kiev.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline In-Reply-To: <20141021094539.GA1877@kib.kiev.ua> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Sat, 25 Oct 2014 13:25:10 -0700 (PDT) Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 20:25:12 -0000 Konstantin Belousov wrote this message on Tue, Oct 21, 2014 at 12:45 +0300: > FreeBSD provides the fuword(9) family of functions to fetch a word from > the userspace. Functions return the value read, or -1 on failure (i.e. > when faulted on access). This KPI has flaw, which makes it impossible > to distinguish -1 read from usermode vs. the fault. As John Baldwin > pointed out, fuword(9) cannot be replaced by copyin(9), since fuword(9) > is atomic for aligned data, while copyin(9) is typically implemented as > byte copy. We also need to define what a word is in the man page... I assumed that a word (fuword) was a 32bit word, but it's not on 64 bit arches, it's a 64bit word... if words were 32bit words on 64bit arches, fuword would be safe (assuming reading an unsigned word), but that is not the case... Related to this is that it isn't defined if fubyte (returns an int) reads a signed or unsigned byte. If it reads an unsigned byte, then it is safe, and we do not need an fuebyte version... The same goes w/ fuword16 (returns an int)... 
> I wanted to fix this wart for long time, below is the prototyped patch, > which adds fueword(9) family of functions. They take the address of > variable where to put the value read, and return 0 on success, -1 on > failure. In similar way, casueword(9) fixes casuword(9). > > The tricky part of the patch are the changes to kern_umtx.c, where the > logic of the loops in the lock acquire routines is delicate and care > must be taken to not obliterate possible errors from the suspension > check or signal test on loop retry. > > I only implemented fueword(9) and casueword(9) for x86 and powerpc. > The fuword(9) and casuword(9) are reimplemented as wrappers around > e-variants. > > For arm, mips and sparc, where I do not know or do not remember the > assembler anymore, I made a hack to provide deficient fueword(9), which > calls fuword(9) and thus still mixing -1 from userspace and fault. See > NO_FUEWORD in machine/param.h; hopefully arch maintainers will fix the > remaining places. > > Some users of fuword(9) are still left, in particular in aio and dtrace. > > Patch was only lightly tested on x86 for now. > > Comments and fixes are welcomed. I'll take a closer look at the patch soon... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." 
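For readers of the archive, the ambiguity being discussed can be shown with a small userspace simulation. This is a sketch only: sim_user_mem, sim_fuword(), sim_fueword() and sim_fubyte() are hypothetical stand-ins, not the kernel functions, and a "fault" is simulated as an out-of-range index. The last helper assumes the unsigned-byte reading mentioned above, which is exactly the case where -1 stays out of band.

```c
#include <assert.h>
#include <stddef.h>

#define	SIM_NWORDS	4

/* Simulated user memory; word 1 legitimately contains -1. */
static long sim_user_mem[SIM_NWORDS] = { 42, -1, 0, 7 };

/* fuword-like: the value and the failure share one return channel. */
static long
sim_fuword(size_t idx)
{
	if (idx >= SIM_NWORDS)
		return (-1);		/* simulated fault */
	return (sim_user_mem[idx]);
}

/* fueword-like: 0 or -1 status, value stored through a pointer. */
static int
sim_fueword(size_t idx, long *val)
{
	if (idx >= SIM_NWORDS)
		return (-1);		/* simulated fault, *val untouched */
	*val = sim_user_mem[idx];
	return (0);
}

/*
 * fubyte-like, assuming an unsigned byte widened to int: valid results
 * are 0..255, so -1 can never be confused with data.
 */
static int
sim_fubyte(const unsigned char *buf, size_t len, size_t idx)
{
	if (idx >= len)
		return (-1);		/* simulated fault */
	return (buf[idx]);
}
```

With these helpers, sim_fuword(1) and sim_fuword(SIM_NWORDS) both return -1 and cannot be told apart, while sim_fueword() returns 0 for the first and -1 for the second.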
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 20:31:09 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E00331FE for ; Sat, 25 Oct 2014 20:31:09 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5227165D for ; Sat, 25 Oct 2014 20:31:09 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9PKV3L1093557 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Oct 2014 23:31:03 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9PKV3L1093557 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9PKV34W093556; Sat, 25 Oct 2014 23:31:03 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 25 Oct 2014 23:31:03 +0300 From: Konstantin Belousov To: Jilles Tjoelker Subject: Re: RfC: fueword(9) and casueword(9) Message-ID: <20141025203103.GM1877@kib.kiev.ua> References: <20141021094539.GA1877@kib.kiev.ua> <20141022002825.H2080@besplex.bde.org> <20141021162306.GE1877@kib.kiev.ua> <20141025194344.GA11568@stack.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141025194344.GA11568@stack.nl> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: arch@freebsd.org X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 20:31:10 -0000 On Sat, Oct 25, 2014 at 09:43:44PM +0200, Jilles Tjoelker wrote: > On Tue, Oct 21, 2014 at 07:23:06PM +0300, Konstantin Belousov wrote: > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the > > umtx structures definitions. I think that it is bug to mark the lock > > words with volatile. I want the fueword(9) interface to be as much > > similar to fuword(9), in particular, volatile seems to be not needed. > > > Below is the updated patch, together with the bug fix noted by Eric. > > Hmm, consider returning an error number (that is, EFAULT) instead of -1 > on failure. This is somewhat like priv_check() which returns EPERM on > failure, and probably reduces confusion if the return value is assigned > to a variable named "error". I did consider EFAULT. The KPI becomes too inconsistent when fueword is compared with fuword or suword. As for rv in kern_umtx.c, returning EFAULT from fueword or casueword would still not allow the result to be used as error directly. The reason is the complicated logic of handling the suspend requests. Basically, non-errors from userspace access must not obliterate the errors from previous calls to check_suspend(). See r270345 for an example of such a bug from an earlier pass over the userspace accesses in kern_umtx.c. There are still more such bugs left in the patch; Peter and I are chasing them. > > In share/man/man9/Makefile, MLINKS are still created for fuebyte.9 and > fueword16.9. Fixed, thank you. > > "successful" is consistently misspelled "successfull". I corrected all occurrences of "successfull" in the patch, but the misspelling is systematic in the tree.
pooma% git grep -e successfull -- | grep -v successfully | wc -l 30 From owner-freebsd-arch@FreeBSD.ORG Sat Oct 25 22:47:05 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 88A20374 for ; Sat, 25 Oct 2014 22:47:05 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 549A8601 for ; Sat, 25 Oct 2014 22:47:05 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9PMl2TF033192 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Oct 2014 15:47:03 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9PMl2GH033191; Sat, 25 Oct 2014 15:47:02 -0700 (PDT) (envelope-from jmg) Date: Sat, 25 Oct 2014 15:47:02 -0700 From: John-Mark Gurney To: Mateusz Guzik Subject: Re: refcount_release_take_##lock Message-ID: <20141025224702.GY82214@funkthat.com> Mail-Followup-To: Mateusz Guzik , freebsd-arch@freebsd.org References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> <20141025192632.GB19066@dft-labs.eu> <20141025195334.GW82214@funkthat.com> <20141025201240.GC19066@dft-labs.eu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141025201240.GC19066@dft-labs.eu> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html 
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Sat, 25 Oct 2014 15:47:03 -0700 (PDT) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Oct 2014 22:47:05 -0000 Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 22:12 +0200: > On Sat, Oct 25, 2014 at 12:53:34PM -0700, John-Mark Gurney wrote: > > Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 21:26 +0200: > > > On Sat, Oct 25, 2014 at 12:04:07PM -0700, John-Mark Gurney wrote: > > > > Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200: > > > > > The following idiom is used here and there: > > > > > > > > > > int old; > > > > > old = obj->ref; > > > > > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1)) > > > > > return; > > > > > lock(&something); > > > > > if (refcount_release(&obj->ref) == 0) { > > > > > unlock(&something); > > > > > return; > > > > > } > > > > > free up > > > > > unlock(&something); > > > > > > > > > > ========== > > > > > > > > Couldn't this be better written as: > > > > if (__predict_false(refcount_release(&obj->ref) == 0)) { > > > > lock(&something); > > > > if (__predict_true(!obj->ref)) { > > > > free up > > > > } > > > > unlock(&something); > > > > } > > > > > > > > The reason I'm asking is that I changed how IPsec SA ref counting was > > > > handled, and used something similar... > > > > > > > > My code gets rid of a branch, and is better in that it uses refcount > > > > API properly, instead of using atomic_cmpset_int... > > > > > > This is used when given obj is kept on a list and code which traverses > > > it (locked) expects found objects to be valid to ref. 
> > >
> > > If we were to reach count of 0 and then lock, it would be possible that
> > > other thread refed + unrefed the object and is now trying to lock as
> > > well.
> >
> > Per the email I wrote to Ian, this "assumption" needs to be well
> > documented that though the "list" has a reference, and that this
> > reference is not accounted for in the ref count...
> >
> > And I personally think that it's a bug for the list to not hold its
> > own reference... Yes, then you need to compare for when the ref count
> > hits one, and do the lock/dec/free/unlock, but that keeps the refcount
> > sane...
>
> Well, this is for stuff which cleans up after itself.

I hope that everything cleans up after itself.. :) otherwise we'd
have memory leaks everywhere...

> Example usage is with per-uid stats for resource limits. These
> automatically free themselves with the last cred with given uid.

This example doesn't give me enough information to decide what you
mean.. Is there a hash table that we look up these cred structures?
or are they referenced from an already existing reference?

> This has its own problems (like constant creation and destruction of
> stuff for the same cred), but seems ok enough for some cases.
>
> Otherwise we would have to actively free these structs somehow.

I'm still not sure how your example addresses this.. I believe you
wrote the data structure case, but it wasn't clear that is what you
were doing, and as I said, I think it's a bug to have an implicit
ref in such data structures w/o properly documenting them...

Part of the reason why we need documentation to make sure people don't
make mistakes like these...

> > > That could be remedied for type stable object by having a generation
> > > counter, but I doubt it's worth it. Not to mention objects we lock here
> > > are freeable :)
> >
> > That's too heavy weight...
> >
> > > > > I decided to implement it as a common function.
> > > > >
> > > > > We have only refcount.h and I didn't want to bloat all including code
> > > > > with additional definitions and as such I came up with a macro that has
> > > > > to be used in .c file and that will define appropriate inline func.
> > > > >
> > > > > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_
> > > > > macro, assuming it has to stay.
> > > >
> > > > You could shorten it to REFCNT_REL_TAKE_
> > > >
> > >
> > > All function use full 'refcount_release' and the like, so that would be
> > > inconsistent.
> > >
> > > Losing 'take' may be an option, I don't know.
> >
> > Yeh, the only advantage is that it only appears once per file used,
> > so it's not THAT long...
> >
> > > > > Comments?
> > > >
> > >
> > > > Will you update the refcount(9) man page w/ documentation before
> > > > committing?
> > >
> > > Sure.
> >
> > Thanks.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
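
For readers following the thread, the idiom Mateusz describes (drop a
reference locklessly unless it might be the last one, in which case take
the lock first) can be sketched roughly as below. This is only a userland
C11 approximation under stated assumptions: it uses <stdatomic.h> and a
pthread mutex instead of the kernel's atomic(9)/refcount(9) primitives,
and every name in it (struct obj, list_lock, obj_release_take_lock,
obj_drop, objs_freed) is hypothetical and not taken from the proposed
patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* All names here are hypothetical; this is userland C11, not kernel code. */
struct obj {
	atomic_uint ref;	/* the list itself holds no counted reference */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int objs_freed;		/* demo counter, modified under list_lock */

static struct obj *
obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		abort();
	atomic_init(&o->ref, 1);
	return (o);
}

static void
obj_hold(struct obj *o)
{
	unsigned int old = atomic_fetch_add(&o->ref, 1);

	/* Callers either own a reference or found o on the locked list. */
	assert(old > 0);
	(void)old;
}

/*
 * Drop one reference.  Returns true with list_lock held when the caller
 * dropped the last reference; returns false (lock not held) otherwise.
 */
static bool
obj_release_take_lock(struct obj *o)
{
	unsigned int old = atomic_load(&o->ref);

	/* Fast path: decrement locklessly only if the count stays above 0. */
	if (old > 1 && atomic_compare_exchange_strong(&o->ref, &old, old - 1))
		return (false);
	/* Slow path: the count may hit 0, so decrement under the lock. */
	pthread_mutex_lock(&list_lock);
	if (atomic_fetch_sub(&o->ref, 1) == 1)
		return (true);
	pthread_mutex_unlock(&list_lock);
	return (false);
}

static void
obj_drop(struct obj *o)
{
	if (!obj_release_take_lock(o))
		return;
	/* A real implementation would unlink o from its list here. */
	objs_freed++;
	pthread_mutex_unlock(&list_lock);
	free(o);
}
```

Because the count can only reach 0 while list_lock is held, a thread that
walks the list under the same lock never sees an object whose count has
already dropped to 0, which is the property Mateusz says the lockless
"release, then lock" variant would lose.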