Date: Thu, 19 May 2016 06:03:54 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Ian Lepore <ian@freebsd.org> Cc: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>, Nathan Whitehorn <nwhitehorn@freebsd.org>, Justin Hibbits <chmeeedalf@gmail.com>, Scott Long <scottl@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r300154 - head/sys/net Message-ID: <20160519043606.G1061@besplex.bde.org> In-Reply-To: <1463594263.1180.298.camel@freebsd.org> References: <201605181545.u4IFjCKD030751@repo.freebsd.org> <20160518105033.1eae7432@zhabar.knownspace> <759d085c-a485-c2ed-5d70-26eb4d27cdc2@freebsd.org> <1463592737.1180.294.camel@freebsd.org> <1EED3540-DCCF-40D2-91BA-CE0AA54C5D08@lists.zabbadoz.net> <1463594263.1180.298.camel@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 18 May 2016, Ian Lepore wrote: > On Wed, 2016-05-18 at 17:35 +0000, Bjoern A. Zeeb wrote: >>> On 18 May 2016, at 17:32 , Ian Lepore <ian@freebsd.org> wrote: >>> >>> On Wed, 2016-05-18 at 10:14 -0700, Nathan Whitehorn wrote: >>> ... >>> It may be more complicated than that, though. armv6 can do 64-bit >>> atomics even tho it's 32-bit. armv4, also 32-bit, can do 64-bit >>> atomics in the kernel but not in userland. >>> >>> Maybe machine/atomic.h needs a #define that says whether 64-bit ops >>> are >>> available in the current compilation unit. (And likewise for other >>> bit >>> sizes if we have arches that have other limitations.) >> >> Question because I didn=A2t follow the details, but how was this solved >> for the COUNTERS framework? Using special code that pessimizes old machines on 32-bit arches especially= =2E For example, incrementing a 32-bit network counter used to take 1 inlined counter++ statement (as little as 1 instruction on i386, but it is a read-modify-write instruction and thus no faster than separate instructions, and if the counter is shared this is almost as slow as a locked instruction in some cases). This now takes an if_inc_counter() function call which takes 33 instruction= s altogether on i386 with certain nonstandard not very optimal CFLAGS, and 9 instructions on amd64. (COUNTER64() code is inlined, and the function call is a separate pessimization. It costs about half of the 9 instructions on amd64 and its instructions are relatively heavyweight.) This is when i386 has cmpxchgb. The single cmpxchg8b instruction is heavier weight than a 32-bit memory increment and using it takes lots of control logic. > iirc, each platform implements counters its own way, probably the wrong > way on all of them except x86. I think other arches just use compatiblity code which uses critical sections. This is not so bad. It might be faster than using cmpxchg8b depending on how fast critical_enter() and critical_exit() are. Unfortunately, they are not very fast. They are functions too, and on i386 critical_enter() takes 20+ instructions. critical_exit() takes more, and debugging is broken and caused a panic when I tried to trace through critical_exit(). That is so slow that hard-disabling interrupts is probably faster. Network drivers were mostly written under the assumption that they are running on a UP system and incrementing a counter is inline and fast, so they increment counters without worrying about the overhead for this. 33 instructions for 2 if_inc_counter()s per packet is about a 1% pessimization for bge on my slow hardware. > For some crazy reason the docs for COUNTERS say that it does not use > atomics. I have no idea why the docs for an API are dictating > implementation, but I suspect it's because atomics are more expensive > on x86 than other alternatives. So the arm code slavishly avoids using > atomics in COUNTERS even though doing so would be more effecient than > the current copied-from-x86 code. Other places just hard-code use of PCPU although that is also more complicated and uglier than counter++. Counters in PCPU only exist because full atomics are probably slower on all arches and much slower on most arches. Most 64-bit PCPU accesses on 32-bit arches are broken since they are not atomic even for 1 CPU. COUNTER64() is more careful to a fault. arm PCPU_INC() seems to be broken even for 32-bit accesses. In the non-SMP case, it just does pcpu->pc_ ## member++ and in the SMP case it does the same with a register pointer instead of a global. I wrote some alternative x86 implementations that are at least 20% faster than the cmpxchg8b method, but the best method is clearly to use only 32-bit low-level counters and add up the counters in a daemon. The daemon shouldn't run very often. There aren't many counters except i/o byte counters that want to wrap in 32 bits more than once per hour. Even 100 Gbps ethernet can only do 150 Mpps so it takes at least 30 seconds to wrap. Bruce From owner-svn-src-all@freebsd.org Wed May 18 20:05:45 2016 Return-Path: <owner-svn-src-all@freebsd.org> Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 61B1AB41BD9; Wed, 18 May 2016 20:05:45 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id EC29A111D; Wed, 18 May 2016 20:05:44 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id A01833C57DF; Thu, 19 May 2016 06:05:37 +1000 (AEST) Date: Thu, 19 May 2016 06:05:36 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> X-X-Sender: bde@besplex.bde.org To: Bruce Evans <brde@optusnet.com.au> cc: yBryan Drewery <bdrewery@freebsd.org>, Gleb Smirnoff <glebius@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-releng@freebsd.org Subject: Re: svn commit: r300088 - in releng/9.3: . sys/conf sys/dev/kbd In-Reply-To: <20160518124147.V7042@besplex.bde.org> Message-ID: <20160519060448.B1270@besplex.bde.org> References: <201605172228.u4HMSbhj012124@repo.freebsd.org> <14a8d29d-bc14-3f96-57a4-81f1b6dfdd82@FreeBSD.org> <20160517230710.GB1015@FreeBSD.org> <38ca6091-5607-5796-9f6e-7f2d6c117707@FreeBSD.org> <20160518124147.V7042@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=YdjKeqWGE6BAugsnKCQA:9 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" <svn-src-all.freebsd.org> List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-all>, <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe> List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/> List-Post: <mailto:svn-src-all@freebsd.org> List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help> List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-all>, <mailto:svn-src-all-request@freebsd.org?subject=subscribe> X-List-Received-Date: Wed, 18 May 2016 20:05:45 -0000 On Wed, 18 May 2016, Bruce Evans wrote: > On Tue, 17 May 2016, Bryan Drewery wrote: > >> On 5/17/16 4:07 PM, Gleb Smirnoff wrote: >>> On Tue, May 17, 2016 at 03:59:26PM -0700, Bryan Drewery wrote: >>> B> > Author: glebius >>> B> > Date: Tue May 17 22:28:36 2016 >>> B> > New Revision: 300088 >>> B> > URL: https://svnweb.freebsd.org/changeset/base/300088 >>> B> > >>> B> > Log: >>> B> > - Use unsigned version of min() when handling arguments of SETFKEY >>> ioctl. >>> B> > - Validate that user supplied control message length in sendmsg(2) >>> B> > is not negative. >>> B> >>> B> The sendmsg(2) change is not included here (9.3) nor in the advisory >>> but >>> B> is in the commit log. Was it intended to be changed in 9.3? >>> >>> That was my failure to mention SA-16:19 in commit message for 9.3. It >>> doesn't >>> apply to 9.x. >>> >>> B> Plus the only consumer I see is sendit() which seems to be protected >>> B> already from negative values when not using COMPAT_43: >>> B> >>> B> > if (mp->msg_controllen < sizeof(struct cmsghdr) >>> B> > #ifdef COMPAT_OLDSOCK >>> B> > && mp->msg_flags != MSG_COMPAT >>> B> > #endif >>> B> > ) { >>> B> > error = EINVAL; >>> B> > goto bad; >>> B> > } >>> B> > error = sockargs(&control, mp->msg_control, >>> B> > mp->msg_controllen, MT_CONTROL); >>> >>> No, it isn't protected. In the comparison (mp->msg_controllen < >>> sizeof(struct cmsghdr)) >>> both values are unsigned. Later in sockargs() it is treated as signed. > > But it is protected (except on exotic unsupported arches). The above is > a complete bounds check for mp->msg_controllen, written in an obfuscated > way using an unsigned type botch/hack. Negative values are normally > promoted to large unsigned values, so they fail this check and sockargs() > is never called with them. > > On exotic arches, the analysis is more complicated and the hack doesn't > work. ... Oops. I read this sort of backwards. The unsign botches are somewhat larger, and more like the one in kbd.c: - this only checks the lower bound, so if the operands were negative then they would pass instead of fail the check after bogus unsign extension - the operands are actually both unsigned, since msg_controllen already has unsigned poisoning. It was poisoned even in FreeBSD-1. It was u_int then. It is still u_int in 4.4BSD-Lite*. Now it is socklen_t, which is uint32_t. POSIX has messes for this. At least in the 2001 version, it doesn't seem to require the poisoning, but recommends that applications not use values above 2**31-1 [since these values require the poisoning to represent, and are not very useful except for opening security holes like here]. - so after passing the lower bound check, msg_controllen may have a large unsigned 32-bit value. Undefined behaviour occurs when we pass this to sockargs() which doesn't have unsign poisoning. Except on unsupported exotic arches, the behaviour is to overflow to a negative value. - sockargs() then checks the overflowed value. This is robust enough if the overflow gives any value at all, but still bogus. Correct code would do bounds checks before calling sockargs (of the form val >= min && val <= INT_MAX), but there are several callers and it is convenient to check only in sockargs(). For that, sockargs must take a parameter with the same type as msg_controllen, although old unsign botches force this to be unsigned (precisely socklen_t). It is too late to change socklen_t back to int. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160519043606.G1061>