From: Ravi Pokala
To: "Jonathan T. Looney"
CC: Mateusz Guzik, src-committers
Date: Thu, 07 Jun 2018 00:01:00 -0400
Subject: Re: svn commit: r334702 - head/sys/sys
Message-ID: <468B8AB5-D2C7-4033-9F24-6E1F94DC7137@panasas.com>
List-Id: SVN commit messages for the src tree for head/-current

> I believe the theory is that the compiler (remember, this is __builtin_memset) can optimize away portions of the zeroing, or can optimize zeroing for small sizes.

Ahhh! I didn't consider that the compiler would be doing analysis of the larger context, and potentially skipping zeroing parts that are set immediately after the call.

Thanks!

-Ravi (rpokala@)

-----Original Message-----
From: "Jonathan T. Looney"
Date: 2018-06-06, Wednesday at 22:58
To: Ravi Pokala
Cc: Mateusz Guzik, src-committers
Subject: Re: svn commit: r334702 - head/sys/sys

> On Wed, Jun 6, 2018 at 10:14 PM, Ravi Pokala wrote:
>>
>> -----Original Message-----
>> From: on behalf of Mateusz Guzik
>> Date: 2018-06-06, Wednesday at 09:01
>> To: Ravi Pokala
>> Cc: Mateusz Guzik, src-committers
>> Subject: Re: svn commit: r334702 - head/sys/sys
>>
>>> On Wed, Jun 6, 2018 at 1:35 PM, Ravi Pokala wrote:
>>>
>>>>> + * Passing the flag down requires malloc to blindly zero the entire object.
>>>>> + * In practice a lot of the zeroing can be avoided if most of the object
>>>>> + * gets explicitly initialized after the allocation. Letting the compiler
>>>>> + * zero in place gives it the opportunity to take advantage of this state.
>>>>
>>>> This part, I still don't understand. :-(
>>>
>>> The call to bzero() is still for the full length passed in, so how does this help?
>>>
>>> bzero is:
>>> #define bzero(buf, len) __builtin_memset((buf), 0, (len))
>>
>> I'm afraid that doesn't answer my question; you're passing the full length to __builtin_memset() too.
>
> I believe the theory is that the compiler (remember, this is __builtin_memset) can optimize away portions of the zeroing, or can optimize zeroing for small sizes.
>
> For example, imagine you do this:
>
> struct foo {
>         uint32_t a;
>         uint32_t b;
> };
>
> struct foo *
> alloc_foo(void)
> {
>         struct foo *rv;
>
>         rv = malloc(sizeof(*rv), M_TMP, M_WAITOK|M_ZERO);
>         rv->a = 1;
>         rv->b = 2;
>         return (rv);
> }
>
> In theory, the compiler can be smart enough to know that the entire structure is initialized, so it is not necessary to zero it.
>
> (I personally have not tested how well this works in practice. However, this change theoretically lets the compiler be smarter and optimize away unneeded work.)
>
> At minimum, it should let the compiler replace calls to memset() (and the loops there) with optimal instructions to zero the exact amount of memory that needs to be initialized. (Again, I haven't personally tested how smart the compilers we use are about producing optimal code in this situation.)
>
> Jonathan
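
To make the point above concrete, here is a minimal userland sketch of the idea, not the code from r334702. It assumes a GCC- or Clang-compatible compiler (for __builtin_memset) and uses plain malloc(3) in place of the kernel allocator. The names sk_alloc and SK_ZERO are hypothetical and exist only for this illustration, and the struct gains a third field, c, to show which part of the zeroing still has to happen. Whether a given compiler actually drops the redundant stores is not something this sketch proves.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag for this sketch; not the kernel's M_ZERO value. */
#define SK_ZERO 0x1

/*
 * Hypothetical userland stand-in for an allocator wrapper. The zeroing
 * is done here, in an inline function the compiler can see through,
 * rather than inside the out-of-line allocator.
 */
static inline void *
sk_alloc(size_t size, int flags)
{
	void *p = malloc(size);

	if (p != NULL && (flags & SK_ZERO))
		__builtin_memset(p, 0, size);
	return (p);
}

struct foo {
	uint32_t a;
	uint32_t b;
	uint32_t c;	/* never assigned below, so it must read back as zero */
};

struct foo *
alloc_foo(void)
{
	struct foo *rv;

	rv = sk_alloc(sizeof(*rv), SK_ZERO);
	if (rv == NULL)
		return (NULL);
	/*
	 * These stores overwrite part of the zeroed object. Because the
	 * memset is visible at this call site, an optimizer is free to
	 * skip zeroing a and b and only clear c.
	 */
	rv->a = 1;
	rv->b = 2;
	return (rv);
}
```

Compiling this with optimization enabled and reading the generated assembly (for example with cc -O2 -S) is one way to check what a particular compiler does with the zeroing; the result will vary by compiler and version.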