Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 22 Jul 2014 03:53:10 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Ian Lepore <ian@freebsd.org>
Cc:        arch@freebsd.org, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: [CFR] mge driver / elf reloc
Message-ID:  <20140722022100.S2586@besplex.bde.org>
In-Reply-To: <1405955048.85788.74.camel@revolution.hippie.lan>
References:  <14D22EA6-B73C-47BA-9A86-A957D24F23B8@freebsd.org> <1405810447.85788.41.camel@revolution.hippie.lan> <20140720220514.GP45513@funkthat.com> <F6D53A17-FED0-4F08-BB5B-9F66C5AF5EF6@kientzle.com> <20140720231056.GQ45513@funkthat.com> <9464C309-B390-4A27-981A-E854921B1C98@bsdimp.com> <1405955048.85788.74.camel@revolution.hippie.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 21 Jul 2014, Ian Lepore wrote:

> On Mon, 2014-07-21 at 08:46 -0600, Warner Losh wrote:
>> On Jul 20, 2014, at 5:10 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
>>
>>> Tim Kientzle wrote this message on Sun, Jul 20, 2014 at 15:25 -0700:
>>>>
>>>> On Jul 20, 2014, at 3:05 PM, John-Mark Gurney <jmg@funkthat.com> wrote=
:
>>>>
>>>>> Ian Lepore wrote this message on Sat, Jul 19, 2014 at 16:54 -0600:
>>>>>> Sorry to take so long to reply to this, I'm trying to get caught up.=
  I
>>>>>> see you've already committed the mge fixes.  I think the ELF alignme=
nt
>>>>>> fix looks good and should also be committed.
>>>>>
>>>>> So, re the elf alignment...
>>>>>
>>>>> I think we should get a set of macros that handle load/stores to/from
>>>>> unaligned addresses that are transparent to the caller....  I need
>>>>> these for some other code I'm writing...
>>>>>
>>>>> I thought Open/Net had these available, but I can't seem to find them
>>>>> right now...
>>>>
>>>> $ man 9 byteorder
>>>>
>>>> is most of what you want, lacking only some aliases to pick
>>>> the correct macro for native byte order.
>>>
>>> Um, those doesn't help if you want native endian order=85
>>
>> Ummm, yes they do. enc converts from native order. dec decodes to native=
 byte
>> order. They are more general cases than the ntoh* functions that are mor=
e traditional
>> since they also work on byte streams that may not be completely aligned =
when
>> sitting in memory. Which is what you are asking for.
>>
>>> Also, only the enc/dec functions are documented to work on non-aligned
>>> address, so that doesn't help in most cases=85
>>
>> They work on all addresses. They are even documented to work on any addr=
ess:
>>
>>      The be16enc(), be16dec(), be32enc(), be32dec(), be64enc(), be64dec(=
),
>>      le16enc(), le16dec(), le32enc(), le32dec(), le64enc(), and le64dec(=
)
>>      functions encode and decode integers to/from byte strings on any al=
ign-
>>      ment in big/little endian format.
>>
>> So they are quite useful in general. Peeking under the covers at the imp=
lementation
>> also shows they will work for any alignment, so I=92m having trouble und=
erstanding
>> where this objection is really coming from.
>
> The functionality requested was alignment-safe copy/assign without any
> endian change.  The code in question was conceptually something like
>
>   if (pointer & 0x03)
>      do-alignment-safe-thing
>   else
>      directly-deref-the-pointer

The enc/dec functions could be pessimized like that, but are actually
pessimized in other ways.

Pessimizations in the above include conditional branches for the check
and large code.  Everything has to be inlined else it is much slower
than direct dereference, but then it has scattered branches that may
exhaust the branch prediction cache (if any) and large code that may
exhaust other caches.

The enc/dec functions don't have any branches, but they have large code
like:

% static __inline void
% le32enc(void *pp, uint32_t u)
% {
% =09uint8_t *p =3D (uint8_t *)pp;
%=20
% =09p[0] =3D u & 0xff;
% =09p[1] =3D (u >> 8) & 0xff;
% =09p[2] =3D (u >> 16) & 0xff;
% =09p[3] =3D (u >> 24) & 0xff;
% }

Fastest on x86 (unless alignment check exceptions are enabled in CR0)
is to just do the access and let the hardware combine the bytes.  The
accesses should probably be written someything like le32enc() above,
but more carefully.  Compilers should convert the above to a single
32-bit access on x86 (unless alignment check exceptions...).  The
pessimizations are that compilers aren't that smart, and/or the
enc/dec functions are written in such a way that compilers are
constrained from doing this.  IIRC, clang does this for one direction
only and gcc-4.2.1 is not smart enough to do this for either direction.
IIRC, the problematic direction is the above one, where the value is
returned indirectly.  Sprinkling __restrict might help.

When there is endianness conversion, there must be both an access
(hopefully 32 bits) and swapping operation on a register, unless all
the bytes are copied from memory to memory 1 or 2 at a time.  clang
is smart about converting the large expressions in __bswapNN_gen()
into single bswap instructions if the original rvalue is in a
register, but IIRC it is not so smart for the equivalent conversions
written with bytes and indirections.  (x86 and IIRC some other arches
have the __bswapNN_gen() macros.  These macros are just as MI as the
enc/dec inlines and more carefully written but they have not been
deduplicated due to namespace problems inhibiting a complete cleanup of
the MD endian.h files.)

Accesses are often pessimized using the __packed mistake.  This bug is
still uncommon in ipv4 headers, but is used with minimal damage in
struct ip.  struct ip is declared as __packed and __aligned(4).  Here
__aligned(4) tells that the struct aligned normally although it is
packed.  ipv6 headers ask for full pessimizations by declaring almost
everything as __packed without __aligned(N).  This tells the compiler
that the struct might only be 1-byte aligned, so all accesses to it
must be 1 byte at a time except on arches like x86 where the above
optimization applies (the optimization is much easier to do when the
accesses are not expressed bytewise in the code).  Bytewise accesses are
also less inherently atomic.  I think there used to be a problem with
__packed and __aligned() attributes not being inherited by function
pointers, but can't fdind any problem now.

ia64 code for loading an int from a __packed struct (p.x):

%         addl r16 =3D @ltoffx(p#), r1
%         ;;
%         ld8.mov r16 =3D [r16], p#
%         ;;
%         mov r14 =3D r16
%         ;;
%         ld1 r15 =3D [r14], 1
%         ;;
%         ld1 r14 =3D [r14]
%         ;;
%         shl r14 =3D r14, 8
%         ;;
%         or r14 =3D r15, r14
%         adds r15 =3D 2, r16
%         ;;
%         ld1 r15 =3D [r15]
%         ;;
%         shl r15 =3D r15, 16
%         ;;
%         or r15 =3D r14, r15
%         adds r16 =3D 3, r16
%         ;;
%         ld1 r8 =3D [r16]
%         ;;
%         shl r8 =3D r8, 24
%         ;;
%         or r8 =3D r15, r8

It seems to have 5 memory references.  The enc/dec32 functions produce
a similar mess on x86 when the compiler can't optimize them.

This is with gcc.  clang doesn't work on ia64 and/or pluto.

ia64 code for loading an int from a __packed __aligned(4) struct (p.x):

         addl r14 =3D @ltoffx(p#), r1
         ;;
         ld8.mov r14 =3D [r14], p#
         ;;
         ld4 r8 =3D [r14]

This behaviour can probable be exploited in the enc/dec functions.
When there is no endianness conversion, write 32-bit accesses as
p->x where p is a pointer to a __packed __aligned(1) struct containing
x.  The compiler will then produce the above mess on ia64 and a single
32-bit access if possible.  No ifdefs required.  When there is an
endianess conversion, do it on a register with value p->x.

Lightly tested implementation:

% struct _le32 {
% =09uint32_t _x;
% } __packed;
%=20
% #define=09le32enc(p, u)=09(((struct _le32 *)(p))->_x =3D (u))

This is simpler than the inline version, and fixes some namespace errors.

If you want avoid the mess on ia64, this can be done at compile time if
the alignment is known to be more than 1 then.  I think if can often
be known, as for struct ip (don't generate misaligned struct ip, and
if you start with one then copy to an aligned one).  The enc/dec
inline functions could handle this by being converted to macros that
take the alignment parameter.  If the alignment is not known until
runtime...don't do that.

To add an alignment parameter to the above, use something like:

#define=09_sle32(a)=09struct _le32 { uint32_t _x; } __packed __aligned(a)
#define=09le32enc(p, u, a)=09(((_sle32(a) *)(p))->_x =3D (u))

Macroizing the struct declaration to add the alignment parameter to it
and avoiding backslashes made the code less readable but even shorter.

This was tested on i386 and ia64 with alignments 1 and 4 and and gave
the expected output in asm.

Bruce
From owner-freebsd-arch@FreeBSD.ORG  Mon Jul 21 18:28:22 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 58A3F2AB;
 Mon, 21 Jul 2014 18:28:22 +0000 (UTC)
Received: from mho-01-ewr.mailhop.org (mho-03-ewr.mailhop.org [204.13.248.66])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 140C624C6;
 Mon, 21 Jul 2014 18:28:21 +0000 (UTC)
Received: from c-50-155-136-3.hsd1.co.comcast.net ([50.155.136.3]
 helo=ilsoft.org)
 by mho-01-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1X9IJv-000FsY-QT; Mon, 21 Jul 2014 18:28:20 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s6LISIeP033020;
 Mon, 21 Jul 2014 12:28:18 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 50.155.136.3
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX1/IbHu2Ip3HrQmS1MVVZC3X
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: [CFR] mge driver / elf reloc
From: Ian Lepore <ian@FreeBSD.org>
To: Warner Losh <imp@bsdimp.com>
In-Reply-To: <B23A4513-FB0E-4972-91BA-976DCDF346E5@bsdimp.com>
References: <14D22EA6-B73C-47BA-9A86-A957D24F23B8@freebsd.org>
 <1405810447.85788.41.camel@revolution.hippie.lan>
 <20140720220514.GP45513@funkthat.com>
 <F6D53A17-FED0-4F08-BB5B-9F66C5AF5EF6@kientzle.com>
 <20140720231056.GQ45513@funkthat.com>
 <9464C309-B390-4A27-981A-E854921B1C98@bsdimp.com>
 <20140721162559.GS45513@funkthat.com>
 <467619B1-F530-49AF-91BF-14CA3A31908B@bsdimp.com>
 <20140721165619.GT45513@funkthat.com>
 <B23A4513-FB0E-4972-91BA-976DCDF346E5@bsdimp.com>
Content-Type: text/plain; charset="windows-1251"
Date: Mon, 21 Jul 2014 12:28:17 -0600
Message-ID: <1405967297.85788.78.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by ilsoft.org id
 s6LISIeP033020
Cc: arch@freebsd.org, freebsd-arm <freebsd-arm@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>;
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 21 Jul 2014 18:28:22 -0000

On Mon, 2014-07-21 at 11:16 -0600, Warner Losh wrote:
> On Jul 21, 2014, at 10:56 AM, John-Mark Gurney <jmg@funkthat.com> wrote=
:
>=20
> > Warner Losh wrote this message on Mon, Jul 21, 2014 at 10:31 -0600:
> >>=20
> >> On Jul 21, 2014, at 10:25 AM, John-Mark Gurney <jmg@funkthat.com> wr=
ote:
> >>=20
> >>> Warner Losh wrote this message on Mon, Jul 21, 2014 at 08:46 -0600:
> >>>>=20
> >>>> On Jul 20, 2014, at 5:10 PM, John-Mark Gurney <jmg@funkthat.com> w=
rote:
> >>>>=20
> >>>>> Tim Kientzle wrote this message on Sun, Jul 20, 2014 at 15:25 -07=
00:
> >>>>>>=20
> >>>>>> On Jul 20, 2014, at 3:05 PM, John-Mark Gurney <jmg@funkthat.com>=
 wrote:
> >>>>>>=20
> >>>>>>> Ian Lepore wrote this message on Sat, Jul 19, 2014 at 16:54 -06=
00:
> >>>>>>>> Sorry to take so long to reply to this, I'm trying to get caug=
ht up.  I
> >>>>>>>> see you've already committed the mge fixes.  I think the ELF a=
lignment
> >>>>>>>> fix looks good and should also be committed.
> >>>>>>>=20
> >>>>>>> So, re the elf alignment...
> >>>>>>>=20
> >>>>>>> I think we should get a set of macros that handle load/stores t=
o/from
> >>>>>>> unaligned addresses that are transparent to the caller....  I n=
eed
> >>>>>>> these for some other code I'm writing...=20
> >>>>>>>=20
> >>>>>>> I thought Open/Net had these available, but I can't seem to fin=
d them
> >>>>>>> right now...
> >>>>>>=20
> >>>>>> $ man 9 byteorder
> >>>>>>=20
> >>>>>> is most of what you want, lacking only some aliases to pick
> >>>>>> the correct macro for native byte order.
> >>>>>=20
> >>>>> Um, those doesn't help if you want native endian order?
> >>>>=20
> >>>> Ummm, yes they do. enc converts from native order. dec decodes to =
native byte
> >>>=20
> >>> No they don't.. If you want to read a value in memory that is nativ=
e
> >>> endian order to native endian order (no conversion), they cannot be
> >>> used w/o using something like below?
> >>=20
> >> Missed the native to native. bcopy works, but is ugly, as you note b=
elow.
> >>=20
> >>>> order. They are more general cases than the ntoh* functions that a=
re more traditional
> >>>> since they also work on byte streams that may not be completely al=
igned when
> >>>> sitting in memory. Which is what you are asking for.
> >>>=20
> >>> So, you're saying that I now need to write code like:
> >>> #if LITTLE_ENDIAN /* or how ever this is spelled*/
> >>> 	var =3D le32enc(foo);
> >>> #else
> >>> 	var =3D be32enc(foo);
> >>> #endif
> >>>=20
> >>> If I want to read a arch native endian value?  No thank you?
> >>=20
> >> I?m not saying that at all.
> >>=20
> >>>>> Also, only the enc/dec functions are documented to work on non-al=
igned
> >>>>> address, so that doesn't help in most cases?
> >>>>=20
> >>>> They work on all addresses. They are even documented to work on an=
y address:
> >>>>=20
> >>>>    The be16enc(), be16dec(), be32enc(), be32dec(), be64enc(), be64=
dec(),
> >>>>    le16enc(), le16dec(), le32enc(), le32dec(), le64enc(), and le64=
dec()
> >>>>    functions encode and decode integers to/from byte strings on an=
y align-
> >>>>    ment in big/little endian format.
> >>>>=20
> >>>> So they are quite useful in general. Peeking under the covers at t=
he implementation
> >>>> also shows they will work for any alignment, so I?m having trouble=
 understanding
> >>>> where this objection is really coming from.
> >>>=20
> >>> There are places where you write code such as:
> >>> 	int i;
> >>> 	memcpy(&i, inp, sizeof i);
> >>> 	/* use i */
> >>>=20
> >>> In order to avoid alignment faults...  None of the functions in byt=
eorder
> >>> do NO conversion of endian, or you have to know which endian you ar=
e but
> >>> that doesn't work on MI code...
> >>>=20
> >>> Did you read what the commited code did?
> >>=20
> >> No, I missed that bit beaded on your reply (which seemed to imply yo=
u needed
> >> endian conversion) which implied the enc/dec are only documented to =
work on non-aligned
> >> which is what I was correcting.
> >=20
> > Hmm...  It appears that byteorder(9) leaks to userland though isn't
> > documented to be available in userland...  Should we fix this and mak=
e
> > it an offical userland API?  How to document it?  In my case, the
> > enc/dec version would have been enough for what I need, but not "part=
 of
> > the userland API"...  There is an xref from byteorder(3), but no
> > comment on either that they will work..
>=20
> Yes. We should fix this now that it isn=92t confined to the kernel.
>=20
> >> But maybe the more basic question is why do you even have packed
> >> structures that are native endian that you want to access as natural=
ly
> >> aligned structures? How did they become native endian and why weren?=
t
> >> they converted to a more natural layout at that time?
> >=20
> > The original message said why=85
>=20
> I personally think the original code should unconditionally call load_p=
tr() and store_ptr()
> and if the optimization for aligned access is actually worth doing, the=
 test for it should be
> in those functions rather than inline throughout the code. The code wil=
l be clearer, and
> it would be easier to optimize those cases that actually matter.
>=20
> I=92m frankly surprised that these relocations are being generated unal=
igned. Perhaps that=92s
> the real bug here that should be fixed. While I=92m OK with the origina=
l patch (subject to the
> above), I=92d be curious what other cases there are for this functional=
ity. You had said that
> you had additional use cases in the network stack, but I=92m having tro=
uble groking the
> use cases.
>=20
> If this is a huge deal, then defining functions to do this is trivial. =
I=92m just not sure it is common
> enough to need a special macro/function call in the base.
>=20
> Warner
>=20

I was trying to figure out how unaligned relocs even happen, and I think
one possibility would be something like this...

  char thing[] =3D "x";
  struct {
    char x;
    char *something;
 } __packed foo =3D {'a', thing};

Which seems contrived enough that I wonder how it happens in real life.

-- Ian





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140722022100.S2586>