Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 21 Jan 2012 22:16:20 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Andreas Tobler <andreast@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r230390 - head/sys/conf
Message-ID:  <20120121194019.J2720@besplex.bde.org>
In-Reply-To: <4F1A6032.2040900@FreeBSD.org>
References:  <201201201849.q0KInmic054086@svn.freebsd.org> <20120121131914.W1596@besplex.bde.org> <4F1A6032.2040900@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 21 Jan 2012, Andreas Tobler wrote:

> On 21.01.12 03:52, Bruce Evans wrote:
>> On Fri, 20 Jan 2012, Andreas Tobler wrote:
>> 
>>> Log:
>>>   Disable GUPROF on archs other than i386/amd64 since the fine details are 
>>> not
>>>   implemented.
>> 
>> This was intentionally not done.  Just don't use config -pp on arches that
>> don't suppport it.  "profile 2" is already left out of NOTES for all
>> arches except amd64, i386 and powerpc.  But the configuration of "profile"
>> in the NOTES for these arches is broken anyway.  It doesn't exist.  Thus
>> even normal profiling is not tested by NOTES on these arches.
>
> I sent this patch to -CURRENT:
>
> http://lists.freebsd.org/pipermail/freebsd-current/2012-January/031095.html
>
> ...and got no feedback.
>
> Is there a better place to send such patches for review?

Probably not, but I don't read -current.

> I got positive feedback from marius regarding the sparc64 case.
>
>>> Modified: head/sys/conf/kern.pre.mk
>>> ==============================================================================
>>> --- head/sys/conf/kern.pre.mk	Fri Jan 20 17:25:15 2012 
>>> (r230389)
>>> +++ head/sys/conf/kern.pre.mk	Fri Jan 20 18:49:47 2012 
>>> (r230390)
>>> @@ -103,11 +103,14 @@ ASM_CFLAGS= -x assembler-with-cpp -DLOCO
>>> 
>>> .if defined(PROFLEVEL)&&  ${PROFLEVEL}>= 1
>>> CFLAGS+=	-DGPROF -falign-functions=16
>>> +PROF=	-pg
>>> .if ${PROFLEVEL}>= 2
>>> CFLAGS+=	-DGPROF4 -DGUPROF
>>> -PROF=	-pg -mprofiler-epilogue
>>> +.if ${MACHINE_CPUARCH} == "i386" || ${MACHINE_CPUARCH} == "amd64"
>> 
>> Style bug: unsorted tests.
>
> Copied from here:
>
> sys/conf/kern.post.mk: line 174

There are lots of bad examples to copy from.  I sometimes complain about
them to try to stop them spreading :-).

>
>> 
>>> +PROF+=	-mprofiler-epilogue

> Do you want me to revert?

Yes.  There is no point in having several layers of ifdefs for this.

I have already reduced the ifdef tangle for this once before.  In 2006
I removed the ifdefs for icc.  These reduced to a .error failure
followed by about 8 unreachable lines.  I changed this to just the
.error failure.  Someone later removed the icc support.  But none of
this was messy enough to be correct even on i386.  i386 NOTES puts
sets PROFLEVEL to 2 unconditionally, so icc can never have possibly
worked with NOTES.

Bute you might want to look at the porability of the ordinary profiling
case.  Here it is again, copied from above.  Beware that something
mangled the whitespace (space characters but not tabs):

% .if defined(PROFLEVEL)&&  ${PROFLEVEL}>= 1
% CFLAGS+=	-DGPROF -falign-functions=16
% +PROF=	-pg
% .if ${PROFLEVEL}>= 2
% ...

-DGPROF sets a kernel option so is OK.  -pg is unportable but the same
for all supported arches (?) so OK.  -falign-functions=LARGE is used
to reduce the size of profiling buffs.  Function alignment used to
default to 4 on i386 but is now often 16 or more.  Probably it is
larger on other arches.  It should match the definition of
FUNCTION_ALIGNMENT in <machine.h>, and in fact matches for all arches
except powerpc* and sparc64:

% amd64/include/profile.h:#define	FUNCTION_ALIGNMENT	16
% amd64/include/profile.h:#define	FUNCTION_ALIGNMENT	4
% arm/include/profile.h:#define	FUNCTION_ALIGNMENT	16

The tab after #define is corrupt only for arm.

% i386/include/profile.h:#define	FUNCTION_ALIGNMENT	16
% i386/include/profile.h:#define	FUNCTION_ALIGNMENT	4

On amd64 and i386, 4 is for userland and 16 is for the kernel.  Userland
should use the same as the kernel to save space in the same way, but
I never got around to fixing it.  (Same for the histcounter size.  It
should be 32 or 64 bits, since 16 bits overflows after as little as
64 seconds with profhz = 1024, or after a fraction of a second with
an adequately large profhz.  GUPROF uses 64-bit counters since even
32 bits overflowed ~10 years ago when the profiling pseudo-frequency
of the CPU clock frequency reached ~1GHz).

% ia64/include/profile.h:#define	FUNCTION_ALIGNMENT	16

arm and ia64 use 16 for userland too.  This seems too large for arm.
arm's ALIGN_TEXT gives no aligment at all (.align 0.  Isn't that
impossible?  It may be a misspelling of .p2align 0).  This seems too
small for ia64.  ia64's ALIGN_TEXT gives 32-byte alignment.

% mips/include/profile.h:#define	FUNCTION_ALIGNMENT	16
% mips/include/profile.h:#define	FUNCTION_ALIGNMENT	4

Mips copies i386 for the different kernel/user alignments.  Its asm.h
doesn't define ALIGN_TEXT, and only has one alignment statement -- an
apparently nonsensical ".align 3" one.  Maybe .align still means
.p2align on arm and mips, but this .align is weirdly placed (after
a data allocation for a string instead of before).  In asm code,
mips uses just 3 alignment statements: 1 .align 4 for data, 1 .align 4
for a label, and 1 .align 5 for another label.  Apparently .align does
still mean .p2align on mips.

% powerpc/include/profile.h:#define	FUNCTION_ALIGNMENT	4

powerpc* mostly uses .align 2.  This agrees with FUNCTION_ALIGNMENT,
assuming that .align actually means .p2align.  There are the following
warts and strangenesses:
- libc/powerpc*/SYS.h doesn't use ALIGN_TEXT from asm.h, but hard-codes
   .align 2 in 4 places in each.  No other arch uses hard-coded alignment
   in SYS.h.  Maybe you already fixed this (I checked the current version
   but not your latest patch).
- sys/powerpc/include/asm.h doesn't define ALIGN_TEXT.  It seems to use
   .align 4 for the function alignment in _ENTRY().  This doesn't match
   SYS.h, and assuming that .align actually means .p2align, it doesn't
   match FUNCTION_ALIGNMENT.  But it matches kern.pre.mk.  In asm code,
   powerpc uses many more hard-coded alignments than mips.  These are
   mostly .align 4, but INTERRUPT() uses .align 5 (i386 uses
   SUPERALIGN_TEXT for interrupts; this has always been 32).  Most
   of the hard-coded aligments are for data.

% sparc64/include/profile.h:#define	FUNCTION_ALIGNMENT	32

This agrees with ALIGN_TEXT but not with kern.pre.mk.  I think the
only consequence of making this too small in kern.pre.mk is wasting
space.

The alignments are too hard-coded even in the ALIGN_* macros.  Compilers
vary the alignment for C code a lot depending on -march -and -Os.  For
example, the default on i386 is now 16-byte alignment, and all march's
that I tried except i386 don't change change this.  -march=i386 changes
to 4-byte alignment.  -Os changes this to 1-byte alignment.  But
ALIGN_TEXT on i386 still always gives 4-byte alignment, since it was
tuned for original i386's and never changed.  Compilers now vary the 
alignment even more for branch targets, but asm code mostly doesn't
try to align labels to any better than what ALIGN_TEXT gives.  It
also mostly doesn't bother with the not-so-new syntax that allows
padding to a large alignment if and only if the padding isn't very
large or slow to execute.

No one really notices the time and space losses and gains from getting
this wrong or accidentally right, since the differences are even tinier
than from using -arch.  But it is inelegant to have macros ALIGN_* for
changing this easily and then not use them to change anything for
almost 20 years on i386.

The unportability of .align is (mis)documented in as.info, as a list
of systems on which .align gives the alignment in bytes.  The systems
on which .align gives the log2 of the aligmment in bytes is mostly
only documented as "other".  Since amd64 is too new to be ever mentioned
in as.info, it is claimed to have the log2 method, but it is actually
too new to have ever have had that.  The .p2align directive should
probably used for clarity, although it is only portable within gas.
gas also provides .balign.

i386 was supposed to have been converted to use .p2align because of
the possible confusion from this (IIRC, .align for it used to mean
.p2align.  as.info says that .align for i386 means .p2align for aout
but actually means .align for elf).  Most places in i386 were converted,
but some weren't (with the help of too man hard-coded .align's), leaving
them quite broken:

- acpica/acpi_wakecode.S: uses .align 4.  It's not clear whether it
   wants 4 or 16.  This is 16-bit code (with the .align misplaced outside
   of the .code16 section so I would expect it to have no effect on that
   section).  4-byte alignment is more than adequate for 16-bit code.
   Most hard-coded .align's break FUNCTION_ALIGNMENT for the profiling
   case by not using the switch to that alignment given by using the
   macros, but here this doesn't matter since the profiling code doesn't
   support 16-bit code.

- acpica/acpi_wakeup.c: uses .align 4.  This one is garbage and has no
   effect.  It is inside an inline asm for a whole function, which does
   the alignment correctly for the function using .p2align 2.  Then the
   .align 4 misplaced after the function label has no effect, since th
   alignment is already 4.  Profiling of this function is null (broken).
   The C macro ALIGN_TEXT is hard to use in inline functions (requires
   messy stringization and string concatenation.)  The inline asm is
   fairly horrible and has 2 functions in one asm statement.  The
   horriblness for the .align 4 is only in the first of these.

- bios/smapi_bios.S.  This starts with a .align directive with no args.
   I don't know what that means.  Profiling is null (broken) in this
   file.  This file abuses the user header <machine/asm.h>.  The kernel
   header for asm files is <machine/asmacros.h>.  (asmacros.h was
   originally a mistake, but I found it useful for keeping the kernel
   definitions separate).  Using the correct header should fix profiling,
   but profiling near the BIOS or 16-bit code might be fragile.

- i386/exception.s: kdtrace breakage: hard-coded .align 4, unlike in all
   old code in the file, but only for data.

- include/asmacros.h: xen breakage: the disgusting ELFNOTE() macro is
   only used by xen, but is defined unconditionally.  Other style bugs
   in it include:
   - duplication to support K&R (it even says that it is for the
     -traditional case, which gcc stopped supporting 10 years or so ago)
   - formatting totally unlike the rest of the file, with no indentation
     except 2 spaces for some directives, and for comments, and for the
     semicolons and backslashes which least need it
   - hard-coded alignment (only for data)
   - uses .align instead of .p2align for alignment.

So i386 was mostly converted to use .p2align.  In FreeBSD-4, the
only .align's in it were the 2 in acpica and many in biosboot, and
biosboot needed them since it was aout.

amd64 is cleaner because it somehow doesn't have any .align's in acpica/,
and it doesn't have bios/ or xen.  It only has the kdtrace breakage.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120121194019.J2720>