Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 13 Jan 2017 13:10:33 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        Bruce Evans <brde@optusnet.com.au>, freebsd-arch@freebsd.org
Subject:   Re: __read_only in the kernel
Message-ID:  <20170113130942.E1016@besplex.bde.org>
In-Reply-To: <20161229153138.GC29676@dft-labs.eu>
References:  <20161127212503.GA23218@dft-labs.eu> <20161130011033.GA20999@dft-labs.eu> <20161201070246.S1023@besplex.bde.org> <20161229114554.GA29676@dft-labs.eu> <20161229153138.GC29676@dft-labs.eu>

next in thread | previous in thread | raw e-mail | index | archive | help
[Re-sent to reach freebsd-arch.]

[This reply is late, so I quoted everything.]

On Thu, 29 Dec 2016, Mateusz Guzik wrote:

> On Thu, Dec 29, 2016 at 12:45:55PM +0100, Mateusz Guzik wrote:
>> On Thu, Dec 01, 2016 at 08:05:54AM +1100, Bruce Evans wrote:
>>> On Wed, 30 Nov 2016, Mateusz Guzik wrote:
>>>>> diff --git a/sys/amd64/include/param.h b/sys/amd64/include/param.h
>>>>> index a619e395..ab66e79 100644
>>>>> --- a/sys/amd64/include/param.h
>>>>> +++ b/sys/amd64/include/param.h
>>>>> @@ -152,4 +152,6 @@
>>>>> #define	INKERNEL(va) (((va) >= DMAP_MIN_ADDRESS && (va) < DMAP_MAX_ADDRESS) \
>>>>>     || ((va) >= VM_MIN_KERNEL_ADDRESS && (va) < VM_MAX_KERNEL_ADDRESS))
>>>>>
>>>>> +#define	__read_mostly	__attribute__((__section__(".data.read_mostly")))
>>>>> +
>>>>> #endif /* !_AMD64_INCLUDE_PARAM_H_ */
>>>
>>> 1. Hard-coded gccism: __attribute__().
>>> 1a.  Non-use of the FreeBSD macro __section() whose purpose is to make it
>>>    easy to avoid using __attribute__() for precisely what it is used for
>>>    here.
>>> 2. Misplaced definition.  Such definitions go in <sys/cdefs.h>.  This one
>>>    has no dependencies on amd64 except possibly for bugs elsewhere, but
>>>    is only in an amd64 header.
>>>
>> [..]
>>> According to userland tests, section statements like the above and the
>>> ones in <sys/cdefs.h> don't need any linker support to work, since they
>>> create sections as necessary.
>>>
>>> So the above definition in <sys/cdefs.h> should almost perfectly for
>>> all arches, even without linker support.  Style bug (1) is smaller if
>>> it is there.
>>>
>>
>> I wanted to avoid providing the definition for archs which don't have
>> the linker script bit and this was the only header I found which is md
>> and effectively always included.
>>
>> Indeed it seems the section is harmless even without the explicit
>> support.
>>
>>>>> diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64
>>>>> index 5d86b03..ae98447 100644
>>>>> --- a/sys/conf/ldscript.amd64
>>>>> +++ b/sys/conf/ldscript.amd64
>>>>> @@ -151,6 +151,11 @@ SECTIONS
>>>>>     KEEP (*(.gnu.linkonce.d.*personality*))
>>>>>   }
>>>>>   .data1          : { *(.data1) }
>>>>> +  .data_read_mostly :
>>>>> +  {
>>>>> +   *(.data.read_mostly)
>>>>> +   . = ALIGN(64);
>>>>> +  }
>>>>>   _edata = .; PROVIDE (edata = .);
>>>>>   __bss_start = .;
>>>>>   .bss            :
>>>
>>> For arches without this linker support, the variables would be grouped
>>> but not aligned so much.
>>>
>>> Aligning the subsection seems to be useless anyway.  This only aligns
>>> the first variable in the subsection.  Most variables are still only
>>> aligned according to their natural or specified alignment.  This is
>>> rarely as large as 64.  But I think variables in the subsection can
>>> be more aligned than the subsection.  If they had to be (as in a.out),
>>> then it is the responsibility of the linker to align the subsection
>>> to more than the default if a single variable in the subsection needs
>>> more than the default.
>>>
>>
>> With the indended use grouping side-by-side is beneficial - as the vars
>> in question are supposed to be rarely modified, there is no problem with
>> them sharing a cache line. Making them all use dedicated lines would
>> only waste memory.
>>
>> That said, what about the patch below. I also grepped the tree and found
>> 2 surprises, handled in the patch.
>>
>
> Scratch the previous patch. I extended it with __exclusive_cache_line
> (happy with ideas for a better name) - to be used for something which
> has to be alone in the cacheline, e.g. an atomic counter.
>
> diff --git a/sys/compat/linuxkpi/common/include/linux/compiler.h b/sys/compat/linuxkpi/common/include/linux/compiler.h
> index c780abc..c32f1fa 100644
> --- a/sys/compat/linuxkpi/common/include/linux/compiler.h
> +++ b/sys/compat/linuxkpi/common/include/linux/compiler.h
> @@ -67,7 +67,6 @@
> #define typeof(x)			__typeof(x)
>
> #define	uninitialized_var(x)		x = x
> -#define	__read_mostly __attribute__((__section__(".data.read_mostly")))
> #define	__always_unused			__unused
> #define	__must_check			__result_use_check
>
> diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64
> index 5d86b03..45685a4 100644
> --- a/sys/conf/ldscript.amd64
> +++ b/sys/conf/ldscript.amd64
> @@ -150,6 +150,17 @@ SECTIONS
>     *(.data .data.* .gnu.linkonce.d.*)
>     KEEP (*(.gnu.linkonce.d.*personality*))
>   }
> +  . = ALIGN(64);
> +  .data.read_mostly :
> +  {
> +    *(.data.read_mostly)
> +  }
> +  . = ALIGN(64);
> +  .data.exclusive_cache_line :
> +  {
> +    *(.data.exclusive_cache_line)
> +  }
> +  . = ALIGN(64);
>   .data1          : { *(.data1) }
>   _edata = .; PROVIDE (edata = .);
>   __bss_start = .;

I don't really understand this.  Is it still needed?  I forget if you
need separate sections to keep things together for more than the
purpose of aligning them.  I think objects aligned to the cache line
size or larger can live almost anywhere in the address space without
making much difference to cache effectiveness on most arches including
amd64.

> diff --git a/sys/dev/drm2/drm_os_freebsd.h b/sys/dev/drm2/drm_os_freebsd.h
> index dc01c6a..11c9feb 100644
> --- a/sys/dev/drm2/drm_os_freebsd.h
> +++ b/sys/dev/drm2/drm_os_freebsd.h
> @@ -80,7 +80,6 @@ typedef void			irqreturn_t;
>
> #define	__init
> #define	__exit
> -#define	__read_mostly
>
> #define	BUILD_BUG_ON(x)		CTASSERT(!(x))
> #define	BUILD_BUG_ON_NOT_POWER_OF_2(x)
> diff --git a/sys/sys/systm.h b/sys/sys/systm.h
> index a1ce9b4..719e063 100644
> --- a/sys/sys/systm.h
> +++ b/sys/sys/systm.h
> @@ -445,4 +445,8 @@ extern void (*softdep_ast_cleanup)(void);
>
> void counted_warning(unsigned *counter, const char *msg);
>
> +#define	__read_mostly		__section(".data.read_mostly")
> +#define	__exclusive_cache_line	__aligned(CACHE_LINE_SIZE) \
> +				    __section(".data.exclusive_cache_line")
> +
> #endif /* !_SYS_SYSTM_H_ */

systm.h is better for the kernel, but will we want this for userland?

systm.h is randomly ordered, and appending to it with the new definitions
inversely ordered improves its randomness.  Similarly for style.  Indentation
for multi-line #define's is more random now.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170113130942.E1016>