Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 9 Aug 2017 17:05:09 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Alan Somers <asomers@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r322258 - head/sys/kern
Message-ID:  <20170809141608.I1096@besplex.bde.org>
In-Reply-To: <201708081614.v78GEVGY066448@repo.freebsd.org>
References:  <201708081614.v78GEVGY066448@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 8 Aug 2017, Alan Somers wrote:

> Log:
>  Make p1003_1b.aio_listio_max a tunable
>
>  p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the
>  sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX.
>  Its value will be bounded from below by the compile-time constant
>  AIO_LISTIO_MAX and from above by the compile-time constant
>  MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue.

I don't like tunables or sysctls being limited or silently limiting them.
This is not done for more important sysctls like kern.maxfilesperproc.
Lots of silent runtime limiting is done for a few more important limits
like kern.maxfilesperproc.

The limiting is also buggy:
- vfs.aio.max_aio_queue isn't a tunable as stated.  It is a r/w sysctl
- if vfs.aio.max_aio_queue were a r/o tunable, then there would be a
   solvable ordering problem initializing it before using it
- since vfs.aio.max_aio_queue is only a sysctl, it is unusable at boot
   time (and module load time), but it is used there.

Other bugs in vfs.aio.max_aio_queue:
- its name is too long, yet not very descriptive
- its name repeats "aio"
- its name but doesn't say "len", so it looks like it limits a number of
   queues and not the length of a single queue.  This is backwards relative
   to the corresponding variable name.  That is missing "aio" when it
   is needed, but only spells "length" badly as "count".  It is
   max_queue_count, but should be something like aio_max_qlen.  All
   sysctl variables should have an aio_ prefix which is removed in the
   leaf name in the sysctl tree.
Other names have similar or worse bugs.  Bugs start with parameter names
MAX_AIO_* instead of AIO_*_MAX, and variable names track this.  There is
some ambiguity from global vs per-process limits and counts, and the
noise word "num" is sprinkled to get names which differ without describing
any difference.

> Modified: head/sys/kern/vfs_aio.c
> ==============================================================================
> --- head/sys/kern/vfs_aio.c	Tue Aug  8 16:06:16 2017	(r322257)
> +++ head/sys/kern/vfs_aio.c	Tue Aug  8 16:14:31 2017	(r322258)
> @@ -102,6 +102,7 @@ static uint64_t jobseqno;
> #endif
>
> FEATURE(aio, "Asynchronous I/O");
> +SYSCTL_DECL(_p1003_1b);
>
> static MALLOC_DEFINE(M_LIO, "lio", "listio aio control block list");
>
> @@ -168,6 +169,11 @@ static int max_buf_aio = MAX_BUF_AIO;
> SYSCTL_INT(_vfs_aio, OID_AUTO, max_buf_aio, CTLFLAG_RW, &max_buf_aio, 0,
>     "Maximum buf aio requests per process (stored in the process)");
>
> +static int aio_listio_max = AIO_LISTIO_MAX;
> +SYSCTL_INT(_p1003_1b, CTL_P1003_1B_AIO_LISTIO_MAX, aio_listio_max,
> +    CTLFLAG_RDTUN | CTLFLAG_CAPRD, &aio_listio_max, 0,
> +    "Maximum aio requests for a single lio_listio call");
> +

The POSIX namespace is missing all of the bugs described above.  It has
aio first (except, for historical reasons it unfortunately has p10031b
instead of vfs.aio for the sysctl prefix), and max last.

> #ifdef COMPAT_FREEBSD6
> typedef struct oaiocb {
> 	int	aio_fildes;		/* File descriptor */
> @@ -388,6 +394,11 @@ static int
> aio_onceonly(void)
> {
>
> +	if (aio_listio_max < AIO_LISTIO_MAX)
> +		aio_listio_max = AIO_LISTIO_MAX;
> +	if (aio_listio_max > MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count))
> +		aio_listio_max = MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count);
> +

max_queue_count is not initialized here, except to a default at compile
time.  Except for module (re)load, it always has the default value
MAX_AIO_QUEUE = 1024
   (this 1024 is bogusly ifdefed.  MAX_AIO_QUEUE is not defined in any
   header file and is not a supported option, and the tunable makes the
   option less needed than before).
MAX_AIO_QUEUE_PER_PROC = 256, modulo a similar unsupported option.
Thus MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count) is an obfuscated spelling
of 256.

MAX_AIO_QUEUE_PER_PROC is also just the default for a variable
(max_aio_queue_per_proc), so using it here has the same bugs except for
the obfuscation.

Using MIN() instead of imin() is a style bug.

OTOH, AIO_LISTIO_MAX is defined (as 16) in a header file, and is not
bogusly ifdefed.  It is 1 of 3 aio limits specified by POSIX.

The checking would be buggy if it were done later, since max_queue_count
is not checked and limiting to it can corrupt aio_listio_max (when it is
negative or just too small).

The compile-time definition of AIO_LISTIO_MAX seems to be broken.  I think
POSIX species that AIO_LISTIO_MAX shall not be defined if the value of
{AIO_LISTIO_MAX} varies at runtime, but it is still defined.  FreeBSD
has this bug for many other sysconf() variables, e.g., {OPEN_MAX}.
Perhaps AIO_LISTIO_MAX is easier to fix since it is not hard-coded as
often as OPEN_MAX.

AIO_LISTIO_MAX is now just the FreeBSD default, but the first check in
the above restricts {AIO_LISTIO_MAX} to this default (16) instead of to
the POSIX limit of _POSIX_IO_LISTIO_MAX (2).  Relaxing this check would
cause more problems with hard-coded AIO_LISTIO_MAX.

If the second check in the above were done at runtime so that it worked
for changed max_queue_count, then it would reduce aio_listio_max below
the POSIX limit of 2 slightly before it reduces to garbage zero and negative
values.  The first check should be large, or if you want to give a preference
for >= the old limit of 16 and a hard limit of the POSIX limit of 2, check
keep checking the old limit first and the POSIX limit last.

> 	exit_tag = EVENTHANDLER_REGISTER(process_exit, aio_proc_rundown, NULL,
> 	    EVENTHANDLER_PRI_ANY);
> 	exec_tag = EVENTHANDLER_REGISTER(process_exec, aio_proc_rundown_exec,
> @@ -405,14 +416,13 @@ aio_onceonly(void)
> 	    NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
> 	aiocb_zone = uma_zcreate("AIOCB", sizeof(struct kaiocb), NULL, NULL,
> 	    NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
> -	aiol_zone = uma_zcreate("AIOL", AIO_LISTIO_MAX*sizeof(intptr_t) , NULL,
> -	    NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
> +	aiol_zone = uma_zcreate("AIOL", aio_listio_max * sizeof(intptr_t) ,
> +	    NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);

Allocations like this are hard to adjust later (I forget if the size limit
is hard or soft, but it should be hard).  The user must be allowed to set
aio_listio_max to value larger than the default, so as to prepare for
expansion of the other limits later.  However, this doesn't work, since
it leaves the other limits inconsistent until they are expanded.

It seems best to adjust all the other limits based on the aio_listio_max
tunable.  Usually this is not set.  Use the old parameters then, but remove
all the macros and compile-time initializers for them.  Do onceonly
initialization like

 	if (aio_listio_max == 16) {
 		max_queue_count = 1024;	/* historical MAX_AIO_QUEUE */
 		...
 	}

When aio_listio_max != 16, calculate the parameters somehow.  I don't know
how to do this.  At least make them consistent.

After initialization, the other parameters are still restricted by the
choice for aio_listio_max.  Leave it to the user to not choose bad
combinations.

The aio sysctls seem to already break many of the POSIX sysctls.  E.g.,
MAX_AIO_QUEUE is used as the default for the r/w sysctl
vfs.aio.max_aio_queue and as the value for the sysctl r/o sysctl
p1003_1b.aio_max.  Any change to the first sysctl breaks the second
sysctl.

The 3 POSIX (2001) aio sysctls thus have quite different invariance
bugs:
- {AIO_MAX}: AIO_MAX is correctly not defined.  {AIO_MAX} is variable
   by a low-level sysctl but the p1003 sysctl used to report {AIO_MAX}
   doesn't see the change.
- {AIO_PRIO_DELTA_MAX}: no low-level sysctl or variable to break it
   (value is always 0)
- {AIO_LISTIO_MAX}: was constant until this commit.  There is not
   low-level sysctl for it, and the low-level variable for it is
   correctly exported, but AIO_LISTIO_MAX was incorrectly defined
   so it is hard to change now.  {AIO_LISTIO_MAX} is inconsistent
   with the macro whenever the tunable is used to change its default.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170809141608.I1096>