Date: Wed, 9 Aug 2017 17:05:09 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Alan Somers <asomers@freebsd.org> Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r322258 - head/sys/kern Message-ID: <20170809141608.I1096@besplex.bde.org> In-Reply-To: <201708081614.v78GEVGY066448@repo.freebsd.org> References: <201708081614.v78GEVGY066448@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 8 Aug 2017, Alan Somers wrote: > Log: > Make p1003_1b.aio_listio_max a tunable > > p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the > sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX. > Its value will be bounded from below by the compile-time constant > AIO_LISTIO_MAX and from above by the compile-time constant > MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue. I don't like tunables or sysctls being limited or silently limiting them. This is not done for more important sysctls like kern.maxfilesperproc. Lots of silent runtime limiting is done for a few more important limits like kern.maxfilesperproc. The limiting is also buggy: - vfs.aio.max_aio_queue isn't a tunable as stated. It is a r/w sysctl - if vfs.aio.max_aio_queue were a r/o tunable, then there would be a solvable ordering problem initializing it before using it - since vfs.aio.max_aio_queue is only a sysctl, it is unusable at boot time (and module load time), but it is used there. Other bugs in vfs.aio.max_aio_queue: - its name is too long, yet not very descriptive - its name repeats "aio" - its name but doesn't say "len", so it looks like it limits a number of queues and not the length of a single queue. This is backwards relative to the corresponding variable name. That is missing "aio" when it is needed, but only spells "length" badly as "count". It is max_queue_count, but should be something like aio_max_qlen. All sysctl variables should have an aio_ prefix which is removed in the leaf name in the sysctl tree. Other names have similar or worse bugs. Bugs start with parameter names MAX_AIO_* instead of AIO_*_MAX, and variable names track this. There is some ambiguity from global vs per-process limits and counts, and the noise word "num" is sprinkled to get names which differ without describing any difference. > Modified: head/sys/kern/vfs_aio.c > ============================================================================== > --- head/sys/kern/vfs_aio.c Tue Aug 8 16:06:16 2017 (r322257) > +++ head/sys/kern/vfs_aio.c Tue Aug 8 16:14:31 2017 (r322258) > @@ -102,6 +102,7 @@ static uint64_t jobseqno; > #endif > > FEATURE(aio, "Asynchronous I/O"); > +SYSCTL_DECL(_p1003_1b); > > static MALLOC_DEFINE(M_LIO, "lio", "listio aio control block list"); > > @@ -168,6 +169,11 @@ static int max_buf_aio = MAX_BUF_AIO; > SYSCTL_INT(_vfs_aio, OID_AUTO, max_buf_aio, CTLFLAG_RW, &max_buf_aio, 0, > "Maximum buf aio requests per process (stored in the process)"); > > +static int aio_listio_max = AIO_LISTIO_MAX; > +SYSCTL_INT(_p1003_1b, CTL_P1003_1B_AIO_LISTIO_MAX, aio_listio_max, > + CTLFLAG_RDTUN | CTLFLAG_CAPRD, &aio_listio_max, 0, > + "Maximum aio requests for a single lio_listio call"); > + The POSIX namespace is missing all of the bugs described above. It has aio first (except, for historical reasons it unfortunately has p10031b instead of vfs.aio for the sysctl prefix), and max last. > #ifdef COMPAT_FREEBSD6 > typedef struct oaiocb { > int aio_fildes; /* File descriptor */ > @@ -388,6 +394,11 @@ static int > aio_onceonly(void) > { > > + if (aio_listio_max < AIO_LISTIO_MAX) > + aio_listio_max = AIO_LISTIO_MAX; > + if (aio_listio_max > MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count)) > + aio_listio_max = MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count); > + max_queue_count is not initialized here, except to a default at compile time. Except for module (re)load, it always has the default value MAX_AIO_QUEUE = 1024 (this 1024 is bogusly ifdefed. MAX_AIO_QUEUE is not defined in any header file and is not a supported option, and the tunable makes the option less needed than before). MAX_AIO_QUEUE_PER_PROC = 256, modulo a similar unsupported option. Thus MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count) is an obfuscated spelling of 256. MAX_AIO_QUEUE_PER_PROC is also just the default for a variable (max_aio_queue_per_proc), so using it here has the same bugs except for the obfuscation. Using MIN() instead of imin() is a style bug. OTOH, AIO_LISTIO_MAX is defined (as 16) in a header file, and is not bogusly ifdefed. It is 1 of 3 aio limits specified by POSIX. The checking would be buggy if it were done later, since max_queue_count is not checked and limiting to it can corrupt aio_listio_max (when it is negative or just too small). The compile-time definition of AIO_LISTIO_MAX seems to be broken. I think POSIX species that AIO_LISTIO_MAX shall not be defined if the value of {AIO_LISTIO_MAX} varies at runtime, but it is still defined. FreeBSD has this bug for many other sysconf() variables, e.g., {OPEN_MAX}. Perhaps AIO_LISTIO_MAX is easier to fix since it is not hard-coded as often as OPEN_MAX. AIO_LISTIO_MAX is now just the FreeBSD default, but the first check in the above restricts {AIO_LISTIO_MAX} to this default (16) instead of to the POSIX limit of _POSIX_IO_LISTIO_MAX (2). Relaxing this check would cause more problems with hard-coded AIO_LISTIO_MAX. If the second check in the above were done at runtime so that it worked for changed max_queue_count, then it would reduce aio_listio_max below the POSIX limit of 2 slightly before it reduces to garbage zero and negative values. The first check should be large, or if you want to give a preference for >= the old limit of 16 and a hard limit of the POSIX limit of 2, check keep checking the old limit first and the POSIX limit last. > exit_tag = EVENTHANDLER_REGISTER(process_exit, aio_proc_rundown, NULL, > EVENTHANDLER_PRI_ANY); > exec_tag = EVENTHANDLER_REGISTER(process_exec, aio_proc_rundown_exec, > @@ -405,14 +416,13 @@ aio_onceonly(void) > NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); > aiocb_zone = uma_zcreate("AIOCB", sizeof(struct kaiocb), NULL, NULL, > NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); > - aiol_zone = uma_zcreate("AIOL", AIO_LISTIO_MAX*sizeof(intptr_t) , NULL, > - NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); > + aiol_zone = uma_zcreate("AIOL", aio_listio_max * sizeof(intptr_t) , > + NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); Allocations like this are hard to adjust later (I forget if the size limit is hard or soft, but it should be hard). The user must be allowed to set aio_listio_max to value larger than the default, so as to prepare for expansion of the other limits later. However, this doesn't work, since it leaves the other limits inconsistent until they are expanded. It seems best to adjust all the other limits based on the aio_listio_max tunable. Usually this is not set. Use the old parameters then, but remove all the macros and compile-time initializers for them. Do onceonly initialization like if (aio_listio_max == 16) { max_queue_count = 1024; /* historical MAX_AIO_QUEUE */ ... } When aio_listio_max != 16, calculate the parameters somehow. I don't know how to do this. At least make them consistent. After initialization, the other parameters are still restricted by the choice for aio_listio_max. Leave it to the user to not choose bad combinations. The aio sysctls seem to already break many of the POSIX sysctls. E.g., MAX_AIO_QUEUE is used as the default for the r/w sysctl vfs.aio.max_aio_queue and as the value for the sysctl r/o sysctl p1003_1b.aio_max. Any change to the first sysctl breaks the second sysctl. The 3 POSIX (2001) aio sysctls thus have quite different invariance bugs: - {AIO_MAX}: AIO_MAX is correctly not defined. {AIO_MAX} is variable by a low-level sysctl but the p1003 sysctl used to report {AIO_MAX} doesn't see the change. - {AIO_PRIO_DELTA_MAX}: no low-level sysctl or variable to break it (value is always 0) - {AIO_LISTIO_MAX}: was constant until this commit. There is not low-level sysctl for it, and the low-level variable for it is correctly exported, but AIO_LISTIO_MAX was incorrectly defined so it is hard to change now. {AIO_LISTIO_MAX} is inconsistent with the macro whenever the tunable is used to change its default. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170809141608.I1096>