Date: Thu, 24 Aug 2017 13:43:47 -0600 From: Alan Somers <asomers@freebsd.org> To: Bruce Evans <brde@optusnet.com.au> Cc: "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org> Subject: Re: svn commit: r322258 - head/sys/kern Message-ID: <CAOtMX2jS2SdN9mgbu0E_9WytCMFYUeZ6RPg=EGcAFe%2BBj17=TA@mail.gmail.com> In-Reply-To: <20170809141608.I1096@besplex.bde.org> References: <201708081614.v78GEVGY066448@repo.freebsd.org> <20170809141608.I1096@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Aug 9, 2017 at 1:05 AM, Bruce Evans <brde@optusnet.com.au> wrote: > On Tue, 8 Aug 2017, Alan Somers wrote: > >> Log: >> Make p1003_1b.aio_listio_max a tunable >> >> p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the >> sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX. >> Its value will be bounded from below by the compile-time constant >> AIO_LISTIO_MAX and from above by the compile-time constant >> MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue. > > > I don't like tunables or sysctls being limited or silently limiting them. > This is not done for more important sysctls like kern.maxfilesperproc. > Lots of silent runtime limiting is done for a few more important limits > like kern.maxfilesperproc. > > The limiting is also buggy: > - vfs.aio.max_aio_queue isn't a tunable as stated. It is a r/w sysctl > - if vfs.aio.max_aio_queue were a r/o tunable, then there would be a > solvable ordering problem initializing it before using it > - since vfs.aio.max_aio_queue is only a sysctl, it is unusable at boot > time (and module load time), but it is used there. > > Other bugs in vfs.aio.max_aio_queue: > - its name is too long, yet not very descriptive > - its name repeats "aio" > - its name but doesn't say "len", so it looks like it limits a number of > queues and not the length of a single queue. This is backwards relative > to the corresponding variable name. That is missing "aio" when it > is needed, but only spells "length" badly as "count". It is > max_queue_count, but should be something like aio_max_qlen. All > sysctl variables should have an aio_ prefix which is removed in the > leaf name in the sysctl tree. > Other names have similar or worse bugs. Bugs start with parameter names > MAX_AIO_* instead of AIO_*_MAX, and variable names track this. There is > some ambiguity from global vs per-process limits and counts, and the > noise word "num" is sprinkled to get names which differ without describing > any difference. > >> Modified: head/sys/kern/vfs_aio.c >> >> ============================================================================== >> --- head/sys/kern/vfs_aio.c Tue Aug 8 16:06:16 2017 (r322257) >> +++ head/sys/kern/vfs_aio.c Tue Aug 8 16:14:31 2017 (r322258) >> @@ -102,6 +102,7 @@ static uint64_t jobseqno; >> #endif >> >> FEATURE(aio, "Asynchronous I/O"); >> +SYSCTL_DECL(_p1003_1b); >> >> static MALLOC_DEFINE(M_LIO, "lio", "listio aio control block list"); >> >> @@ -168,6 +169,11 @@ static int max_buf_aio = MAX_BUF_AIO; >> SYSCTL_INT(_vfs_aio, OID_AUTO, max_buf_aio, CTLFLAG_RW, &max_buf_aio, 0, >> "Maximum buf aio requests per process (stored in the process)"); >> >> +static int aio_listio_max = AIO_LISTIO_MAX; >> +SYSCTL_INT(_p1003_1b, CTL_P1003_1B_AIO_LISTIO_MAX, aio_listio_max, >> + CTLFLAG_RDTUN | CTLFLAG_CAPRD, &aio_listio_max, 0, >> + "Maximum aio requests for a single lio_listio call"); >> + > > > The POSIX namespace is missing all of the bugs described above. It has > aio first (except, for historical reasons it unfortunately has p10031b > instead of vfs.aio for the sysctl prefix), and max last. > >> #ifdef COMPAT_FREEBSD6 >> typedef struct oaiocb { >> int aio_fildes; /* File descriptor */ >> @@ -388,6 +394,11 @@ static int >> aio_onceonly(void) >> { >> >> + if (aio_listio_max < AIO_LISTIO_MAX) >> + aio_listio_max = AIO_LISTIO_MAX; >> + if (aio_listio_max > MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count)) >> + aio_listio_max = MIN(MAX_AIO_QUEUE_PER_PROC, >> max_queue_count); >> + > > > max_queue_count is not initialized here, except to a default at compile > time. Except for module (re)load, it always has the default value > MAX_AIO_QUEUE = 1024 > (this 1024 is bogusly ifdefed. MAX_AIO_QUEUE is not defined in any > header file and is not a supported option, and the tunable makes the > option less needed than before). > MAX_AIO_QUEUE_PER_PROC = 256, modulo a similar unsupported option. > Thus MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count) is an obfuscated spelling > of 256. > > MAX_AIO_QUEUE_PER_PROC is also just the default for a variable > (max_aio_queue_per_proc), so using it here has the same bugs except for > the obfuscation. > > Using MIN() instead of imin() is a style bug. > > OTOH, AIO_LISTIO_MAX is defined (as 16) in a header file, and is not > bogusly ifdefed. It is 1 of 3 aio limits specified by POSIX. > > The checking would be buggy if it were done later, since max_queue_count > is not checked and limiting to it can corrupt aio_listio_max (when it is > negative or just too small). > > The compile-time definition of AIO_LISTIO_MAX seems to be broken. I think > POSIX species that AIO_LISTIO_MAX shall not be defined if the value of > {AIO_LISTIO_MAX} varies at runtime, but it is still defined. FreeBSD > has this bug for many other sysconf() variables, e.g., {OPEN_MAX}. > Perhaps AIO_LISTIO_MAX is easier to fix since it is not hard-coded as > often as OPEN_MAX. What you describe is Linux's behavior, but the POSIX requirement is a bit more general. All POSIX says is that "The value returned [by sysconf(3)] shall not be more restrictive than the corresponding value described to the application when it was compiled with the implementation's <limits.h> or <unistd.h>". > > AIO_LISTIO_MAX is now just the FreeBSD default, but the first check in > the above restricts {AIO_LISTIO_MAX} to this default (16) instead of to > the POSIX limit of _POSIX_IO_LISTIO_MAX (2). Relaxing this check would > cause more problems with hard-coded AIO_LISTIO_MAX. > > If the second check in the above were done at runtime so that it worked > for changed max_queue_count, then it would reduce aio_listio_max below > the POSIX limit of 2 slightly before it reduces to garbage zero and negative > values. The first check should be large, or if you want to give a > preference > for >= the old limit of 16 and a hard limit of the POSIX limit of 2, check > keep checking the old limit first and the POSIX limit last. > >> exit_tag = EVENTHANDLER_REGISTER(process_exit, aio_proc_rundown, >> NULL, >> EVENTHANDLER_PRI_ANY); >> exec_tag = EVENTHANDLER_REGISTER(process_exec, >> aio_proc_rundown_exec, >> @@ -405,14 +416,13 @@ aio_onceonly(void) >> NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); >> aiocb_zone = uma_zcreate("AIOCB", sizeof(struct kaiocb), NULL, >> NULL, >> NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); >> - aiol_zone = uma_zcreate("AIOL", AIO_LISTIO_MAX*sizeof(intptr_t) , >> NULL, >> - NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); >> + aiol_zone = uma_zcreate("AIOL", aio_listio_max * sizeof(intptr_t) >> , >> + NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); > > > Allocations like this are hard to adjust later (I forget if the size limit > is hard or soft, but it should be hard). The user must be allowed to set > aio_listio_max to value larger than the default, so as to prepare for > expansion of the other limits later. However, this doesn't work, since > it leaves the other limits inconsistent until they are expanded. > > It seems best to adjust all the other limits based on the aio_listio_max > tunable. Usually this is not set. Use the old parameters then, but remove > all the macros and compile-time initializers for them. Do onceonly > initialization like > > if (aio_listio_max == 16) { > max_queue_count = 1024; /* historical MAX_AIO_QUEUE */ > ... > } > > When aio_listio_max != 16, calculate the parameters somehow. I don't know > how to do this. At least make them consistent. > > After initialization, the other parameters are still restricted by the > choice for aio_listio_max. Leave it to the user to not choose bad > combinations. > > The aio sysctls seem to already break many of the POSIX sysctls. E.g., > MAX_AIO_QUEUE is used as the default for the r/w sysctl > vfs.aio.max_aio_queue and as the value for the sysctl r/o sysctl > p1003_1b.aio_max. Any change to the first sysctl breaks the second > sysctl. > > The 3 POSIX (2001) aio sysctls thus have quite different invariance > bugs: > - {AIO_MAX}: AIO_MAX is correctly not defined. {AIO_MAX} is variable > by a low-level sysctl but the p1003 sysctl used to report {AIO_MAX} > doesn't see the change. > - {AIO_PRIO_DELTA_MAX}: no low-level sysctl or variable to break it > (value is always 0) > - {AIO_LISTIO_MAX}: was constant until this commit. There is not > low-level sysctl for it, and the low-level variable for it is > correctly exported, but AIO_LISTIO_MAX was incorrectly defined > so it is hard to change now. {AIO_LISTIO_MAX} is inconsistent > with the macro whenever the tunable is used to change its default. > > Bruce I dug deeper and found that there wasn't any good reason for the aio_listio_max limit to exist in the first place. This DR eliminates it, which I think will satisfy most of your concerns. https://reviews.freebsd.org/D12120 -Alan
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAOtMX2jS2SdN9mgbu0E_9WytCMFYUeZ6RPg=EGcAFe%2BBj17=TA>