From owner-svn-src-all@freebsd.org Thu Aug 24 19:43:50 2017 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 91282DE6DD1; Thu, 24 Aug 2017 19:43:50 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-wm0-x22c.google.com (mail-wm0-x22c.google.com [IPv6:2a00:1450:400c:c09::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2795963B12; Thu, 24 Aug 2017 19:43:50 +0000 (UTC) (envelope-from asomers@gmail.com) Received: by mail-wm0-x22c.google.com with SMTP id b189so3149022wmd.0; Thu, 24 Aug 2017 12:43:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=y/0fwu3Xl4bqXMxVANB+aKACcKCWJD6hBBkf5WnueSE=; b=vaVlETX9ke4S5s8RBgwrKx8AS2H1K18wziDozIGszTGIQoGNvECKz3yxYLXdlSmNZv Zp8MYBJ4ZVWuK5Kp+h25raz4ZLx5nG627YQ8fSSCgzRXZAveGStDBxAUKAEW1simKcd5 BPfuwXgmO74hAgoXu1rZpMcnsYW2SDdPZK01fz1mwxWm3lemfXa1Toc4IDYs/oD9xIk4 Oa2PIcKYm7XHM1RAxLz4cUlsir+wyi0exHQA0c0d5x6UybPkb3Q3wCqXAvrmg+sa8Ezk xt5CYsRTCCRLEV9cVxQDjKNKihURXCt1Li209j6k8kNVnEX73X8hX8EDnwsjUPthB2oT IJrQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=y/0fwu3Xl4bqXMxVANB+aKACcKCWJD6hBBkf5WnueSE=; b=b9qGNf1GQG82aSuvxHewvbpqpGEY1vjKPfrfQyoyAj/exFmVab8ZsDcjx+8p1ybFnz aEJXVN6muh49SqbOPo5kcWfoAzv0dwPio1nHZrvBfi8rxL5UlJ+iJZjb2KGzrLq8xZzP yKxTbxzsAKrFQCibIRAYekdClfDsr2CJf2WRJL1xUbdgik/h0GEyifSZQgOCCCj7KlLA SjnNUuI+Ryru5t10eJUkMa1W1YgDDnpwJoicYPp4s+fb6FI3N5bIpi1YjqNlPeD9+KRM SzAtOyCkhIPy98dSLwnrbdDILyWZL03L0CAKC1/KaPgdqeBmD3sSWlucwLe3bI6swnEB C0Jg== X-Gm-Message-State: AHYfb5hibtHz0I92mbTAVBWofUJFLcJ7RBrHJdJI6OkQWL0HhE6H+Xli lNK7B4raXWtYh4UQLBuZAp2PVEygiF9+ X-Received: by 10.28.98.66 with SMTP id w63mr4343968wmb.34.1503603828264; Thu, 24 Aug 2017 12:43:48 -0700 (PDT) MIME-Version: 1.0 Sender: asomers@gmail.com Received: by 10.28.56.194 with HTTP; Thu, 24 Aug 2017 12:43:47 -0700 (PDT) In-Reply-To: <20170809141608.I1096@besplex.bde.org> References: <201708081614.v78GEVGY066448@repo.freebsd.org> <20170809141608.I1096@besplex.bde.org> From: Alan Somers Date: Thu, 24 Aug 2017 13:43:47 -0600 X-Google-Sender-Auth: g3YuGXWyus_al5iZstj16XhjgrA Message-ID: Subject: Re: svn commit: r322258 - head/sys/kern To: Bruce Evans Cc: "src-committers@freebsd.org" , "svn-src-all@freebsd.org" , "svn-src-head@freebsd.org" Content-Type: text/plain; charset="UTF-8" X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Aug 2017 19:43:50 -0000 On Wed, Aug 9, 2017 at 1:05 AM, Bruce Evans wrote: > On Tue, 8 Aug 2017, Alan Somers wrote: > >> Log: >> Make p1003_1b.aio_listio_max a tunable >> >> p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the >> sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX. >> Its value will be bounded from below by the compile-time constant >> AIO_LISTIO_MAX and from above by the compile-time constant >> MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue. > > > I don't like tunables or sysctls being limited or silently limiting them. > This is not done for more important sysctls like kern.maxfilesperproc. > Lots of silent runtime limiting is done for a few more important limits > like kern.maxfilesperproc. > > The limiting is also buggy: > - vfs.aio.max_aio_queue isn't a tunable as stated. It is a r/w sysctl > - if vfs.aio.max_aio_queue were a r/o tunable, then there would be a > solvable ordering problem initializing it before using it > - since vfs.aio.max_aio_queue is only a sysctl, it is unusable at boot > time (and module load time), but it is used there. > > Other bugs in vfs.aio.max_aio_queue: > - its name is too long, yet not very descriptive > - its name repeats "aio" > - its name but doesn't say "len", so it looks like it limits a number of > queues and not the length of a single queue. This is backwards relative > to the corresponding variable name. That is missing "aio" when it > is needed, but only spells "length" badly as "count". It is > max_queue_count, but should be something like aio_max_qlen. All > sysctl variables should have an aio_ prefix which is removed in the > leaf name in the sysctl tree. > Other names have similar or worse bugs. Bugs start with parameter names > MAX_AIO_* instead of AIO_*_MAX, and variable names track this. There is > some ambiguity from global vs per-process limits and counts, and the > noise word "num" is sprinkled to get names which differ without describing > any difference. > >> Modified: head/sys/kern/vfs_aio.c >> >> ============================================================================== >> --- head/sys/kern/vfs_aio.c Tue Aug 8 16:06:16 2017 (r322257) >> +++ head/sys/kern/vfs_aio.c Tue Aug 8 16:14:31 2017 (r322258) >> @@ -102,6 +102,7 @@ static uint64_t jobseqno; >> #endif >> >> FEATURE(aio, "Asynchronous I/O"); >> +SYSCTL_DECL(_p1003_1b); >> >> static MALLOC_DEFINE(M_LIO, "lio", "listio aio control block list"); >> >> @@ -168,6 +169,11 @@ static int max_buf_aio = MAX_BUF_AIO; >> SYSCTL_INT(_vfs_aio, OID_AUTO, max_buf_aio, CTLFLAG_RW, &max_buf_aio, 0, >> "Maximum buf aio requests per process (stored in the process)"); >> >> +static int aio_listio_max = AIO_LISTIO_MAX; >> +SYSCTL_INT(_p1003_1b, CTL_P1003_1B_AIO_LISTIO_MAX, aio_listio_max, >> + CTLFLAG_RDTUN | CTLFLAG_CAPRD, &aio_listio_max, 0, >> + "Maximum aio requests for a single lio_listio call"); >> + > > > The POSIX namespace is missing all of the bugs described above. It has > aio first (except, for historical reasons it unfortunately has p10031b > instead of vfs.aio for the sysctl prefix), and max last. > >> #ifdef COMPAT_FREEBSD6 >> typedef struct oaiocb { >> int aio_fildes; /* File descriptor */ >> @@ -388,6 +394,11 @@ static int >> aio_onceonly(void) >> { >> >> + if (aio_listio_max < AIO_LISTIO_MAX) >> + aio_listio_max = AIO_LISTIO_MAX; >> + if (aio_listio_max > MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count)) >> + aio_listio_max = MIN(MAX_AIO_QUEUE_PER_PROC, >> max_queue_count); >> + > > > max_queue_count is not initialized here, except to a default at compile > time. Except for module (re)load, it always has the default value > MAX_AIO_QUEUE = 1024 > (this 1024 is bogusly ifdefed. MAX_AIO_QUEUE is not defined in any > header file and is not a supported option, and the tunable makes the > option less needed than before). > MAX_AIO_QUEUE_PER_PROC = 256, modulo a similar unsupported option. > Thus MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count) is an obfuscated spelling > of 256. > > MAX_AIO_QUEUE_PER_PROC is also just the default for a variable > (max_aio_queue_per_proc), so using it here has the same bugs except for > the obfuscation. > > Using MIN() instead of imin() is a style bug. > > OTOH, AIO_LISTIO_MAX is defined (as 16) in a header file, and is not > bogusly ifdefed. It is 1 of 3 aio limits specified by POSIX. > > The checking would be buggy if it were done later, since max_queue_count > is not checked and limiting to it can corrupt aio_listio_max (when it is > negative or just too small). > > The compile-time definition of AIO_LISTIO_MAX seems to be broken. I think > POSIX species that AIO_LISTIO_MAX shall not be defined if the value of > {AIO_LISTIO_MAX} varies at runtime, but it is still defined. FreeBSD > has this bug for many other sysconf() variables, e.g., {OPEN_MAX}. > Perhaps AIO_LISTIO_MAX is easier to fix since it is not hard-coded as > often as OPEN_MAX. What you describe is Linux's behavior, but the POSIX requirement is a bit more general. All POSIX says is that "The value returned [by sysconf(3)] shall not be more restrictive than the corresponding value described to the application when it was compiled with the implementation's or ". > > AIO_LISTIO_MAX is now just the FreeBSD default, but the first check in > the above restricts {AIO_LISTIO_MAX} to this default (16) instead of to > the POSIX limit of _POSIX_IO_LISTIO_MAX (2). Relaxing this check would > cause more problems with hard-coded AIO_LISTIO_MAX. > > If the second check in the above were done at runtime so that it worked > for changed max_queue_count, then it would reduce aio_listio_max below > the POSIX limit of 2 slightly before it reduces to garbage zero and negative > values. The first check should be large, or if you want to give a > preference > for >= the old limit of 16 and a hard limit of the POSIX limit of 2, check > keep checking the old limit first and the POSIX limit last. > >> exit_tag = EVENTHANDLER_REGISTER(process_exit, aio_proc_rundown, >> NULL, >> EVENTHANDLER_PRI_ANY); >> exec_tag = EVENTHANDLER_REGISTER(process_exec, >> aio_proc_rundown_exec, >> @@ -405,14 +416,13 @@ aio_onceonly(void) >> NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); >> aiocb_zone = uma_zcreate("AIOCB", sizeof(struct kaiocb), NULL, >> NULL, >> NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); >> - aiol_zone = uma_zcreate("AIOL", AIO_LISTIO_MAX*sizeof(intptr_t) , >> NULL, >> - NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); >> + aiol_zone = uma_zcreate("AIOL", aio_listio_max * sizeof(intptr_t) >> , >> + NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); > > > Allocations like this are hard to adjust later (I forget if the size limit > is hard or soft, but it should be hard). The user must be allowed to set > aio_listio_max to value larger than the default, so as to prepare for > expansion of the other limits later. However, this doesn't work, since > it leaves the other limits inconsistent until they are expanded. > > It seems best to adjust all the other limits based on the aio_listio_max > tunable. Usually this is not set. Use the old parameters then, but remove > all the macros and compile-time initializers for them. Do onceonly > initialization like > > if (aio_listio_max == 16) { > max_queue_count = 1024; /* historical MAX_AIO_QUEUE */ > ... > } > > When aio_listio_max != 16, calculate the parameters somehow. I don't know > how to do this. At least make them consistent. > > After initialization, the other parameters are still restricted by the > choice for aio_listio_max. Leave it to the user to not choose bad > combinations. > > The aio sysctls seem to already break many of the POSIX sysctls. E.g., > MAX_AIO_QUEUE is used as the default for the r/w sysctl > vfs.aio.max_aio_queue and as the value for the sysctl r/o sysctl > p1003_1b.aio_max. Any change to the first sysctl breaks the second > sysctl. > > The 3 POSIX (2001) aio sysctls thus have quite different invariance > bugs: > - {AIO_MAX}: AIO_MAX is correctly not defined. {AIO_MAX} is variable > by a low-level sysctl but the p1003 sysctl used to report {AIO_MAX} > doesn't see the change. > - {AIO_PRIO_DELTA_MAX}: no low-level sysctl or variable to break it > (value is always 0) > - {AIO_LISTIO_MAX}: was constant until this commit. There is not > low-level sysctl for it, and the low-level variable for it is > correctly exported, but AIO_LISTIO_MAX was incorrectly defined > so it is hard to change now. {AIO_LISTIO_MAX} is inconsistent > with the macro whenever the tunable is used to change its default. > > Bruce I dug deeper and found that there wasn't any good reason for the aio_listio_max limit to exist in the first place. This DR eliminates it, which I think will satisfy most of your concerns. https://reviews.freebsd.org/D12120 -Alan