Date:      Sat, 25 Nov 2017 00:14:10 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r326073 - head/usr.bin/systat
Message-ID:  <20171124225011.V1289@besplex.bde.org>
In-Reply-To: <20171124105720.GW2272@kib.kiev.ua>
References:  <201711211955.vALJtWhg047906@repo.freebsd.org> <20171122071838.R1172@besplex.bde.org> <20171122103917.GS2272@kib.kiev.ua> <20171123021646.M1933@besplex.bde.org> <20171122220538.GT2272@kib.kiev.ua> <20171123224032.A992@besplex.bde.org> <20171123151849.GU2272@kib.kiev.ua> <20171124184535.E980@besplex.bde.org> <20171124105720.GW2272@kib.kiev.ua>

On Fri, 24 Nov 2017, Konstantin Belousov wrote:

> On Fri, Nov 24, 2017 at 08:15:06PM +1100, Bruce Evans wrote:
>> On Thu, 23 Nov 2017, Konstantin Belousov wrote:
>* ...
>>>> #define	pgtok(p)	((uintmax_t)(p) * pageKilo)
>>> Amusingly there is already MD macro in machine/param.h with the same name
>>> and same intent, but as you formulate it, sloppy implementation.  It uses
>>> unsigned long cast on almost all 64bit arches, except powerpc.  For 32bit
>>> arches, the cast is not done, unfortunately.
>>
>> I already pointed out the system pgtok().
>>
>> It was almost correct to cast to u_int on all arches because the
>> implementation can know the size of all its page counters.  That was int
>> or u_int, and the expansion factor is small.  In practice, 32-bit systems
>> never have enough memory to overflow in K (that happens at 4TB), and
>> 64-bit systems that overflow in K are close to overflowing the page
>> counters (that happens at 16TB with 4K-pages).
>>
>> The pgtok() is just unusable because the page size can vary.
> No, it is unusable only due to the implementation not ensuring the consistent
> output type.

Hmm, I couldn't find any arch with even a compile-time variable PAGE_SIZE.
It is currently just unportable in theory to use hard-coded PAGE_SIZE or
macros that use it.

This might be another leftover from vax days.  getpagesize(2) says:

X      The page size is a system page size and may not be the same as the under-
X      lying hardware page size.
X ...
X      The getpagesize() function appeared in 4.2BSD.

In vax days or even Mach days, PAGE_SIZE might have been the underlying page
size and different from the system page size, so getpagesize() was needed to
provide the latter.  This is sort of backwards.  The system page should be
some small size like 512 that divides the hardware page size for all arches.

Dyson's rewrite might have reversed this.  Anyway, it removed most of the
distinctions between hardware and virtual page sizes.  It still has "clicks"
via the btoc() and other macros, but clicks are conflated with pages of
size PAGE_SIZE.  i386 has i386_btop() which I think is for physical pages,
and amd64 has amd64_btop() but never uses it.  It would be better to not
pretend to support "clicks".

POSIX has limits {PAGESIZE} and {PAGE_SIZE} (the latter for XSI).  These
are only runtime-invariant like {OPEN_MAX} was before it supported
setrlimit(), so are hard to use.  BSD utilities like vmstat have
the same problem with getpagesize() unless its API is changed to
guarantee that it returns PAGE_SIZE and that is not ifdefed.
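
As a rough illustration of the runtime-only case, a utility that cannot
rely on a compile-time PAGE_SIZE would have to ask for the page size
once at startup.  The sketch below uses POSIX sysconf(_SC_PAGESIZE);
the helper name runtime_pagesize() is made up for this example and is
not from systat or vmstat, and the power-of-2 check is an assumption
about the target systems, not something POSIX promises.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from systat): return the system page size
 * as reported at run time, or 0 if the query fails.
 */
static uintmax_t
runtime_pagesize(void)
{
	long sz;

	sz = sysconf(_SC_PAGESIZE);
	if (sz <= 0)
		return (0);
	/* Assumption: page sizes are powers of 2 on the systems targeted. */
	assert((sz & (sz - 1)) == 0);
	return ((uintmax_t)sz);
}
```

A caller would save the result in a variable and use that instead of
PAGE_SIZE in any conversion macro.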

The output type isn't a problem.  Consistently uintmax_t would be easier to
use, but would be a pessimization for arches that don't need large sizes.
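
To make the tradeoff concrete, here is a minimal sketch of the
conversion done both ways.  pages_to_k() and pages_to_k32() are
hypothetical names invented for this example; the first follows the
uintmax_t form of pgtok() quoted above but takes the page size as an
argument, and the second deliberately truncates to 32 bits to show
where a count in K wraps (2^30 4K-pages, i.e., 4TB).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical variant of pgtok(): convert a page count to kilobytes
 * using a page size supplied at run time, computing in uintmax_t.
 * Assumes pagesize is a nonzero multiple of 1024.
 */
static uintmax_t
pages_to_k(uintmax_t pages, uintmax_t pagesize)
{
	assert(pagesize >= 1024 && pagesize % 1024 == 0);
	return (pages * (pagesize / 1024));
}

/* The same conversion truncated to 32 bits, to show where it wraps. */
static uint32_t
pages_to_k32(uint32_t pages, uint32_t pagesize)
{
	return ((uint32_t)(pages * (pagesize / 1024)));
}
```

With 4K pages, pages_to_k32() returns 0 for 2^30 pages while
pages_to_k() returns 2^32, so the wider type only matters once the
total reaches 4TB.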

>> Perhaps it is a design error to allow the page size to vary or be
>> anything except 512 in APIs, just like for disk sector sizes.  Most
>> disk APIs uses units of bytes or 512-blocks.  The physical size may
>> be different.  Applications need to know the physical memory page size
>> even less than they need to know the physical sector size.
> Not everybody share the warm memories about VAX.  I think there were no
> single significant architecture with hardware support for virtual memory,
> after the VAX, which used less than 4K sized pages.

512 still gives good units.  1024 would be even better.

>> Here are my old fixes for this function (to clean it up before removing it):
>
> I picked some of this, mainly I do not want to change the output format.
> I am sure that there are scripts around which parse it.

It is hard to parse.

Another bug here is that the vmmeter 'v' values are very easy to parse by
reading them 1 at a time using sysctl -n, but there are no individual
sysctls for the vmtotal 't' values.  Sysctl could usefully expand the
struct as fake integer sysctls (1 per line), but instead prints it
ornately.

I just noticed another bug: sysctl -x is documented to give a raw hex
dump, but for vm.vmtotal it still gives the ornate output.  All sysctls
with special formatting and all strings generated by the kernel seem to
have this bug.

Bruce


