Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 18 Sep 2007 17:06:23 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Peter Wemm <peter@freebsd.org>
Cc:        cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org
Subject:   Re: cvs commit: src/sys/i386/i386 genassym.c src/sys/amd64/amd64  genassym.c
Message-ID:  <20070918153428.Q32735@delplex.bde.org>
In-Reply-To: <200709172155.l8HLtSJ1015353@repoman.freebsd.org>
References:  <200709172155.l8HLtSJ1015353@repoman.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 17 Sep 2007, Peter Wemm wrote:

> peter       2007-09-17 21:55:28 UTC
>
>  FreeBSD src repository
>
>  Modified files:
>    sys/i386/i386        genassym.c
>    sys/amd64/amd64      genassym.c
>  Log:
>  Fix an undefined symbol that as/ld neglected to flag as a problem.  It
>  was used in assembler code in such a way that no unresolved relocation
>  records were generated, so ld didn't flag the problem.   You can see
>  this with an 'nm' of the kernel.  There will be 'U MAXCPU' on SMP systems.
>
>  The impact of this is that the intrcount/intrnames arrays do not have
>  the intended amount of space reserved.  This could lead to interesting
>  problems due to the arrays being present in the middle of kernel code.
>  An overflow would be rather interesting as executable code would be used
>  as per-cpu incrementing interrupt counters.

I think there was no problem in practice due to the disgustingly bloat in
the array sizes since RELENG_4.  The arrays normally had 769 slots: 769
= 1 + (256 + 128) * 2, where 256 is the (max) number of non-MSI intrs,
128 is the (max) number of MSI intrs, and 2 is space for expansion.  Now
they normally have 1025 slots (another 256 for (1 + 7) * MAXCPU).  The
space allocation used to be about 3K for the counters and 16K for the
strings.  Allocation is by incrementing the index with no overflow checks.

All this when most systems have about 4 active interrupts and about
10 used ones and about 30 allocated ones (e.g., freefall now has only
1 clk, 1 rtc, 1 disk and 1 network interrupt active; sio0 fired about
40000 times; fdc0 and ppp0 fired just once (presumably due to races
in the initialization of unused devices), and another 26 interrupts
(mainly stray interrupts) are allocated but never fired after 67 days
uptime).

>  This fixes it for now by exporting MAXCPU to the assembler.  A better fix
>  might be to define these data structures in C - they're only referenced
>  in the kernel from C code these days anyway.

The "end" labels cannot be declared in C.  I don't know how to declare
them even in gcc asm.

The end labels are related to the bloat.  Userland expects at least
the names to be packed into a statically allocated string space, with
the size determined by the end pointers.  (It uses the labels directly
for the dead kernel case, and the sysctls are no more aware of the
real end of the used part of the arrays than is userland, so they
return the whole arrays then userland has even less chance of avoiding
looking at the unused parts of the arrays than in the dead kernel
case).

Now the strings still form a string space, but the space+time savings
from this are turned into space+time losses by padding the strings to
the same length.  Userland now wastes time skipping the padding and the
enormous arrays, and has bugs from not actually understanding the
padding.  The padding consists of trailing spaces which should normally
be skipped before display, but applications like vmstat -i don't
understand the padding so they display the whole strings, resulting
in ugly formatting.  vmstat -i also doesn't expect there to be about
1000 never-used slots, so it always iterates over the whole arrays.
This is not so bad for the names because names in the unused slots
aren't padded.

All this is handled better in RELENG_4.  The arrays are much smaller
(normally 49 slots for non-SMP (= non APIC)).  Allocation is more
dynamic.  Slots are reused if possible.  If the arrays fill up and
reuse is impossible, then the problem is actually detected, and a
preallocated bit-bucket slot is used so that there is somewhere to put
the counts (but under a wrong name).  The locking for this is not quite
right but would be easy to do right even using fully dynamic allocation.
Just lock everything while updating the pointers to the counters, etc.
Updating is very rare so a global exclusive-writer/shared/reader lock
would work fine, and it doesn't matter if the lock needs to be held
for a long time while realloc()ing everything.  Realloc()ing is only
a problem for the dead kernel API.

It should be possible for statically allocated arrays to fill up no
matter how large they are, since counts should be per-device and devices
are dynamic so there can be any number of them.  A malicious driver that
changes its name on every reload should cause -current to panic after
only about 1000 reloads.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20070918153428.Q32735>