From owner-cvs-all@FreeBSD.ORG Tue Sep 18 13:26:05 2007 Return-Path: Delivered-To: cvs-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D618C16A46C; Tue, 18 Sep 2007 13:26:05 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx01.syd.optusnet.com.au (fallbackmx01.syd.optusnet.com.au [211.29.132.93]) by mx1.freebsd.org (Postfix) with ESMTP id C2CC113C46A; Tue, 18 Sep 2007 13:26:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by fallbackmx01.syd.optusnet.com.au (8.12.11.20060308/8.12.11) with ESMTP id l8I77Boj001997; Tue, 18 Sep 2007 17:07:11 +1000 Received: from c220-239-235-248.carlnfd3.nsw.optusnet.com.au (c220-239-235-248.carlnfd3.nsw.optusnet.com.au [220.239.235.248]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id l8I76NZc023746 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Sep 2007 17:06:26 +1000 Date: Tue, 18 Sep 2007 17:06:23 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Peter Wemm In-Reply-To: <200709172155.l8HLtSJ1015353@repoman.freebsd.org> Message-ID: <20070918153428.Q32735@delplex.bde.org> References: <200709172155.l8HLtSJ1015353@repoman.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org Subject: Re: cvs commit: src/sys/i386/i386 genassym.c src/sys/amd64/amd64 genassym.c X-BeenThere: cvs-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the entire tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2007 13:26:06 -0000 On Mon, 17 Sep 2007, Peter Wemm wrote: > peter 2007-09-17 21:55:28 UTC > > FreeBSD src repository > > Modified files: > sys/i386/i386 genassym.c > sys/amd64/amd64 genassym.c > Log: > Fix an undefined symbol that as/ld neglected to flag as a problem. It > was used in assembler code in such a way that no unresolved relocation > records were generated, so ld didn't flag the problem. You can see > this with an 'nm' of the kernel. There will be 'U MAXCPU' on SMP systems. > > The impact of this is that the intrcount/intrnames arrays do not have > the intended amount of space reserved. This could lead to interesting > problems due to the arrays being present in the middle of kernel code. > An overflow would be rather interesting as executable code would be used > as per-cpu incrementing interrupt counters. I think there was no problem in practice due to the disgustingly bloat in the array sizes since RELENG_4. The arrays normally had 769 slots: 769 = 1 + (256 + 128) * 2, where 256 is the (max) number of non-MSI intrs, 128 is the (max) number of MSI intrs, and 2 is space for expansion. Now they normally have 1025 slots (another 256 for (1 + 7) * MAXCPU). The space allocation used to be about 3K for the counters and 16K for the strings. Allocation is by incrementing the index with no overflow checks. All this when most systems have about 4 active interrupts and about 10 used ones and about 30 allocated ones (e.g., freefall now has only 1 clk, 1 rtc, 1 disk and 1 network interrupt active; sio0 fired about 40000 times; fdc0 and ppp0 fired just once (presumably due to races in the initialization of unused devices), and another 26 interrupts (mainly stray interrupts) are allocated but never fired after 67 days uptime). > This fixes it for now by exporting MAXCPU to the assembler. A better fix > might be to define these data structures in C - they're only referenced > in the kernel from C code these days anyway. The "end" labels cannot be declared in C. I don't know how to declare them even in gcc asm. The end labels are related to the bloat. Userland expects at least the names to be packed into a statically allocated string space, with the size determined by the end pointers. (It uses the labels directly for the dead kernel case, and the sysctls are no more aware of the real end of the used part of the arrays than is userland, so they return the whole arrays then userland has even less chance of avoiding looking at the unused parts of the arrays than in the dead kernel case). Now the strings still form a string space, but the space+time savings from this are turned into space+time losses by padding the strings to the same length. Userland now wastes time skipping the padding and the enormous arrays, and has bugs from not actually understanding the padding. The padding consists of trailing spaces which should normally be skipped before display, but applications like vmstat -i don't understand the padding so they display the whole strings, resulting in ugly formatting. vmstat -i also doesn't expect there to be about 1000 never-used slots, so it always iterates over the whole arrays. This is not so bad for the names because names in the unused slots aren't padded. All this is handled better in RELENG_4. The arrays are much smaller (normally 49 slots for non-SMP (= non APIC)). Allocation is more dynamic. Slots are reused if possible. If the arrays fill up and reuse is impossible, then the problem is actually detected, and a preallocated bit-bucket slot is used so that there is somewhere to put the counts (but under a wrong name). The locking for this is not quite right but would be easy to do right even using fully dynamic allocation. Just lock everything while updating the pointers to the counters, etc. Updating is very rare so a global exclusive-writer/shared/reader lock would work fine, and it doesn't matter if the lock needs to be held for a long time while realloc()ing everything. Realloc()ing is only a problem for the dead kernel API. It should be possible for statically allocated arrays to fill up no matter how large they are, since counts should be per-device and devices are dynamic so there can be any number of them. A malicious driver that changes its name on every reload should cause -current to panic after only about 1000 reloads. Bruce