FreeBSD Mail Archives

Date:      Sun, 14 Jul 2002 21:35:37 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Don Lewis <dl-freebsd@catspoiler.org>
Cc:        arch@FreeBSD.ORG
Subject:   Re: wiring the sysctl output buffer
Message-ID:  <20020714201020.F35894-100000@gamplex.bde.org>
In-Reply-To: <200207140916.g6E9Ghwr020552@gw.catspoiler.org>

On Sun, 14 Jul 2002, Don Lewis wrote:

> A number of places in the kernel call SYSCTL_OUT() while holding locks.
> The problem is that SYSCTL_OUT() can block while it wires down the "old"
> buffer in user space.  If the data is small, such as an integer value,

This seems to have been broken by FreeBSD.  In rev.1.1, vslock() was
called at the start of the sysctl, so the only problem with it was its
pessimalness (*).  Now I think the vslock() is just a pessimization,
since the point of doing it was to avoid blocking during the copyout(),
but blocking during vslock() is equivalent.  Also, vslock() is insufficient
if the kernel is preemptible.

> the best solution may be to copy the data and defer calling SYSCTL_OUT()
> until after the locks are released.  This extra copy could be wasteful
> if the data is large, and it may be cumbersome if the data size is not
> known ahead of time, since determining the data size may be expensive.

The extra copy is unlikely to be more wasteful than vslock()'ing, since
vslock() is a very expensive operation.  Some instruction counts on
a UP i386 kernel with no INVARIANTS or WITNESS:

(*) vslock():   about 2700 instructions (for a length of just 16)
    vsunlock(): about 1400
    useracc():  about  400

`oldlen' is known ahead of time, so a large enough buffer could be
allocated in most cases.  Perhaps vslock() iff oldlen is very large.
The data could still change while it is being copied to a malloc()'ed
buffer (or earlier) if the kernel is preemptible.
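
A minimal user-space sketch of that size-based choice.  The threshold
value and all names here are hypothetical; nothing above gives a figure
beyond "very large":

```c
#include <stddef.h>

/* Hypothetical cutoff; would need tuning against the wiring costs above. */
#define WIRE_THRESHOLD	((size_t)64 * 1024)

enum out_strategy {
	OUT_COPY_TO_TEMP,	/* copy into a malloc()'ed buffer under the lock */
	OUT_WIRE_USER_BUFFER	/* pay for vslock() once, then copy out directly */
};

/* Pick a strategy from the caller-supplied oldlen, known before locking. */
static enum out_strategy
choose_out_strategy(size_t oldlen)
{
	if (oldlen > WIRE_THRESHOLD)
		return (OUT_WIRE_USER_BUFFER);
	return (OUT_COPY_TO_TEMP);
}
```

With this cutoff a 16-byte request would take the copy path and avoid
the wiring cost entirely.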

> The data size could also potentially change between the time the size is
> calculated and the time when the data is copied to the temporary
> buffer if the locks must be released so that MALLOC() can be called to
> allocate the buffer.

Hopefully the sysctl locking is enough to prevent at least *req changing
underneath us.

> I think the best solution to this problem is to add a sysctl system API
> to prewire the output buffer that can be called before grabbing the
> locks.  Doing so allows the existing code to operate properly with only
> minimal changes.

> Here's a patch that implements this new API and illustrates how it can
> be used to fix the kern.function_list sysctl, which wants to call
> SYSCTL_OUT() multiple times while walking a locked data structure.  I
> think this is the best fix, though I'd like some feedback on whether
> this is the best API.

I think the callers of SYSCTL_OUT() don't need a new API, but they
should supply the data in a form suitable for copying out directly.
This probably involves them making a copy while holding suitable
domain-specific locks.  They can't just point to an active kernel
struct since it might change.

I would just make a copy at a high level for now.
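
Roughly, as a user-space sketch only: a pthread mutex stands in for the
domain-specific lock, and struct entry, list_head and snapshot_names()
are hypothetical names.  The lock is dropped around malloc(), so the
size is rechecked after reacquiring it and the allocation retried if
the data grew in between:

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record standing in for the kernel data being exported. */
struct entry {
	struct entry *next;
	char name[32];
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *list_head;

/*
 * Copy all names (NUL-separated) into a malloc()'ed buffer while holding
 * list_lock.  Returns 0 and sets *bufp and *lenp, or -1 if malloc() fails.
 * *bufp is NULL for an empty list.  The caller frees *bufp.
 */
static int
snapshot_names(char **bufp, size_t *lenp)
{
	struct entry *e;
	char *buf = NULL;
	size_t cap = 0, need, off = 0;

	for (;;) {
		pthread_mutex_lock(&list_lock);
		need = 0;
		for (e = list_head; e != NULL; e = e->next)
			need += strlen(e->name) + 1;
		if (need <= cap)
			break;			/* big enough; lock still held */
		pthread_mutex_unlock(&list_lock);
		free(buf);			/* too small; retry unlocked */
		cap = need;
		buf = malloc(cap);
		if (buf == NULL)
			return (-1);
	}
	for (e = list_head; e != NULL; e = e->next) {
		size_t n = strlen(e->name) + 1;

		memcpy(buf + off, e->name, n);
		off += n;
	}
	pthread_mutex_unlock(&list_lock);
	*bufp = buf;
	*lenp = off;
	return (0);
}
```

The caller then does the (possibly blocking) copy out from the private
buffer with no locks held, and frees the buffer afterwards.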

Bruce
