Date: Sun, 14 Jul 2002 21:35:37 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: Don Lewis <dl-freebsd@catspoiler.org> Cc: arch@FreeBSD.ORG Subject: Re: wiring the sysctl output buffer Message-ID: <20020714201020.F35894-100000@gamplex.bde.org> In-Reply-To: <200207140916.g6E9Ghwr020552@gw.catspoiler.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 14 Jul 2002, Don Lewis wrote: > A number of places in the kernel call SYSCTL_OUT() while holding locks. > The problem is that SYSCTL_OUT() can block while it wires down the "old" > buffer in user space. If the data is small, such as an integer value, This seems to have been broken by FreeBSD. In rev.1.1, vslock() was called at the start of the sysctl, so the only problem with it was its pessimalness (*). Now I think the vslock() is just a pessimization, since the point of doing it was to avoid blocking during the copyout(), but blocking during vslock() is equivalent. Also, vslock() is insufficent if the kernel is preemptible. > the best solution may be to copy the data and defer calling SYSCTL_OUT() > until after the locks are released. This extra copy could be wasteful > if the data is large, and it may be cumbersome if the data size is not > known ahead of time, since determining the data size may be expensive. The extra copy is unlikely to be more wasteful than vslock()'ing, since vfslock() is a very expensive operation. Some instruction counts on a UP i386 kernel with no INVARIANTS or WITNESS: (*) vslock(): about 2700 instructions (for a length of just 16) vsunlock(): about 1400 useracc(): about 400 `oldlen' is known ahead of time, so a large enough buffer could be allocated in most cases. Perhaps vslock() iff oldlen is very large. The data could still change while it is being copied to a malloc()'ed buffer (or earlier) if the kernel is preemptible. > The data size could also potentially change between the time the size is > calculatated and the time when the data is copied to the temporary > buffer if the locks must be released so that MALLOC() can be called to > allocate the buffer. Hopefully the sysctl locking is enough to prevent at least *req changing underneath us. > I think the best solution to this problem is to add a sysctl system API > to prewire the output buffer that can be called before grabbing the > locks. Doing so allows the existing code to operate properly with only > minimal changes. > Here's a patch that implements this new API and illustrates how it can > be used to fix the kern.function_list sysctl, which wants to call > SYSCTL_OUT() multiple times while walking a locked data structure. I > think this is the best fix, though I'd like some feedback on whether > this is the best API. I think the callers of SYSCTL_OUT() don't need a new API, but they should supply the data in a form suitable for copying out directly. This probably involves them making a copy while holding suitable domain-specific locks. They can't just point to an active kernel struct since it might change. I would just make a copy at a high level for now. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020714201020.F35894-100000>