Date: Mon, 15 Jun 2009 10:16:09 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Sam Leffler
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
	src-committers@FreeBSD.org, Ed Schouten
Subject: Re: svn commit: r194204 - in head/sys: amd64/conf i386/conf

On Sun, 14 Jun 2009, Sam Leffler wrote:

>> Log:
>>   Enable PRINTF_BUFR_SIZE on i386 and amd64 by default.
>>
>>   In the past there have been some reports of PRINTF_BUFR_SIZE not
>>   functioning correctly.  Instead of having garbled console messages, we
>>   should just see whether the issues are still there and analyze them.
>
> When I suggested enabling this by default for SMP configurations I was
> reminded it wasn't enabled because of potential stack overflow.

Stack overflow shouldn't be a problem, since printf() shouldn't be called
recursively.  However, recursive calls are possible from panic(), and they
are handled badly when they happen.  E.g., PRINTF_BUFR_SIZE gives unlimited
stack growth (each level of recursion puts another printf() buffer on the
stack), and the locking in cnputs() may give deadlock.  Deadlock should
occur if cnputs() is preempted and the new thread panics, since panicking
prevents the preempted thread from running (and the system state may be so
messed up that the preempted thread cannot run anyway).

I wouldn't use an implementation anything like FreeBSD's.  printf()s must
be serialized, and once printf() is serialized there is no need for
buffering or for cnputs().  Put the locking in kvprintf() only.  To avoid
deadlock, the locking must not be strict.  Either allow it to be bypassed
(needed for calls from db_printf() and panic()), or automatically bypass
it after a timeout, or automatically abort the printf() after a timeout.
I use the last of these to fix related bugs in kernel profiling (on some
arches, MCOUNT_ENTER() deadlocks on NMIs and ddb traps if a trap occurs
while the lock is held; on other arches MCOUNT_ENTER() is missing locking).
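Schematically, "abort after a timeout" is just a trylock with a bounded
spin.  A minimal sketch (illustrative names only, with C11 atomics standing
in for the kernel's atomic(9) functions; this is not actual kernel code):

	#include <stdatomic.h>

	static atomic_int lock_state;		/* 0 = free, 1 = held */

	static int
	bounded_trylock(void)
	{
		int timeout = 1000;		/* arbitrary spin bound */

		do {
			int expected = 0;

			/* Weak CAS may fail spuriously; just retry. */
			if (atomic_compare_exchange_weak(&lock_state,
			    &expected, 1))
				return (1);	/* acquired */
		} while (--timeout != 0);
		return (0);	/* give up; the caller drops its work */
	}

	static void
	bounded_unlock(void)
	{
		atomic_store(&lock_state, 0);
	}

The profiling patch below does essentially this, using
atomic_cmpset_acq_int() on the GMON_PROF_* state word.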
For printf(), an automatic bypass after a timeout seems best -- in the
worst case it just gives interleaved messages like now, but that should
rarely happen.  The timeout can be very long on a CPU time scale yet short
on a human scale, so that important messages are not blocked for
significantly long by unimportant messages or by locking bugs.

Here is my locking for kernel profiling (not showing simplifications in
MCOUNT_ENTER/EXIT()):

% Index: mcount.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/libkern/mcount.c,v
% retrieving revision 1.24
% diff -u -2 -r1.24 mcount.c
% --- mcount.c	13 Jun 2007 06:17:48 -0000	1.24
% +++ mcount.c	13 Jun 2007 06:18:33 -0000
% @@ -34,7 +34,5 @@
%  #include
%  #ifdef _KERNEL
% -#ifndef GUPROF
%  #include
% -#endif
%  #include
%  #include
% @@ -43,4 +41,61 @@
%  
%  /*
% + * Locking strategy: use a simple spin mutex, but don't wait very long
% + * for it.  When we don't acquire it, just give up.  We avoid using a
% + * normal mutex since normal mutex code is profiled and might be
% + * inefficient.  Not waiting very long avoids deadlock.  Our callers
% + * should have used the usual anti-deadlock strategy of disabling
% + * interrupts on the current CPU, but that is especially insufficient
% + * for us since it doesn't affect debugger traps and NMIs, and at least
% + * the debugger code is profiled.
% + *
% + * Note: the atomic functions used here, like all functions used in this
% + * file, must be compiled without profiling.  This normally happens for
% + * the atomic functions because they are inline.
% + */
% +
% +static __inline int
% +mcount_trylock(struct gmonparam *p, int want_both)
% +{
% +#ifdef SMP
% +	int timeout;
% +
% +	timeout = 1000;		/* XXX ~= 10 times #statements in mcount(). */
% +	do {
% +		/* XXX having 2 "on" states complicates things. */
% +#ifdef GUPROF
% +		if (atomic_cmpset_acq_int(&p->state, GMON_PROF_HIRES,
% +		    GMON_PROF_BUSY))
% +			return (GMON_PROF_HIRES);
% +#endif
% +		if (want_both && atomic_cmpset_acq_int(&p->state, GMON_PROF_ON,
% +		    GMON_PROF_BUSY))
% +			return (GMON_PROF_ON);
% +	} while (--timeout != 0);
% +#else /* !SMP */
% +	/* Ugliness for efficiency. */
% +#ifdef GUPROF
% +	if (p->state == GMON_PROF_HIRES) {
% +		/* XXX want acq. */
% +		atomic_store_rel_int(&p->state, GMON_PROF_BUSY);
% +		return (GMON_PROF_HIRES);
% +	}
% +#endif
% +	if (want_both && p->state == GMON_PROF_ON) {
% +		atomic_store_rel_int(&p->state, GMON_PROF_BUSY);
% +		return (GMON_PROF_ON);
% +	}
% +#endif /* SMP */
% +	return (GMON_PROF_BUSY);	/* Might actually be OFF or ERROR. */
% +}
% +
% +static __inline void
% +mcount_unlock(struct gmonparam *p, int ostate)
% +{
% +
% +	atomic_store_rel_int(&p->state, ostate);
% +}
% +
% +/*
%   * mcount is called on entry to each function compiled with the profiling
%   * switch set.  _mcount(), which is declared in a machine-dependent way
% @@ -68,4 +123,5 @@
%  	struct gmonparam *p;
%  	long toindex;
% +	int ostate;
%  #ifdef _KERNEL
%  	MCOUNT_DECL(s)
% @@ -73,17 +129,17 @@
%  
%  	p = &_gmonparam;
% -#ifndef GUPROF			/* XXX */
% +#ifndef GUPROF
%  	/*
% -	 * check that we are profiling
% -	 * and that we aren't recursively invoked.
% +	 * Quick check that we are profiling.  In caller for GUPROF, and
% +	 * should always be there.
%  	 */
% -	if (p->state != GMON_PROF_ON)
% +	if (p->state == GMON_PROF_OFF)
%  		return;
%  #endif
%  #ifdef _KERNEL
%  	MCOUNT_ENTER(s);
% -#else
% -	p->state = GMON_PROF_BUSY;
%  #endif
% +	if ((ostate = mcount_trylock(p, 1)) == GMON_PROF_BUSY)
% +		return;
%  
%  #ifdef _KERNEL
% @@ -101,5 +157,5 @@
%  
%  #ifdef GUPROF
% -	if (p->state == GMON_PROF_HIRES) {
% +	if (ostate == GMON_PROF_HIRES) {
%  		/*
%  		 * Count the time since cputime() was previously called
% @@ -224,12 +281,11 @@
%  	}
%  done:
% +	mcount_unlock(p, ostate);
%  #ifdef _KERNEL
%  	MCOUNT_EXIT(s);
% -#else
% -	p->state = GMON_PROF_ON;
%  #endif
%  	return;
%  overflow:
% -	p->state = GMON_PROF_ERROR;
% +	mcount_unlock(p, GMON_PROF_ERROR);
%  #ifdef _KERNEL
%  	MCOUNT_EXIT(s);
% @@ -251,6 +307,9 @@
%  	struct gmonparam *p;
%  	uintfptr_t selfpcdiff;
% +	int ostate;
%  
%  	p = &_gmonparam;
% +	if ((ostate = mcount_trylock(p, 0)) == GMON_PROF_BUSY)
% +		return;
%  	selfpcdiff = selfpc - (uintfptr_t)p->lowpc;
%  	if (selfpcdiff < p->textsize) {
% @@ -267,4 +326,5 @@
%  		*p->mexitcount_count += p->mexitcount_overhead;
%  	}
% +	mcount_unlock(p, ostate);
%  }
% 

The above makes mcount()'s kernel locking strategy very similar to its
userland one -- try to obtain the lock, and if it cannot be obtained then
do nothing and just lose some statistics.  In userland, recursive calls to
mcount() are possible from signal handlers and from threads.  For signal
handlers, waiting forever could deadlock; the old code avoids that using a
simple state flag, and it assumes atomic accesses to p->state.  The
threaded case was mostly broken -- it won't have the necessary atomic
accesses in the SMP case, and contention on mcount(), with consequent lost
statistics, is much more common for threads.  The above is supposed to fix
the locking in userland in the same way as in the kernel, using the same
trylock function, but it hasn't been tested in userland.

In the kernel, there is enormous contention on the lock iff multiple
threads are in the kernel.  mcount() needs per-CPU context to run fast
enough in that case.

For printf(), things are much simpler, because results are only needed on
a human time scale and printf()'s internals are reentrant enough.  Instead
of:

	if ((ostate = mcount_trylock(p, 0)) == GMON_PROF_BUSY)
		return;
	...
	mcount_unlock(p, ostate);

kvprintf() can just do:

	printf_trylock();
	/*
	 * Continue (after a timeout) even if we didn't acquire the lock
	 * exclusively, so that printf() is guaranteed to complete
	 * eventually.  Use a global in-use counter instead of p->state
	 * and ostate above.
	 */
	...
	printf_unlock();

Normal mutex locking should not be used here.  (For profiling, it cannot
be used, since the mutex code is profiled.)  mtx_lock_spin() is more
unusable than I remembered, both for the above and where it is now used in
cnputs(): it does recursion inline even when MTX_RECURSE is not configured
for the mutex.  _mtx_lock_spin_flags() checks for recursion on
non-recursive mutexes, but it won't be called in the usual case where the
recursion is done inline.  As a result, erroneous recursive calls to
cnputs() don't deadlock like they should; instead, the locking just
doesn't work.
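To make printf_trylock()/printf_unlock() concrete, here is a minimal
sketch of the bypass-after-timeout variant with a global in-use counter
(my reading of the comment above, again with C11 atomics standing in for
the kernel's atomic(9) functions; none of this is existing FreeBSD code):

	#include <stdatomic.h>

	static atomic_int printf_users;	/* in-use counter: 0 = idle */

	static void
	printf_trylock(void)
	{
		int expected, timeout;

		timeout = 1000;		/* XXX calibrate; see below. */
		do {
			expected = 0;
			if (atomic_compare_exchange_weak(&printf_users,
			    &expected, 1))
				return;	/* exclusive; output is coherent */
		} while (--timeout != 0);
		/*
		 * Timed out: bump the counter and print anyway.  Output
		 * may interleave, as with no locking at all, but the
		 * printf() always completes and never deadlocks.
		 */
		atomic_fetch_add(&printf_users, 1);
	}

	static void
	printf_unlock(void)
	{
		atomic_fetch_sub(&printf_users, 1);
	}

A recursive printf() (e.g., from panic()) spins out the timeout and then
proceeds, so the worst case is interleaved output, never deadlock.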
The effectiveness of the automatic bypass of printf()'s locking after a
timeout can be tested easily by debugging printf() and/or its locking code
using ddb, without a special bypass for ddb's printf()s.  Once the
top-level printf() acquires the lock, ddb's printf()s should become quite
slow, since they have to wait (up to the timeout) for the lock, but they
should still occur, interleaved with the top-level i/o (which should be
messed up further by ddb's printf()s interfering with device accesses for
the top-level i/o when you step through the lowest levels), and they might
even be usable if the timeout is short enough.

The timeout needs to be configured better than by
`timeout = 1000;	/* XXX */'.  Use something like DELAY(1000) tested
1000 times.  Use a timeout of 0 for ddb once the above tests are done.

Bruce