Date: Fri, 3 Jun 2011 15:11:07 -0600 From: "Kenneth D. Merry" <ken@freebsd.org> To: Bruce Evans <brde@optusnet.com.au> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Andriy Gapon <avg@freebsd.org> Subject: Re: svn commit: r222537 - in head/sys: kern sys Message-ID: <20110603211107.GA91421@nargothrond.kdm.org> In-Reply-To: <20110603204653.B1551@besplex.bde.org> References: <201105311729.p4VHTwrZ033296@svn.freebsd.org> <4DE5D72A.1020405@FreeBSD.org> <20110601193030.A1275@besplex.bde.org> <20110602213611.GA47880@nargothrond.kdm.org> <20110603204653.B1551@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Jun 03, 2011 at 20:47:00 +1000, Bruce Evans wrote: > On Thu, 2 Jun 2011, Kenneth D. Merry wrote: > > >On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote: > >>On Wed, 1 Jun 2011, Andriy Gapon wrote: > >> > >>>on 31/05/2011 20:29 Kenneth D. Merry said the following: > >>>>+ mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN); > >>> > >>>Sorry that I didn't gather myself together for a review before this > >>>change > >>>got > >>>actually committed. > >>>Do you see any reason not to make this spinlock recursive? > > It does need to be recursive to avoid deadlock (as I described), but this > prevents it actually working as a lock. I think the new code needs it to > work as a lock. Yes, it does. > >>I see reasons why it must not exist. The message buffer code cannot use > >>any normal locking, and was carefully written to avoid doing so. > >> > >>>I am a little bit worried about "exotic" situations like receiving an NMI > >>>in the > >>>middle of printing and wanting to print in the NMI context, or similar > >>>things > >>>that penetrate contexts with disabled interrupts - e.g. Machine Check > >>>Exception. > >>>Also it's not clear to me if there won't any bigger damage in the > >>>situations > >>>like those described above. > >>> > >>>P.S. I have been thinking about fixing the problem in a different > >>>fashion, > >>>via > >>>reserving portions of dmesg buffer for a whole message using CAS: > >>>http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html > >> > >>Something like this might work. > >> > >>PRINTF_BUFR size should not exist either, especially now that it is > >>much more complicated and broken. Here is some of my old mail about > >>adding (necessarily not normal) locking to to printf. No one replied, > >>so I never finished this :-(. > > > >That particular solution doesn't lock the normal kernel printf path, and so > >won't fix what we're trying to fix. (We've got lots of disks in each > >system, and when there are various SES events and disks arriving and > >departing, there are multiple kernel contexts doing printfs simultaneously, > >and that results in scrambled dmesg output.) > > But it does. PRINTF_BUFR_SIZE (at least used to) only be applied to > vprintf(), but my printf_lock() is applied to all calls to kvprintf() > that set TOLOG. (If I missed some, this takes 2 lines each to fix.) > The calls in tprintf() and log() have comments saying that this is > intentional "to serialize msgbuf", though I fear this might not be safe > for at least tprintf(), since it may block unboundedly. But even when > it blocks unboundedly, nothing really bad happens: the output just > becomes very slow due to everything waiting for the timeout until the > blockage clears, and may be interleaved. Recursion gives similar > behaviour, except it takes unrecursing to clear the blockage. I > consider the very slow output to be a feature. Unbounded blockages > and recursion are bugs and the slow output makes them more obvious. Ahh. It wasn't 100% clear from looking at the patch which calls had the lock applied. In this case at least, it's skipping the lock when the priority is -1. Is this tprintf()? % /* % * Warn that a system table is full. % @@ -198,5 +221,9 @@ % pca.flags = flags; % va_start(ap, fmt); % + if (pri != -1) % + printf_lock(); /* To serialize msgbuf. XXX: long-held lock? */ % kvprintf(fmt, putchar, &pca, 10, ap); % + if (pri != -1) % + printf_unlock(); % va_end(ap); % if (sess != NULL) % > >I think the goals should be: > > > >- console output, syslog output, and dmesg output are not scrambled. > > (i.e. individual kernel printfs make it out atomically, or at least > > with a certain granularity, like PRINTF_BUFR_SIZE.) > > Console output includes debugger output. This must be able to break into > any otherwise-atomic kernel printf. Possibly similarly for NMIs (if they > print. They really shouldn't print, but they at least used to get this > horribly wrong from the beginning on i386). Similarly for certain (non > maskable) traps. These really shouldn't happen while we're printing, but > if they do then they may need to panic and then print for things like > machine check exceptions. > > Debugger output is the easiest way of showing why printf cannot use > any normal locking. There is an option (that I don't like or use) > which supports debugger output going through normal printf. I think > the message buffer is used for this. So message buffer accesses cannout > use normal locking either. > > Apart from this, I don't know any reason why a printf or even a msgbuf > write can't wait for a while (a very long while if the output is to a > slow device -- don't forget to test with a 50 bps serial console) until > any previous printfs complete. My simple serialization gives this if > the other printfs are on other CPUs, and possibly if they are on other > ithreads (the simple serialization doesn't give up control for simplicity, > since it is hard to tell when it is safe to give up control, but > preemption while we are spinning might get back to the other printfs). > And when the printf is recursive, there is no way back to the other > printfs, so we must proceed. We could proceed immediately, but my > simple serialization waits in this case too for simplicity. This case > should be even rarer than waiting too long for another CPU. Hopefully the delay won't be too bad for most consumers. It'll take some testing to see what happens. > >- Can be called by any kernel routine (i.e. doesn't sleep) > > Some complications for tprintf and uprintf (and maybe ordinary printf if > it is connected to a pty). I think we already have the necessary > complications. Something involving printing only to the message buffer > in the non-sleeping part. > > >- Offers a string at a time interface. > > I don't provide this. So printf's a character or word or 2 at a time > may be interleaved. It is perhaps not to much to ask that important > printfs that are likely to be interleaved preprint their strings into > lines. The lack of a string interface isn't that big of a deal, since your solution already serializes printf calls. > >- Does the right thing for log messages (priority, etc.) > > > >It looks like we should add "does not use normal locking" to the list as > >well. > > > >As Justin mentioned, we started off down the path of using atomic > >operations, and then figured out that wouldn't fully work. Then we > >discussed doing a per-CPU message buffer, with each message tagged with a > >sequence number, and having the reader re-serialize the messages in the > >right order. The complexity started to get large enough that we decided > >that using a spin lock would be a much easier approach. > > But almost eveyone who has worked on this before knows that a spinlock > cannot be used. We haven't worked on this before, of course. :) > >cnputs() already uses a spinlock, so we're no worse off than before from a > >locking standpoint. > > An obvious bug for PRINTF_BUFR_SIZE. But as I said, this one might > work well enough if it were recursive. This depends on the lower-level > console drivers supporting recursion. Most don't. Some have simple > serialization, but have to bypass it open to support debugger output, > much like higher levels except they have more control over the timing. Ahh. > But in the message buffer accesses, I think you used the lock to provide > exlusive accesses. Yes. The lock is used to prevent the buffer getting scrambled. Recursion probably wouldn't work very well. > >We could perhaps make the message buffer mutex recursive and set the > >MTX_NOWITNESS flag as well, that might make things a little better from a > >side effect standpoint. > > I don't see how a recursive mutex can protect the message buffer. Don't > forget to trace through the code protected by the mutex using ddb, while > at the same time using this code for ddb itself. It's just too hard to > make this work. Normally, this type of problem is avoided by avoiding > not using code like this for the low levels like ddb itself. Ddb mostly > does this, but has complications: > > % static void > % db_putc(int c) > % { > % > % /* > % * If not in the debugger or the user requests it, output data to > % * both the console and the message buffer. > % */ > % if (!kdb_active || ddb_use_printf) { > % printf("%c", c); > > Here it uses ordinary printf, so it uses whatever printf uses. In > particular, PRINTF_BUFR_SIZE makes printf use cnputs() and the buggy > locking in it, and printf always uses the message buffer and I think > the new code gives buggy locking there unconditionally. > > This problem is not very large, since it takes some foot shooting to get > here: > - there is some buggy code that uses db_printf() when !kdb_active. > Perhaps this is not used recursively. > - we can configure ddb to use printf. I think it is a feature that the > message buffer gets used in this mode. > > % if (!kdb_active) > % return; > % if (c == '\r' || c == '\n') > % db_check_interrupt(); > % if (c == '\n' && db_maxlines > 0) { > % db_newlines++; > % if (db_newlines >= db_maxlines) > % db_pager(); > % } > % return; > % } > % > % /* Otherwise, output data directly to the console. */ > % if (c > ' ' && c <= '~') { > % /* > % * Printing character. > % * If we have spaces to print, print them first. > % * Use tabs if possible. > % */ > % db_force_whitespace(); > % cnputc(c); > % db_capture_writech(c); > % db_output_position++; > % db_last_non_space = db_output_position; > % } > > I think this routine is used mainly by db_printf() calling kvprintf() and > kvprintf() calling here. kvprintf() is supposed to be reentrant and in > particular not use any locking to support this. I think it still does. > > Bruce I'll try to get some time to run some tests with your locking code and see how it works. We just need something that works, whether it's a spinlock or hand-rolled lock doesn't matter so much. Ken -- Kenneth Merry ken@FreeBSD.ORG
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110603211107.GA91421>