FreeBSD Mail Archives

Date:      Fri, 3 Jun 2011 15:11:07 -0600
From:      "Kenneth D. Merry" <ken@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: svn commit: r222537 - in head/sys: kern sys
Message-ID:  <20110603211107.GA91421@nargothrond.kdm.org>
In-Reply-To: <20110603204653.B1551@besplex.bde.org>
References:  <201105311729.p4VHTwrZ033296@svn.freebsd.org> <4DE5D72A.1020405@FreeBSD.org> <20110601193030.A1275@besplex.bde.org> <20110602213611.GA47880@nargothrond.kdm.org> <20110603204653.B1551@besplex.bde.org>

On Fri, Jun 03, 2011 at 20:47:00 +1000, Bruce Evans wrote:
> On Thu, 2 Jun 2011, Kenneth D. Merry wrote:
> 
> >On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote:
> >>On Wed, 1 Jun 2011, Andriy Gapon wrote:
> >>
> >>>on 31/05/2011 20:29 Kenneth D. Merry said the following:
> >>>>+	mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
> >>>
> >>>Sorry that I didn't gather myself together for a review before this 
> >>>change
> >>>got
> >>>actually committed.
> >>>Do you see any reason not to make this spinlock recursive?
> 
> It does need to be recursive to avoid deadlock (as I described), but this
> prevents it actually working as a lock.  I think the new code needs it to
> work as a lock.

Yes, it does.

> >>I see reasons why it must not exist.  The message buffer code cannot use
> >>any normal locking, and was carefully written to avoid doing so.
> >>
> >>>I am a little bit worried about "exotic" situations like receiving an NMI
> >>>in the
> >>>middle of printing and wanting to print in the NMI context, or similar
> >>>things
> >>>that penetrate contexts with disabled interrupts - e.g. Machine Check
> >>>Exception.
> >>>Also it's not clear to me if there won't any bigger damage in the
> >>>situations
> >>>like those described above.
> >>>
> >>>P.S. I have been thinking about fixing the problem in a different 
> >>>fashion,
> >>>via
> >>>reserving portions of dmesg buffer for a whole message using CAS:
> >>>http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html
> >>
> >>Something like this might work.
> >>
> >>PRINTF_BUFR size should not exist either, especially now that it is
> >>much more complicated and broken.  Here is some of my old mail about
> >>adding (necessarily not normal) locking to to printf.  No one replied,
> >>so I never finished this :-(.
> >
> >That particular solution doesn't lock the normal kernel printf path, and so
> >won't fix what we're trying to fix.  (We've got lots of disks in each
> >system, and when there are various SES events and disks arriving and
> >departing, there are multiple kernel contexts doing printfs simultaneously,
> >and that results in scrambled dmesg output.)
> 
> But it does.  PRINTF_BUFR_SIZE (at least used to) only be applied to
> vprintf(), but my printf_lock() is applied to all calls to kvprintf()
> that set TOLOG.  (If I missed some, this takes 2 lines each to fix.)
> The calls in tprintf() and log() have comments saying that this is
> intentional "to serialize msgbuf", though I fear this might not be safe
> for at least tprintf(), since it may block unboundedly.  But even when
> it blocks unboundedly, nothing really bad happens: the output just
> becomes very slow due to everything waiting for the timeout until the
> blockage clears, and may be interleaved.  Recursion gives similar
> behaviour, except it takes unrecursing to clear the blockage.  I
> consider the very slow output to be a feature.  Unbounded blockages
> and recursion are bugs and the slow output makes them more obvious.

Ahh.  It wasn't 100% clear from looking at the patch which calls had the
lock applied.  In this case at least, it's skipping the lock when the
priority is -1.  Is this tprintf()?

%  /*
%   * Warn that a system table is full.
% @@ -198,5 +221,9 @@
%       pca.flags = flags;
%       va_start(ap, fmt);
% +     if (pri != -1)
% +             printf_lock();  /* To serialize msgbuf.  XXX: long-held
lock? */
%       kvprintf(fmt, putchar, &pca, 10, ap);
% +     if (pri != -1)
% +             printf_unlock();
%       va_end(ap);
%       if (sess != NULL)
%

> >I think the goals should be:
> >
> >- console output, syslog output, and dmesg output are not scrambled.
> >  (i.e. individual kernel printfs make it out atomically, or at least
> >  with a certain granularity, like PRINTF_BUFR_SIZE.)
> 
> Console output includes debugger output.  This must be able to break into
> any otherwise-atomic kernel printf.  Possibly similarly for NMIs (if they
> print.  They really shouldn't print, but they at least used to get this
> horribly wrong from the beginning on i386).  Similarly for certain (non
> maskable) traps.  These really shouldn't happen while we're printing, but
> if they do then they may need to panic and then print for things like
> machine check exceptions.
> 
> Debugger output is the easiest way of showing why printf cannot use
> any normal locking.  There is an option (that I don't like or use)
> which supports debugger output going through normal printf.  I think
> the message buffer is used for this.  So message buffer accesses cannout
> use normal locking either.
> 
> Apart from this, I don't know any reason why a printf or even a msgbuf
> write can't wait for a while (a very long while if the output is to a
> slow device -- don't forget to test with a 50 bps serial console) until
> any previous printfs complete.  My simple serialization gives this if
> the other printfs are on other CPUs, and possibly if they are on other
> ithreads (the simple serialization doesn't give up control for simplicity,
> since it is hard to tell when it is safe to give up control, but
> preemption while we are spinning might get back to the other printfs).
> And when the printf is recursive, there is no way back to the other
> printfs, so we must proceed.  We could proceed immediately, but my
> simple serialization waits in this case too for simplicity.  This case
> should be even rarer than waiting too long for another CPU.

Hopefully the delay won't be too bad for most consumers.  It'll take some
testing to see what happens.

> >- Can be called by any kernel routine (i.e. doesn't sleep)
> 
> Some complications for tprintf and uprintf (and maybe ordinary printf if
> it is connected to a pty).  I think we already have the necessary
> complications.  Something involving printing only to the message buffer
> in the non-sleeping part.
> 
> >- Offers a string at a time interface.
> 
> I don't provide this.  So printf's a character or word or 2 at a time
> may be interleaved.  It is perhaps not to much to ask that important
> printfs that are likely to be interleaved preprint their strings into
> lines.

The lack of a string interface isn't that big of a deal, since your
solution already serializes printf calls.

> >- Does the right thing for log messages (priority, etc.)
> >
> >It looks like we should add "does not use normal locking" to the list as
> >well.
> >
> >As Justin mentioned, we started off down the path of using atomic
> >operations, and then figured out that wouldn't fully work.  Then we
> >discussed doing a per-CPU message buffer, with each message tagged with a
> >sequence number, and having the reader re-serialize the messages in the
> >right order.  The complexity started to get large enough that we decided
> >that using a spin lock would be a much easier approach.
> 
> But almost eveyone who has worked on this before knows that a spinlock
> cannot be used.

We haven't worked on this before, of course. :)

> >cnputs() already uses a spinlock, so we're no worse off than before from a
> >locking standpoint.
> 
> An obvious bug for PRINTF_BUFR_SIZE.  But as I said, this one might
> work well enough if it were recursive.  This depends on the lower-level
> console drivers supporting recursion.  Most don't.  Some have simple
> serialization, but have to bypass it open to support debugger output,
> much like higher levels except they have more control over the timing.

Ahh.

> But in the message buffer accesses, I think you used the lock to provide
> exlusive accesses.

Yes.  The lock is used to prevent the buffer getting scrambled.  Recursion
probably wouldn't work very well.

> >We could perhaps make the message buffer mutex recursive and set the
> >MTX_NOWITNESS flag as well, that might make things a little better from a
> >side effect standpoint.
> 
> I don't see how a recursive mutex can protect the message buffer.  Don't
> forget to trace through the code protected by the mutex using ddb, while
> at the same time using this code for ddb itself.  It's just too hard to
> make this work.  Normally, this type of problem is avoided by avoiding
> not using code like this for the low levels like ddb itself.  Ddb mostly
> does this, but has complications:
> 
> % static void
> % db_putc(int c)
> % {
> % 
> % 	/*
> % 	 * If not in the debugger or the user requests it, output data to
> % 	 * both the console and the message buffer.
> % 	 */
> % 	if (!kdb_active || ddb_use_printf) {
> % 		printf("%c", c);
> 
> Here it uses ordinary printf, so it uses whatever printf uses.  In
> particular, PRINTF_BUFR_SIZE makes printf use cnputs() and the buggy
> locking in it, and printf always uses the message buffer and I think
> the new code gives buggy locking there unconditionally.
> 
> This problem is not very large, since it takes some foot shooting to get
> here:
> - there is some buggy code that uses db_printf() when !kdb_active.
>   Perhaps this is not used recursively.
> - we can configure ddb to use printf.  I think it is a feature that the
>   message buffer gets used in this mode.
> 
> % 		if (!kdb_active)
> % 			return;
> % 		if (c == '\r' || c == '\n')
> % 			db_check_interrupt();
> % 		if (c == '\n' && db_maxlines > 0) {
> % 			db_newlines++;
> % 			if (db_newlines >= db_maxlines)
> % 				db_pager();
> % 		}
> % 		return;
> % 	}
> % 
> % 	/* Otherwise, output data directly to the console. */
> % 	if (c > ' ' && c <= '~') {
> % 	    /*
> % 	     * Printing character.
> % 	     * If we have spaces to print, print them first.
> % 	     * Use tabs if possible.
> % 	     */
> % 	    db_force_whitespace();
> % 	    cnputc(c);
> % 	    db_capture_writech(c);
> % 	    db_output_position++;
> % 	    db_last_non_space = db_output_position;
> % 	}
> 
> I think this routine is used mainly by db_printf() calling kvprintf() and
> kvprintf() calling here.  kvprintf() is supposed to be reentrant and in
> particular not use any locking to support this.  I think it still does.
> 
> Bruce

I'll try to get some time to run some tests with your locking code and see
how it works.

We just need something that works, whether it's a spinlock or hand-rolled
lock doesn't matter so much.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110603211107.GA91421>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation