FreeBSD Mail Archives

Date:      Fri, 17 Feb 2012 23:13:17 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Marcel Moolenaar <marcel@xcllnt.net>
Cc:        marcel@FreeBSD.org, Eitan Adler <eadler@FreeBSD.org>, svn-src-all@FreeBSD.org, "Kenneth D. Merry" <ken@FreeBSD.org>, Andriy Gapon <avg@FreeBSD.org>, src-committers@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>, svn-src-head@FreeBSD.org, Julian Elischer <julian@FreeBSD.org>
Subject:   Re: svn commit: r231814 - in head/sys: kern sys
Message-ID:  <20120217191017.U1871@besplex.bde.org>
In-Reply-To: <9CB7ECE8-FF10-43BE-9EBD-16953BE3B193@xcllnt.net>
References:  <201202160511.q1G5BZNk099785@svn.freebsd.org> <20120216181210.K1423@besplex.bde.org> <4F3CC40D.4000307@freebsd.org> <4F3CC5C4.7020501@FreeBSD.org> <4F3CC8A5.3030107@FreeBSD.org> <20120216174758.GA64180@nargothrond.kdm.org> <20120217053341.R1256@besplex.bde.org> <20120217000846.GA7641@nargothrond.kdm.org> <4F3D9D03.6020507@FreeBSD.org> <9CB7ECE8-FF10-43BE-9EBD-16953BE3B193@xcllnt.net>

On Thu, 16 Feb 2012, Marcel Moolenaar wrote:

> I think we should lift above the immediate problem and allow for
> single- and multi-line messages that are atomically appended to
> the message buffer. Console output and propagation of messages
> outside of the kernel should all come out of the message buffer
> and preserving the atomicity of the messages.

This is just wrong for (low-level) console messages.  Console messages
are for emergencies.  They must not be buffered or otherwise delayed,
and should be rare, like the default for stderr in userland but much
more so.  In userland, it is sometimes but not always possible to keep
messages on stdout ordered and atomic relative to other messages, by
being very careful about fflush'ing stdout, as most applications aren't.

There is an analogy between stdio and write(2) in userland and log(9)
and printf(9) in the kernel.  The application might be using stdio
normally but want to use write() for emergency messages in signal
handlers.  It is unreasonable for it to add mounds of locking around
stdio (or ask the implementation to add locking) to allow these
emergency writes to be correctly interleaved with stdio ones.

Applications that care about atomicity should build up buffers using
interfaces like snprintf() and write the buffers using write(), since
printf() provides no guarantees of atomicity (and in practice provides
none unless you are careful with block sizes).  Similarly in the kernel,
except it is easy to make individual printf()s usually atomic.  printf()
in the kernel corresponds to unbuffered printf() to stderr in userland.
Even the latter is not guaranteed to be atomic, and is practice is not
unless you are careful to write only small amounts, since its
implementation is to use a buffer with the bogus size BUFSIZ and write
that out up to BUFSIZ bytes at a time; this gives non-atomicity at
every write boundary.

> The message buffer does not have to be a chunk of memory that
> we circularly scribble to. It can be a per-cpu linked list of
> messages even.

Now it cannot really be used for emergency writes.  Well, it can be
used for the non-physical part.  You can write to your per-cpu memory
either before or after doing the physical write (better before), with
no contention provided you disable interrupts on your CPU (which is
bad for interrupt latency, and must be done separately from the slow
physical i/o part, especially with only 1 CPU), and then combine the
buffers at leisure, but with complications at that point.  All
messages would probably need fine-grained timestamps coherent across
CPUs, or ids coherent across CPUs.

The linked list to scattered memory would also give nice complications
and fragility across reboots.

> The advantage of thinking along these lines is that:
> 1.  Console output can be made optional very easily, allowing
>    us to implement quiet boots without loosing the ability
>    to look at messages collected during boot.

This is already optional very easily in 3 ways:
- RB_MUTE boot flag
- sysctl kern.consmute to mute or to undo RB_MUTE
- sysctl kern.console to give a null list of consoles.  The older
   muting method is not really needed with this.  I forget if muting
   kills input like this does.  In better implementations of multiple
   consoles, the input and output devices can be attached and detached
   independently.

> 2.  Atomicity allows us to parse the messages reliably, which
>    works very well in the embedded space where monitoring of
>    kernel messages is common.

My old, simple changes gave atomicity and serialization for tprintf(),
log(), printf() and vprintf() whenever possible (not in emergencies
that would deadlock, and not in contention with ddb) by locking
everything with a common nonstandard lock.  ddb output doesn't go
into the message buffer (except for bugs, and maybe the option to use
printf() for ddb output).  Serialization of as much as possible
is useful too.

> 3.  You can decouple writing into the message buffer from
>    extracting messages out of the message buffer, allowing
>    the low-level console to become just another channel to
>    send messages to, rather than be fundamental for printf.

The low level console is fundamental for ddb and emergency messages.
The console parts of these are much lower level and harder to get right
than the msgbuf parts, since they touch much more state than can be
accessed by a single atomic op, so they need locks, but deadlock is
close.  printf() output is also recorded in the message buffer so you
can read it later.  This should not be allowed to complicate or break
the lower level.  But it is relatively easy once the lower level works.
You just make the console i/o as serialized and atomic as possible,
and copy it to the message buffer in the same order, but you have to
copy to the message buffer carefully using atomic ops, since in the
rare cases where the console i/o is not serialized/atomic due to
deadlock avoidance etc., copying to the message buffer is not
automatically atomic.

I used the same lock for log() as for *printf() even when log()
doesn't go to the console.  This case has not been tested under
load, but it's hard for log() alone to generate much contention
since it doesn't have to wait for slow low-level (serial) console
hardware like printf() does.  log() just has to wait for printf()
(or log() to the console), but printf() should only be used for
emergencies and thus rare.  log() vs [tu]printf() is more
interesting.  Perhaps users can generate too many [tu]printf()s,
but in practice these are rarer than printf().

> 4.  A linked list (for example) eliminates the problem of
>    scribbling over old messages and possibly leaving partial
>    output that gets misinterpreted.

Until memory runs out.  realloc()ing the message buffer has the
same problem but is less fragile.

> 5.  A per-cpu message buffer eliminates serialization to
>    guarantee atomcity and with timestamping can very easily
>    be turned into a sequential log.

Now it needs to disable interrupts to give exclusive access.
Expanding on the above, disabling interrupts is especially bad for
single CPUs where the message buffer grew up, which partly explains
why it is not done.  At 300 bps, an 80-character message takes 2.7
seconds to print.  Consoles slower than that can easily be arranged.
Or use multiple consoles with 100's of them to take 270 seconds (due
to lack of parallism for console output).  Disabling interrupts for
that long would be nasty.  So the message would have to be converted
to a string and buffered somewhere while holding interrupts disabled,
before printing it with interrupts reenabled.  The message buffer
almost works here.  There are still complications for emergency
messages (e.g., from the interrupt handlers that we have carefully
allowed):
- the natural handling is to append the messages to the message buffer
   as they come in), but the emergency output must be in the middle of
   old messages.  Then if we can see the message buffer later, its order
   won't be the same as the console output.  The console order seems
   better.  It could be recorded, so that the logging code can see both
   orders, as markup in the emergency messages.
- nesting of emergency messages may be large, so we need to at least
   consider the complications for when it causes the message buffer to
   wrap.  Of course, at emergency time we can't expand the message
   buffer to prevent it wrapping.

> 6.  We haven't introduced complications (e.g. locking) to
>    solve these problems and that make using printf in low-
>    level code impossible. Thank trap handlers or interrupt
>    handlers.

I think you have just moved them slightly.  A per-cpu msgbuf doesn't
completely reduce to the !SMP case, because output from multiple
CPUs still has to go through shared i/o hardware.

My idea is that after doing the necessary serialization for the hardware,
nothing more is needed for msgbuf.  How did you intend to do the
serialization for the hardware anyway?  With multiple CPUs, there must
be a lock to limit contention, and it should be at least per-printf,
not per-char.  Pre-formatting the message to a buffer before calling
the console driver does nothing for the hardware part.  The locking
for the hardware part is currently mainly in cnputs(), where I
don't like it at all.  Apart from giving deadlock in various ddb and
emergency cases, it fails to protect upper levels, so you need locks
to give atomicity for them too, starting with msgbuf.  But the slow
part is the i/o.  So with concurrent printf()s, everything blocks for
about the same time as it would if the locks were at the top level,
for only negative benefits like the complexity and bugs, and having
the msgbuf order possibly different from the i/o order.  The separate
locks provide some benefits for concurrent log()s and printfs() when
log() is not to the console.  Now log() doesn't have to wait for console
i/o.  I think these benefits are insignficant since the cases in which
log() has to wait for printf() should be rare.  This wait can probably
be handled with an advisory lock too, not very differently than how
it is done now in msgbuf code, but I don't want the complexity unless
there is an actual problem (in a reasonable configuration, not including
100 300 bps consoles :-).

The msgbuf problems fixed by ken's change seem to be mainly scrambled
markup.  Is there anything else important?  My advisory locking handles
this by giving atomicity unless the system seems to have been in a
deadlock state for about 1 second.  Scrambling may occur after this
1 second.  Even emergency messages have to wait for up to 1 second
in this state, or for the few microseconds or milliseconds that it
takes for a few other printf()s to complete (the timeout is too short
to give atomicity for even 1 300 bps console), or for the few nanoseconds
or microseconds for a few log()s to the msgbuf.

Bruce

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120217191017.U1871>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation