Date: Fri, 17 Feb 2012 23:13:17 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Marcel Moolenaar <marcel@xcllnt.net> Cc: marcel@FreeBSD.org, Eitan Adler <eadler@FreeBSD.org>, svn-src-all@FreeBSD.org, "Kenneth D. Merry" <ken@FreeBSD.org>, Andriy Gapon <avg@FreeBSD.org>, src-committers@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>, svn-src-head@FreeBSD.org, Julian Elischer <julian@FreeBSD.org> Subject: Re: svn commit: r231814 - in head/sys: kern sys Message-ID: <20120217191017.U1871@besplex.bde.org> In-Reply-To: <9CB7ECE8-FF10-43BE-9EBD-16953BE3B193@xcllnt.net> References: <201202160511.q1G5BZNk099785@svn.freebsd.org> <20120216181210.K1423@besplex.bde.org> <4F3CC40D.4000307@freebsd.org> <4F3CC5C4.7020501@FreeBSD.org> <4F3CC8A5.3030107@FreeBSD.org> <20120216174758.GA64180@nargothrond.kdm.org> <20120217053341.R1256@besplex.bde.org> <20120217000846.GA7641@nargothrond.kdm.org> <4F3D9D03.6020507@FreeBSD.org> <9CB7ECE8-FF10-43BE-9EBD-16953BE3B193@xcllnt.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 16 Feb 2012, Marcel Moolenaar wrote: > I think we should lift above the immediate problem and allow for > single- and multi-line messages that are atomically appended to > the message buffer. Console output and propagation of messages > outside of the kernel should all come out of the message buffer > and preserving the atomicity of the messages. This is just wrong for (low-level) console messages. Console messages are for emergencies. They must not be buffered or otherwise delayed, and should be rare, like the default for stderr in userland but much more so. In userland, it is sometimes but not always possible to keep messages on stdout ordered and atomic relative to other messages, by being very careful about fflush'ing stdout, as most applications aren't. There is an analogy between stdio and write(2) in userland and log(9) and printf(9) in the kernel. The application might be using stdio normally but want to use write() for emergency messages in signal handlers. It is unreasonable for it to add mounds of locking around stdio (or ask the implementation to add locking) to allow these emergency writes to be correctly interleaved with stdio ones. Applications that care about atomicity should build up buffers using interfaces like snprintf() and write the buffers using write(), since printf() provides no guarantees of atomicity (and in practice provides none unless you are careful with block sizes). Similarly in the kernel, except it is easy to make individual printf()s usually atomic. printf() in the kernel corresponds to unbuffered printf() to stderr in userland. Even the latter is not guaranteed to be atomic, and is practice is not unless you are careful to write only small amounts, since its implementation is to use a buffer with the bogus size BUFSIZ and write that out up to BUFSIZ bytes at a time; this gives non-atomicity at every write boundary. > The message buffer does not have to be a chunk of memory that > we circularly scribble to. It can be a per-cpu linked list of > messages even. Now it cannot really be used for emergency writes. Well, it can be used for the non-physical part. You can write to your per-cpu memory either before or after doing the physical write (better before), with no contention provided you disable interrupts on your CPU (which is bad for interrupt latency, and must be done separately from the slow physical i/o part, especially with only 1 CPU), and then combine the buffers at leisure, but with complications at that point. All messages would probably need fine-grained timestamps coherent across CPUs, or ids coherent across CPUs. The linked list to scattered memory would also give nice complications and fragility across reboots. > The advantage of thinking along these lines is that: > 1. Console output can be made optional very easily, allowing > us to implement quiet boots without loosing the ability > to look at messages collected during boot. This is already optional very easily in 3 ways: - RB_MUTE boot flag - sysctl kern.consmute to mute or to undo RB_MUTE - sysctl kern.console to give a null list of consoles. The older muting method is not really needed with this. I forget if muting kills input like this does. In better implementations of multiple consoles, the input and output devices can be attached and detached independently. > 2. Atomicity allows us to parse the messages reliably, which > works very well in the embedded space where monitoring of > kernel messages is common. My old, simple changes gave atomicity and serialization for tprintf(), log(), printf() and vprintf() whenever possible (not in emergencies that would deadlock, and not in contention with ddb) by locking everything with a common nonstandard lock. ddb output doesn't go into the message buffer (except for bugs, and maybe the option to use printf() for ddb output). Serialization of as much as possible is useful too. > 3. You can decouple writing into the message buffer from > extracting messages out of the message buffer, allowing > the low-level console to become just another channel to > send messages to, rather than be fundamental for printf. The low level console is fundamental for ddb and emergency messages. The console parts of these are much lower level and harder to get right than the msgbuf parts, since they touch much more state than can be accessed by a single atomic op, so they need locks, but deadlock is close. printf() output is also recorded in the message buffer so you can read it later. This should not be allowed to complicate or break the lower level. But it is relatively easy once the lower level works. You just make the console i/o as serialized and atomic as possible, and copy it to the message buffer in the same order, but you have to copy to the message buffer carefully using atomic ops, since in the rare cases where the console i/o is not serialized/atomic due to deadlock avoidance etc., copying to the message buffer is not automatically atomic. I used the same lock for log() as for *printf() even when log() doesn't go to the console. This case has not been tested under load, but it's hard for log() alone to generate much contention since it doesn't have to wait for slow low-level (serial) console hardware like printf() does. log() just has to wait for printf() (or log() to the console), but printf() should only be used for emergencies and thus rare. log() vs [tu]printf() is more interesting. Perhaps users can generate too many [tu]printf()s, but in practice these are rarer than printf(). > 4. A linked list (for example) eliminates the problem of > scribbling over old messages and possibly leaving partial > output that gets misinterpreted. Until memory runs out. realloc()ing the message buffer has the same problem but is less fragile. > 5. A per-cpu message buffer eliminates serialization to > guarantee atomcity and with timestamping can very easily > be turned into a sequential log. Now it needs to disable interrupts to give exclusive access. Expanding on the above, disabling interrupts is especially bad for single CPUs where the message buffer grew up, which partly explains why it is not done. At 300 bps, an 80-character message takes 2.7 seconds to print. Consoles slower than that can easily be arranged. Or use multiple consoles with 100's of them to take 270 seconds (due to lack of parallism for console output). Disabling interrupts for that long would be nasty. So the message would have to be converted to a string and buffered somewhere while holding interrupts disabled, before printing it with interrupts reenabled. The message buffer almost works here. There are still complications for emergency messages (e.g., from the interrupt handlers that we have carefully allowed): - the natural handling is to append the messages to the message buffer as they come in), but the emergency output must be in the middle of old messages. Then if we can see the message buffer later, its order won't be the same as the console output. The console order seems better. It could be recorded, so that the logging code can see both orders, as markup in the emergency messages. - nesting of emergency messages may be large, so we need to at least consider the complications for when it causes the message buffer to wrap. Of course, at emergency time we can't expand the message buffer to prevent it wrapping. > 6. We haven't introduced complications (e.g. locking) to > solve these problems and that make using printf in low- > level code impossible. Thank trap handlers or interrupt > handlers. I think you have just moved them slightly. A per-cpu msgbuf doesn't completely reduce to the !SMP case, because output from multiple CPUs still has to go through shared i/o hardware. My idea is that after doing the necessary serialization for the hardware, nothing more is needed for msgbuf. How did you intend to do the serialization for the hardware anyway? With multiple CPUs, there must be a lock to limit contention, and it should be at least per-printf, not per-char. Pre-formatting the message to a buffer before calling the console driver does nothing for the hardware part. The locking for the hardware part is currently mainly in cnputs(), where I don't like it at all. Apart from giving deadlock in various ddb and emergency cases, it fails to protect upper levels, so you need locks to give atomicity for them too, starting with msgbuf. But the slow part is the i/o. So with concurrent printf()s, everything blocks for about the same time as it would if the locks were at the top level, for only negative benefits like the complexity and bugs, and having the msgbuf order possibly different from the i/o order. The separate locks provide some benefits for concurrent log()s and printfs() when log() is not to the console. Now log() doesn't have to wait for console i/o. I think these benefits are insignficant since the cases in which log() has to wait for printf() should be rare. This wait can probably be handled with an advisory lock too, not very differently than how it is done now in msgbuf code, but I don't want the complexity unless there is an actual problem (in a reasonable configuration, not including 100 300 bps consoles :-). The msgbuf problems fixed by ken's change seem to be mainly scrambled markup. Is there anything else important? My advisory locking handles this by giving atomicity unless the system seems to have been in a deadlock state for about 1 second. Scrambling may occur after this 1 second. Even emergency messages have to wait for up to 1 second in this state, or for the few microseconds or milliseconds that it takes for a few other printf()s to complete (the timeout is too short to give atomicity for even 1 300 bps console), or for the few nanoseconds or microseconds for a few log()s to the msgbuf. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120217191017.U1871>