Date: Fri, 3 Jun 2011 20:47:00 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: "Kenneth D. Merry" <ken@freebsd.org> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Andriy Gapon <avg@freebsd.org>, Bruce Evans <brde@optusnet.com.au> Subject: Re: svn commit: r222537 - in head/sys: kern sys Message-ID: <20110603204653.B1551@besplex.bde.org> In-Reply-To: <20110602213611.GA47880@nargothrond.kdm.org> References: <201105311729.p4VHTwrZ033296@svn.freebsd.org> <4DE5D72A.1020405@FreeBSD.org> <20110601193030.A1275@besplex.bde.org> <20110602213611.GA47880@nargothrond.kdm.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 2 Jun 2011, Kenneth D. Merry wrote: > On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote: >> On Wed, 1 Jun 2011, Andriy Gapon wrote: >> >>> on 31/05/2011 20:29 Kenneth D. Merry said the following: >>>> + mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN); >>> >>> Sorry that I didn't gather myself together for a review before this change >>> got >>> actually committed. >>> Do you see any reason not to make this spinlock recursive? It does need to be recursive to avoid deadlock (as I described), but this prevents it actually working as a lock. I think the new code needs it to work as a lock. >> I see reasons why it must not exist. The message buffer code cannot use >> any normal locking, and was carefully written to avoid doing so. >> >>> I am a little bit worried about "exotic" situations like receiving an NMI >>> in the >>> middle of printing and wanting to print in the NMI context, or similar >>> things >>> that penetrate contexts with disabled interrupts - e.g. Machine Check >>> Exception. >>> Also it's not clear to me if there won't any bigger damage in the >>> situations >>> like those described above. >>> >>> P.S. I have been thinking about fixing the problem in a different fashion, >>> via >>> reserving portions of dmesg buffer for a whole message using CAS: >>> http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html >> >> Something like this might work. >> >> PRINTF_BUFR size should not exist either, especially now that it is >> much more complicated and broken. Here is some of my old mail about >> adding (necessarily not normal) locking to to printf. No one replied, >> so I never finished this :-(. > > That particular solution doesn't lock the normal kernel printf path, and so > won't fix what we're trying to fix. (We've got lots of disks in each > system, and when there are various SES events and disks arriving and > departing, there are multiple kernel contexts doing printfs simultaneously, > and that results in scrambled dmesg output.) But it does. PRINTF_BUFR_SIZE (at least used to) only be applied to vprintf(), but my printf_lock() is applied to all calls to kvprintf() that set TOLOG. (If I missed some, this takes 2 lines each to fix.) The calls in tprintf() and log() have comments saying that this is intentional "to serialize msgbuf", though I fear this might not be safe for at least tprintf(), since it may block unboundedly. But even when it blocks unboundedly, nothing really bad happens: the output just becomes very slow due to everything waiting for the timeout until the blockage clears, and may be interleaved. Recursion gives similar behaviour, except it takes unrecursing to clear the blockage. I consider the very slow output to be a feature. Unbounded blockages and recursion are bugs and the slow output makes them more obvious. > I think the goals should be: > > - console output, syslog output, and dmesg output are not scrambled. > (i.e. individual kernel printfs make it out atomically, or at least > with a certain granularity, like PRINTF_BUFR_SIZE.) Console output includes debugger output. This must be able to break into any otherwise-atomic kernel printf. Possibly similarly for NMIs (if they print. They really shouldn't print, but they at least used to get this horribly wrong from the beginning on i386). Similarly for certain (non maskable) traps. These really shouldn't happen while we're printing, but if they do then they may need to panic and then print for things like machine check exceptions. Debugger output is the easiest way of showing why printf cannot use any normal locking. There is an option (that I don't like or use) which supports debugger output going through normal printf. I think the message buffer is used for this. So message buffer accesses cannout use normal locking either. Apart from this, I don't know any reason why a printf or even a msgbuf write can't wait for a while (a very long while if the output is to a slow device -- don't forget to test with a 50 bps serial console) until any previous printfs complete. My simple serialization gives this if the other printfs are on other CPUs, and possibly if they are on other ithreads (the simple serialization doesn't give up control for simplicity, since it is hard to tell when it is safe to give up control, but preemption while we are spinning might get back to the other printfs). And when the printf is recursive, there is no way back to the other printfs, so we must proceed. We could proceed immediately, but my simple serialization waits in this case too for simplicity. This case should be even rarer than waiting too long for another CPU. > - Can be called by any kernel routine (i.e. doesn't sleep) Some complications for tprintf and uprintf (and maybe ordinary printf if it is connected to a pty). I think we already have the necessary complications. Something involving printing only to the message buffer in the non-sleeping part. > - Offers a string at a time interface. I don't provide this. So printf's a character or word or 2 at a time may be interleaved. It is perhaps not to much to ask that important printfs that are likely to be interleaved preprint their strings into lines. > - Does the right thing for log messages (priority, etc.) > > It looks like we should add "does not use normal locking" to the list as > well. > > As Justin mentioned, we started off down the path of using atomic > operations, and then figured out that wouldn't fully work. Then we > discussed doing a per-CPU message buffer, with each message tagged with a > sequence number, and having the reader re-serialize the messages in the > right order. The complexity started to get large enough that we decided > that using a spin lock would be a much easier approach. But almost eveyone who has worked on this before knows that a spinlock cannot be used. > cnputs() already uses a spinlock, so we're no worse off than before from a > locking standpoint. An obvious bug for PRINTF_BUFR_SIZE. But as I said, this one might work well enough if it were recursive. This depends on the lower-level console drivers supporting recursion. Most don't. Some have simple serialization, but have to bypass it open to support debugger output, much like higher levels except they have more control over the timing. But in the message buffer accesses, I think you used the lock to provide exlusive accesses. > We could perhaps make the message buffer mutex recursive and set the > MTX_NOWITNESS flag as well, that might make things a little better from a > side effect standpoint. I don't see how a recursive mutex can protect the message buffer. Don't forget to trace through the code protected by the mutex using ddb, while at the same time using this code for ddb itself. It's just too hard to make this work. Normally, this type of problem is avoided by avoiding not using code like this for the low levels like ddb itself. Ddb mostly does this, but has complications: % static void % db_putc(int c) % { % % /* % * If not in the debugger or the user requests it, output data to % * both the console and the message buffer. % */ % if (!kdb_active || ddb_use_printf) { % printf("%c", c); Here it uses ordinary printf, so it uses whatever printf uses. In particular, PRINTF_BUFR_SIZE makes printf use cnputs() and the buggy locking in it, and printf always uses the message buffer and I think the new code gives buggy locking there unconditionally. This problem is not very large, since it takes some foot shooting to get here: - there is some buggy code that uses db_printf() when !kdb_active. Perhaps this is not used recursively. - we can configure ddb to use printf. I think it is a feature that the message buffer gets used in this mode. % if (!kdb_active) % return; % if (c == '\r' || c == '\n') % db_check_interrupt(); % if (c == '\n' && db_maxlines > 0) { % db_newlines++; % if (db_newlines >= db_maxlines) % db_pager(); % } % return; % } % % /* Otherwise, output data directly to the console. */ % if (c > ' ' && c <= '~') { % /* % * Printing character. % * If we have spaces to print, print them first. % * Use tabs if possible. % */ % db_force_whitespace(); % cnputc(c); % db_capture_writech(c); % db_output_position++; % db_last_non_space = db_output_position; % } I think this routine is used mainly by db_printf() calling kvprintf() and kvprintf() calling here. kvprintf() is supposed to be reentrant and in particular not use any locking to support this. I think it still does. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110603204653.B1551>