From owner-svn-src-all@FreeBSD.ORG Wed Jun 1 10:07:38 2011 Return-Path: Delivered-To: svn-src-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4867A106566B; Wed, 1 Jun 2011 10:07:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au [211.29.132.187]) by mx1.freebsd.org (Postfix) with ESMTP id 9A8228FC14; Wed, 1 Jun 2011 10:07:37 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p51A7VN3009677 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 1 Jun 2011 20:07:33 +1000 Date: Wed, 1 Jun 2011 20:07:31 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Andriy Gapon In-Reply-To: <4DE5D72A.1020405@FreeBSD.org> Message-ID: <20110601193030.A1275@besplex.bde.org> References: <201105311729.p4VHTwrZ033296@svn.freebsd.org> <4DE5D72A.1020405@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, "Kenneth D. Merry" Subject: Re: svn commit: r222537 - in head/sys: kern sys X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Jun 2011 10:07:38 -0000 On Wed, 1 Jun 2011, Andriy Gapon wrote: > on 31/05/2011 20:29 Kenneth D. Merry said the following: >> + mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN); > > Sorry that I didn't gather myself together for a review before this change got > actually committed. > Do you see any reason not to make this spinlock recursive? I see reasons why it must not exist. The message buffer code cannot use any normal locking, and was carefully written to avoid doing so. > I am a little bit worried about "exotic" situations like receiving an NMI in the > middle of printing and wanting to print in the NMI context, or similar things > that penetrate contexts with disabled interrupts - e.g. Machine Check Exception. > Also it's not clear to me if there won't any bigger damage in the situations > like those described above. > > P.S. I have been thinking about fixing the problem in a different fashion, via > reserving portions of dmesg buffer for a whole message using CAS: > http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html Something like this might work. PRINTF_BUFR size should not exist either, especially now that it is much more complicated and broken. Here is some of my old mail about adding (necessarily not normal) locking to to printf. No one replied, so I never finished this :-(. %%% >From bde@optusnet.com.au Mon Jun 15 16:18:03 2009 +1000 Date: Mon, 15 Jun 2009 16:17:59 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Sam Leffler , Ed Schouten , src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org Subject: Re: svn commit: r194204 - in head/sys: amd64/conf i386/conf In-Reply-To: <20090615114142.B775@besplex.bde.org> Message-ID: <20090615153040.R1080@besplex.bde.org> References: <200906141801.n5EI1Zti056239@svn.freebsd.org> <4A356A0F.3050800@freebsd.org> <20090615075134.K24645@delplex.bde.org> <4A359AA6.7010101@freebsd.org> <20090615114142.B775@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Status: RO X-Status: A X-Keywords: X-UID: 913 On Mon, 15 Jun 2009, Bruce Evans wrote: > On Sun, 14 Jun 2009, Sam Leffler wrote: >> I'd love to see someone properly lock printf/console output instead of >> kludges like PRINTF_BUFR_SIZE. > > Almost done (for FreeBSD-~5.2): non-production version with lots of debugging > cruft: This version for RELENG_7 seems to work as intended. jhb should look at its atomic ops. I only have 1 normal source of interleaved messages -- the messages about waiting for system processes on shutdown on SMP systems -- and this seems to be fixed. The patch applies even without removing PRINTF_BUFR_SIZE or cnputs(), and works, at least if PRINTF_BUFR_SIZE and the per-cpu buffer are not enabled. Note that it serializes accesses to the message buffer too. The locks automatically cover the message buffer if they are applied, and are applied to all cases that set TOLOG, though this might be too dangerous for tprintf() since printing to a user's console shouldn't block other parts of the kernel. printf() to a redirected console is more careful about this -- it extracts messages from the message buffer and can handle long delays provided the message buffer doesn't wrap around. This might be the best method for all cases. Accesses to the message buffer need to be serialized for this to work properly, and serializing ony accesses to the message buffer is easier since there are no hardware delays to worry about -- strict locking with no timeouts will work provided the halting problem is solvable for kvprintf() to the message buffer alone. No attempt is made to buffer messages across printf()s. Neither does PRINTF_BUFR_SIZE. PRINTF_BUFR_SIZE uses an auto buffer so it obviously has to flush at the end of each printf. Always extracting from the message buffer might be the best way to handle this too. Normally you would extract complete lines, but there must be a timeout to handle partial lines. The 1 second timeout here isn't best if the timeout is needed for avoiding deadlock, as may happen when a printf() is interrupted by panic(). panic could reasonably() blow open all locks including printf_lock(), but this is not easy to do without letting other CPUs proceed and/or complicating the usual case. Very slow terminals, say 50 bps, need a timeout of say 16 seconds so that the can print an 80-character line without being interrupted. % Index: subr_prf.c % =================================================================== % RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v % retrieving revision 1.130.2.1 % diff -u -2 -r1.130.2.1 subr_prf.c % --- subr_prf.c 21 Jan 2009 00:26:45 -0000 1.130.2.1 % +++ subr_prf.c 15 Jun 2009 05:32:03 -0000 % @@ -112,4 +112,27 @@ % &always_console_output, 0, "Always output to console despite TIOCCONS."); % % +static int printf_lockcount; % + % +static void % +printf_lock(void) % +{ % + int timeout; % + % + timeout = 1000; % + do { % + if (atomic_cmpset_acq_int(&printf_lockcount, 0, 1)) % + return; % + DELAY(1000); % + } while (--timeout != 0); % + atomic_add_acq_int(&printf_lockcount, 1); % +} % + % +static void % +printf_unlock(void) % +{ % + % + atomic_add_rel_int(&printf_lockcount, -1); % +} % + % /* % * Warn that a system table is full. % @@ -198,5 +221,9 @@ % pca.flags = flags; % va_start(ap, fmt); % + if (pri != -1) % + printf_lock(); /* To serialize msgbuf. XXX: long-held lock? */ % kvprintf(fmt, putchar, &pca, 10, ap); % + if (pri != -1) % + printf_unlock(); % va_end(ap); % if (sess != NULL) % @@ -243,5 +270,7 @@ % % va_start(ap, fmt); % + printf_lock(); /* Even for TOLOG, for serializing msgbuf. */ % kvprintf(fmt, putchar, &pca, 10, ap); % + printf_unlock(); % va_end(ap); % % @@ -312,5 +341,7 @@ % #endif % % + printf_lock(); % retval = kvprintf(fmt, putchar, &pca, 10, ap); % + printf_unlock(); % va_end(ap); % % @@ -350,5 +381,7 @@ % #endif % % + printf_lock(); % retval = kvprintf(fmt, putchar, &pca, 10, ap); % + printf_unlock(); % % #ifdef PRINTF_BUFR_SIZE Bruce >From bde@optusnet.com.au Tue Jun 16 08:16:06 2009 +1000 Date: Tue, 16 Jun 2009 08:16:03 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Bruce Evans cc: Sam Leffler , Ed Schouten , jhb@freebsd.org, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org Subject: Re: svn commit: r194204 - in head/sys: amd64/conf i386/conf In-Reply-To: <20090615153040.R1080@besplex.bde.org> Message-ID: <20090616070732.Q25544@delplex.bde.org> References: <200906141801.n5EI1Zti056239@svn.freebsd.org> <4A356A0F.3050800@freebsd.org> <20090615075134.K24645@delplex.bde.org> <4A359AA6.7010101@freebsd.org> <20090615114142.B775@besplex.bde.org> <20090615153040.R1080@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Status: O X-Status: X-Keywords: X-UID: 914 On Mon, 15 Jun 2009, Bruce Evans wrote: > ... > This version for RELENG_7 seems to work as intended. jhb should look at > its atomic ops. No reply yet. I thought of some problems with this version. Mainly with the delaying code: - Some versions of DELAY() need to lock the hardware, so calling DELAY() can deadlock. E.g., on amd64 before the TSC is initialized, and on i386 with no TSC and/or before the TSC is initialized, and when kdb is not active on both, DELAY() calls getit(), and getit() locks the clock hardware unconditionally using a non-recursive spin mutex. Contrary to what I said in previous mail, detection of erroneous recursion isn't broken in the usual case. The usual case is probably INVARIANTS, and then recursion is detected. The handling of erroneous version then causes endless recursion on printf(): it is a failing KASSERT() which will call panic(), which will call printf(), which will reach the failing KASSERT() again. The non-recursive spinlock in cnputs() has the same bug (deadlock --> recursive deadlock). This problem in DELAY() is well known, so it is worked around when kdb is active by not calling getit() then. Nearby bugs in DELAY(): if DELAY() is first called after the TSC is initialized, then its debugging code is never reached. Its debugging code is a non-NOTEd non-option and could have been removed after the getit() version of DELAY() was verified to give reasonably accurate timing, but it is more useful for the TSC version since the TSC version has not been verified to give reasonably accurate timing. The TSC version must fail any reasonable verifiications except probably for P-state invariant TSCs since the TSC varies and DELAY() makes no attempt to compensate for its variance). If DELAY() is first called before the TSC is initialized, then the debugging code still works for the i8254 but its placement is confusing, and when the counter is changed to the TSC there is no code to debug the change. - timecounters are no better than DELAY() for implementing the delaying, since apart from them possibly not working on cold and/or deadlocked systems, although the upper-level timecounter code is lock-free, the timecounter hardware code might need to use a lock. Again, the i8254 timecounter hardware code uses the clock spinlock. - KTR uses plain printf(), and KTR can produce a lot of output, so the delay should be as short as possible, as for mcount_trylock(), and 1ms is too long. Recursion is a relatively unimportant part of the problem here. Too-long delays are possible in normal operation, when one CPU is in a normal printf() and other CPUs want to do KTR printfs. Serialization of the printf()s is especially important for voluminous concurrent KTR output, but so is printing such output fast. jhb should look at this too. I use KTR less than once a year. > % Index: subr_prf.c > % =================================================================== > % RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v > % retrieving revision 1.130.2.1 > % diff -u -2 -r1.130.2.1 subr_prf.c > % --- subr_prf.c 21 Jan 2009 00:26:45 -0000 1.130.2.1 > % +++ subr_prf.c 15 Jun 2009 05:32:03 -0000 > % @@ -112,4 +112,27 @@ > % &always_console_output, 0, "Always output to console despite > TIOCCONS."); > % % +static int printf_lockcount; > % + > % +static void > % +printf_lock(void) > % +{ > % + int timeout; > % + > % + timeout = 1000; > % + do { > % + if (atomic_cmpset_acq_int(&printf_lockcount, 0, 1)) > % + return; > % + DELAY(1000); > % + } while (--timeout != 0); > % + atomic_add_acq_int(&printf_lockcount, 1); > % +} If the DELAY() is removed, the initial value of `timeout' would need to be (possibly dynamically) calibrated. The timeouts for the panics for spinlocks and threadlocks in kern_mutex.c have similar problems and could benefit from calibration. First they do up to 10 million cpu_spinwait()s. 10 million might be too small or too large. Then they do up to 60 million DELAY(1)s. DELAY() can deadlock as above. 60 million is supposed to give a delay of 60 seconds, but short delays can be very inaccurate (the minimum DELAY() is about 5 us with i8254 hardware on fast CPUs and about 30 us with i8254 hardware on 1990's CPUs), so the timeouts can be more like 5 minutes than 1 minute. A non-dynamically calibrated loop using uncached memory or device accesses has a better chance of working accurately than the non-dynamically calibrated TSC loop in the amd64 and i8254 DELAY()s, since throttling of the TSC is more common than throttling of memory systems. Bruce %%% -current still has the broken cnputs(): % void % cnputs(char *p) This function was only used by PRINTF_BUR_SIZE. I is under the ifdef for one call, but the larger code in putcons() that calls it is not ifdefed although it was only used if ap->p_bufr != NULL and only PRINTF_BUFR_SIZE sets p_bufr. % { % int c; % int unlock_reqd = 0; % % if (use_cnputs_mtx) { % mtx_lock_spin(&cnputs_mtx); Ordinary locking cannot be used here. cnputs_mtx is not even recursive, so any recursive call here gives deadlock. Recursive calls can easily be arranged using ddb, perhaps just by tracing through this code, or possibly by NMIs. A recursive mutex would cause other problems (not so bad) when recursion occurse. The new msg_lock seems to have the same problems and more. It isn't recursive. cnputs_mtx is at least NOWITNESS. If msg_lock were recursive, then problems from this would probably be quite bad, unlike for cnputs_mtx -- with cnputs_mtx, you should just get interleaved output, while with msg_lock you would probably get a corrupt message buffer. % unlock_reqd = 1; % } % % while ((c = *p++) != '\0') % cnputc(c); % % if (unlock_reqd) % mtx_unlock_spin(&cnputs_mtx); % } Bruce