From owner-freebsd-bugs@FreeBSD.ORG Fri Feb 16 22:32:13 2007 Return-Path: X-Original-To: freebsd-bugs@freebsd.org Delivered-To: freebsd-bugs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C57FE16A402; Fri, 16 Feb 2007 22:32:13 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout1.pacific.net.au (mailout1-3.pacific.net.au [61.8.2.210]) by mx1.freebsd.org (Postfix) with ESMTP id 587BB13C48E; Fri, 16 Feb 2007 22:32:13 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.2.162]) by mailout1.pacific.net.au (Postfix) with ESMTP id DFB285A7CE3; Sat, 17 Feb 2007 09:32:11 +1100 (EST) Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (Postfix) with ESMTP id 24F358C11; Sat, 17 Feb 2007 09:32:10 +1100 (EST) Date: Sat, 17 Feb 2007 09:32:08 +1100 (EST) From: Bruce Evans X-X-Sender: bde@epsplex.bde.org To: Greg Ansley In-Reply-To: <200702161513.l1GFDqi5006017@www.freebsd.org> Message-ID: <20070217093159.K705@epsplex.bde.org> References: <200702161513.l1GFDqi5006017@www.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-bugs@freebsd.org, freebsd-gnats-submit@freebsd.org Subject: Re: kern/109232: [sio][patch] ibufsize calculation wrong causing data loss X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Feb 2007 22:32:13 -0000 On Fri, 16 Feb 2007, Greg Ansley wrote: >> Description: > sio comwakeup time (sio_timeout) is limited to 200 hz (see siosettimeout()) but calculation for sio ibufsize uses hz which is now >> 200 hz causing insufficiently sized interrupt level sio buffers. This in turn causes lost incoming data and the kernel issues the "interrupt-level buffer overflow" message when receiving long strings of high speed (baud > 19200) data. The problem is actually a little different: - the timeout is normally limited to 1 second and is not normally used (except in FreeBSD-1 and maybe FreeBSD-2). It is used mainly for for polling (when polling is enabled on some sio device, the timeout is scalled by hz but limited to 200 Hz) but the polling case is broken in other ways. - ibufsize is limited from below to 128. This should be plenty for low speeds up to 57600 bps, and should barely work at the low speed of 115200 bps too. 128 character times at 115200 bps = 11.1111 msec, so the system only needs to have a worst-case tty software interrupt latency of than 11.1111 msec to avoid interrupt-level buffer overruns. However, the broken interrupt latency in RELENG_[5-7] is apparently worse than that in some cases, and the calculation of ibufsize is supposed to be leaving a safety margin of a factor of 4. - A period of 11.1111 msec is a frequency of just 90 Hz. The system also needs a worst-case _clock_ software interrupt latency of slightly better than that just to keep up with the old default HZ of 100. Both the clock software interrupt handler (SWI) and the tty SWI have some Giant locking, so they can't actually keep up with that in all cases, even on very fast CPUs. The default for HZ is now 10 times larger, so there are many more cases in which the system can't actually keep up. This is not usually a problem, provided the CPU is not very slow -- the system can keep up in most cases, and the other cases work provided everything uses enormous buffers or doesn't mind dropping i/o. - the calculation of ibufsize is written under the naive assumption that the _clock_ interrupt handler can actually keep up with HZ. Note that sio is essentially independent of HZ. The calculation just uses hz as a hint of what the system can keep up with. The tty SWI has a higher priority than the clock SWI. Thus if the clock SWI can keep up with HZ, the tty SWI can keep up too. At least before RELENG_5. With mutex locking, priority inversion is possible; also, it is possible for the clock SWI to not have any Giant locking, while the tty SWI always has it; thus the clock SWI might be less limited by Giant locking than the tty SWI. I think that in practice the priorities don't affect this problem much, but Giant locking does, and that the clock SWI normally blocks at least once on Giant, and that it only takes one long blockage on Giant in the clock SWI to destroy its worst-case latency in the same way as for the tty SWI. > Probalby has bearing on kern/26261 although I don't know when hz was raised above 200. kern/26261 seems to be mostly about lower-level problems. Some of these problems are smaller now that "fast" interrupt handlers can be shared. Some of them are larger since shared interrupt handlers can only be "fast", unless they cooperate with each other (using a scheduler a bit like the one used for DEVICE_POLLING), but the sio interrupt handler works best when it is fast. >> How-To-Repeat: > Open serial port at > 19200 baud and recieve long (> 128 bytes) of continuous full speed data. >> Fix: > Index: sio.c > sio.c:1929 > > Change: > cp4ticks = speed / 10 / hz * 4; > to > cp4ticks = speed / 10 / (hz > 200 ? 200 : hz) * 4; The 200 shouldn't be used here, except possibly in the polling case. I previously suggested bumping the factor of 4 safety margin to a factor of 40. This is also not quite right. Since memory is now plentiful and hz is only indirectly relevant, it might be best to make the buffer size independent of hz. E.g., scale it for hz = 10 or hz = 1, since no one should set hz that low. The tty layer already does this. IIRC, it uses a size of (1 second's worth of data) + 2 * ibufsize It needs a reserve of (2 * ibufsize) since the ibuf level may want to add up to ibufsize characters to it on short notice. The factor is 2 instead of 1 to support spacing of watermarks sufficient for flow control. This shows that 1 second's worth of buffering in ibuf would be too much -- it would almost triple the size of the ttybuf. 1/10 second's worth would be OK. ibuf needs a similar reserve adding up to characters to it at short notice. This is currently only handled indirectly via the factor of 4. Some UARTs have an rx fifo size of 128 or 256, but it is currently a confguration error to use these since using sizes >= 16 has only small possible positive benefits and has negative benefits in practice since sio's watermark processing is not quite right for larger sizes. In particular, the watermark/ flow control stuff is more primitive than in the tty layer, but for large rx fifos it needs to be similar. So I suggest changing cp4ticks = speed / 10 / hz * 4; to cp4ticks = speed / 10 / 10; and renaming the variable and adjusting comments, since hz, ticks and 4 are no longer involved. The wastage would be small since the tty layer already uses huge buffers. (However, the tty layer uses dynamic allocation while ibuf is malloced. The complexities for dynamic allocation are silly given plentiful memory. The tty subsystem only has small memory requirements, but it manages its memory as if its requirements were relatively large, like they were 30 years ago. Its 30-year old memory allocation scheme was replaced 15 years in 386BSD, but came back.) I use HZ = 100 without the above change (but with timeouts used again, and the 200 in siosettimeout() adjusted to 1000) and haven't seen this problem at any speed <= 921600 bps. Bruce