From owner-freebsd-bugs@FreeBSD.ORG  Fri Feb 16 22:32:13 2007
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
X-Original-To: freebsd-bugs@freebsd.org
Delivered-To: freebsd-bugs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C57FE16A402;
	Fri, 16 Feb 2007 22:32:13 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailout1.pacific.net.au (mailout1-3.pacific.net.au [61.8.2.210])
	by mx1.freebsd.org (Postfix) with ESMTP id 587BB13C48E;
	Fri, 16 Feb 2007 22:32:13 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au
	[61.8.2.162])
	by mailout1.pacific.net.au (Postfix) with ESMTP id DFB285A7CE3;
	Sat, 17 Feb 2007 09:32:11 +1100 (EST)
Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246])
	by mailproxy1.pacific.net.au (Postfix) with ESMTP id 24F358C11;
	Sat, 17 Feb 2007 09:32:10 +1100 (EST)
Date: Sat, 17 Feb 2007 09:32:08 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@epsplex.bde.org
To: Greg Ansley <gja@ansley.com>
In-Reply-To: <200702161513.l1GFDqi5006017@www.freebsd.org>
Message-ID: <20070217093159.K705@epsplex.bde.org>
References: <200702161513.l1GFDqi5006017@www.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-bugs@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: Re: kern/109232: [sio][patch] ibufsize calculation wrong causing
 data loss
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Feb 2007 22:32:13 -0000

On Fri, 16 Feb 2007, Greg Ansley wrote:

>> Description:
> sio comwakeup time (sio_timeout) is limited to 200 hz (see siosettimeout())  but calculation for sio ibufsize uses hz which is now >> 200 hz causing insufficiently sized interrupt level sio buffers.  This in turn causes lost incoming data and the kernel issues the "interrupt-level buffer overflow" message when receiving long strings of high speed (baud > 19200) data.

The problem is actually a little different:
- the timeout is normally limited to 1 second and is not normally used
    (except in FreeBSD-1 and maybe FreeBSD-2).  It is used mainly for for
    polling (when polling is enabled on some sio device, the timeout is
    scalled by hz but limited to 200 Hz) but the polling case is broken
    in other ways.
- ibufsize is limited from below to 128.  This should be plenty for low
    speeds up to 57600 bps, and should barely work at the low speed of 115200
    bps too.  128 character times at 115200 bps = 11.1111 msec, so the system
    only needs to have a worst-case tty software interrupt latency of than
    11.1111 msec to avoid interrupt-level buffer overruns.  However, the
    broken interrupt latency in RELENG_[5-7] is apparently worse than that
    in some cases, and the calculation of ibufsize is supposed to be leaving
    a safety margin of a factor of 4.
- A period of 11.1111 msec is a frequency of just 90 Hz.  The system also
    needs a worst-case _clock_ software interrupt latency of slightly better
    than that just to keep up with the old default HZ of 100.  Both the clock
    software interrupt handler (SWI) and the tty SWI have some Giant locking,
    so they can't actually keep up with that in all cases, even on very fast
    CPUs.  The default for HZ is now 10 times larger, so there are many more
    cases in which the system can't actually keep up.  This is not usually
    a problem, provided the CPU is not very slow -- the system can keep up
    in most cases, and the other cases work provided everything uses
    enormous buffers or doesn't mind dropping i/o.
- the calculation of ibufsize is written under the naive assumption that
    the _clock_  interrupt handler can actually keep up with HZ.  Note
    that sio is essentially independent of HZ.  The calculation just
    uses hz as a hint of what the system can keep up with.  The tty SWI
    has a higher priority than the clock SWI.  Thus if the clock SWI can
    keep up with HZ, the tty SWI can keep up too.  At least before
    RELENG_5.  With mutex locking, priority inversion is possible; also,
    it is possible for the clock SWI to not have any Giant locking, while
    the tty SWI always has it; thus the clock SWI might be less limited
    by Giant locking than the tty SWI.  I think that in practice the
    priorities don't affect this problem much, but Giant locking does,
    and that the clock SWI normally blocks at least once on Giant, and that
    it only takes one long blockage on Giant in the clock SWI to destroy
    its worst-case latency in the same way as for the tty SWI.

> Probalby has bearing on kern/26261 although I don't know when hz was raised above 200.

kern/26261 seems to be mostly about lower-level problems.  Some of these
problems are smaller now that "fast" interrupt handlers can be shared.
Some of them are larger since shared interrupt handlers can only be "fast",
unless they cooperate with each other (using a scheduler a bit like the
one used for DEVICE_POLLING), but the sio interrupt handler works best
when it is fast.

>> How-To-Repeat:
> Open serial port at > 19200 baud and recieve long (> 128 bytes) of continuous full speed data.
>> Fix:
> Index: sio.c
> sio.c:1929
>
> Change:
>       cp4ticks = speed / 10 / hz * 4;
> to
>       cp4ticks = speed / 10 / (hz > 200 ? 200 : hz)  * 4;

The 200 shouldn't be used here, except possibly in the polling case.

I previously suggested bumping the factor of 4 safety margin to a
factor of 40.  This is also not quite right.  Since memory is now
plentiful and hz is only indirectly relevant, it might be best to
make the buffer size independent of hz.  E.g., scale it for hz = 10
or hz = 1, since no one should set hz that low.  The tty layer already
does this.  IIRC, it uses a size of

        (1 second's worth of data) + 2 * ibufsize

It needs a reserve of (2 * ibufsize) since the ibuf level may want to
add up to ibufsize characters to it on short notice.  The factor is 2
instead of 1 to support spacing of watermarks sufficient for flow
control.  This shows that 1 second's worth of buffering in ibuf would
be too much -- it would almost triple the size of the ttybuf.  1/10
second's worth would be OK.

ibuf needs a similar reserve adding up to <hardware rx fifo size>
characters to it at short notice.  This is currently only handled
indirectly via the factor of 4.  Some UARTs have an rx fifo size of
128 or 256, but it is currently a confguration error to use these
since using sizes >= 16 has only small possible positive benefits
and has negative benefits in practice since sio's watermark processing
is not quite right for larger sizes.  In particular, the watermark/
flow control stuff is more primitive than in the tty layer, but for
large rx fifos it needs to be similar.

So I suggest changing

   	cp4ticks = speed / 10 / hz * 4;

to

   	cp4ticks = speed / 10 / 10;

and renaming the variable and adjusting comments, since hz, ticks and 4
are no longer involved.  The wastage would be small since the tty layer
already uses huge buffers.  (However, the tty layer uses dynamic allocation
while ibuf is malloced.  The complexities for dynamic allocation are silly
given plentiful memory.  The tty subsystem only has small memory
requirements, but it manages its memory as if its requirements were
relatively large, like they were 30 years ago.  Its 30-year old memory
allocation scheme was replaced 15 years in 386BSD, but came back.)

I use HZ = 100 without the above change (but with timeouts used again,
and the 200 in siosettimeout() adjusted to 1000) and haven't seen this
problem at any speed <= 921600 bps.

Bruce