From owner-freebsd-bugs Thu Mar  9 15:10:29 1995
Return-Path: bugs-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6)
          id PAA25454 for bugs-outgoing; Thu, 9 Mar 1995 15:10:29 -0800
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.cdrom.com
          (8.6.10/8.6.6) with SMTP id PAA25446 for ; Thu, 9 Mar 1995 15:10:22 -0800
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA08011; Thu, 9 Mar 95 16:03:57 MST
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9503092303.AA08011@cs.weber.edu>
Subject: Re: QIC-80 problem
To: bakul@netcom.com (Bakul Shah)
Date: Thu, 9 Mar 95 16:03:57 MST
Cc: freebsd-bugs@FreeBSD.org
In-Reply-To: <199503092219.OAA20388@netcom17.netcom.com> from "Bakul Shah" at Mar 9, 95 02:19:28 pm
X-Mailer: ELM [version 2.4dev PL52]
Sender: bugs-owner@FreeBSD.org
Precedence: bulk

> There is only one input and one output to team.  Now
> *ideally* we'd like to say: put input block n in buffer 0,
> block n+1 in buffer 1, and so on, and oh, by the way, call
> me when a buffer is filled or eof is reached.  Similarly for
> the output side: when we are notified that block n is filled, we
> tell the output side to write it out in sequence and tell us
> when done so that we can recycle the buffer -- this is pretty
> much what we do at driver level if the controller supports
> queueing.
>
> But this is *not* what we can do at user level under Unix
> with any number of dups and O_ASYNC in a single process
> [actually O_ASYNC is a misnomer; it should really be called
> O_SIGIO or O_SIGNAL_ME_WHEN_IO_IS_POSSIBLE or something].
> Or have you guys added real asyncio to FreeBSD while I was
> not looking?  If so, that is great news!!

Yes, this was predicated on having aioread/aiowrite/aiowait/aiocancel
primitives available to the program, just like on a Sun machine or on
SVR4.  Sorry if this wasn't clear; you're right that O_ASYNC is a
bogosity that is not useful.

There was recently a discussion on the hackers list by someone who was
implementing real async I/O.

I had the primitives implemented back in the 386BSD 0.1+patchkit 2 days,
when I reimplemented Sun LWP (first on Sun's primitives, so I didn't
have to deal with register windows and stack switching) and later
brought the code over to 386BSD.  That code is totally useless now that
we are onto 4.4 and major changes to the SCSI and other disk I/O have
taken place.

In reality, you want to be able to do this with an alternate gate to
make *any* potentially blocking system call async -- but the work to do
that is BTSOTD (Beyond The Scope Of This Discussion) 8-).  It's
actually the work needed prior to kernel multithreading, just as kernel
preemption is the work needed before SMP.  I didn't expect the
development to be done (at least not on FreeBSD) by, say, next
Tuesday.  8-).
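For concreteness, here is roughly what the single-process, double-buffered
copy loop we are talking about looks like once you have real async
primitives.  This is only a sketch: it is written against the POSIX.1b
aio calls (aio_read/aio_write/aio_suspend) rather than the Sun
aioread/aiowait ones, since the shape is the same either way, and the
names (copy_async, BLKSZ, etc.) are made up for illustration -- it is
not the ft fix, and it assumes a device where the file offset is not
meaningful (tape, pipe), so aio_offset is left at zero.

/*
 * Sketch only: double-buffered copy from infd to outfd in one process.
 * One buffer is being filled by an async read while the other drains
 * to the output; writes are issued strictly in order.  Short reads,
 * partial writes and real error handling are glossed over.
 */
#include <sys/types.h>
#include <aio.h>
#include <errno.h>
#include <string.h>

#define NBUF	2			/* one filling, one draining */
#define BLKSZ	(32 * 1024)

static char buf[NBUF][BLKSZ];

/* Block until the given request completes; return its byte count. */
static ssize_t
await(struct aiocb *cb)
{
	const struct aiocb *list[1] = { cb };

	while (aio_error(cb) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);
	return (aio_return(cb));
}

int
copy_async(int infd, int outfd)
{
	struct aiocb rd[NBUF], wr[NBUF];
	int cur = 0, i, busy[NBUF] = { 0, 0 };
	ssize_t n;

	memset(rd, 0, sizeof(rd));
	memset(wr, 0, sizeof(wr));

	/* Prime the pump: start the first read. */
	rd[0].aio_fildes = infd;
	rd[0].aio_buf = buf[0];
	rd[0].aio_nbytes = BLKSZ;
	if (aio_read(&rd[0]) == -1)
		return (-1);

	for (;;) {
		int nxt = 1 - cur;

		/* Wait for the current buffer to fill. */
		if ((n = await(&rd[cur])) <= 0)
			break;			/* EOF or read error */

		/* The other buffer may still be draining; let it finish. */
		if (busy[nxt] && await(&wr[nxt]) == -1)
			return (-1);
		busy[nxt] = 0;

		/* Refill the other buffer while this one is written out. */
		rd[nxt].aio_fildes = infd;
		rd[nxt].aio_buf = buf[nxt];
		rd[nxt].aio_nbytes = BLKSZ;
		if (aio_read(&rd[nxt]) == -1)
			return (-1);

		/* Queue the just-filled buffer for output. */
		wr[cur].aio_fildes = outfd;
		wr[cur].aio_buf = buf[cur];
		wr[cur].aio_nbytes = (size_t)n;
		if (aio_write(&wr[cur]) == -1)
			return (-1);
		busy[cur] = 1;

		cur = nxt;
	}

	/* Drain any write still in flight before returning. */
	for (i = 0; i < NBUF; i++)
		if (busy[i])
			(void)await(&wr[i]);

	return (n < 0 ? -1 : 0);
}

Only one read and one write are ever outstanding, the writes hit the
output fd strictly in order (which is what a tape wants), and there is
no second process and no token pipe for the scheduler to get wrong.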
> > Admittedly, the context switch issue is _relatively_ less of a
> > problem (I won't say that it isn't a problem entirely).  But
> > there is also the issue of the token passing using pipes between
> > the processes, etc..  An implementation using aio in a single
> > process context would avoid a lot of system call overhead as
> > well as context switch overhead in the token passing, and also
> > avoid things like pipe read/write latency, since team does NOT
> > interleave that.
>
> team interleaves pipe IO *if* the incoming data or outgoing data
> is via pipes.  team can not interleave read/write of control
> pipes because they are used for synchronization.  The core
> loop for a team member is something like this: [ ... ]
>
> I grant your basic point that a single process implementation
> will avoid a lot of context switch and syscall overhead.  But
> I just do not see how it can be done (even if we give up
> on portability).

Like I said, I grant your point that the current async I/O in BSD,
using O_ASYNC, is simply insufficient.

> Note that all that context switching + syscall overhead is a
> problem only if the read/write quantum is so small that a
> tape device can do it in a few milliseconds.

I don't understand this statement; the write has to run to completion
in the calling process's context to allow ft to run with a safe margin.
The problem is that the next write *isn't* started sufficiently soon,
either because of data unavailability (which ft supposedly fixes) or
because ft gets no quantum in which to do its fixing.  It's this
second case that wants the low overhead solution.

Even so, such a solution is just a bandaid on the real problem, which
is the ft driver.  It's interesting to consider clever ways to apply
the bandaid, and to note that these are universally applicable
performance wins for the problem they are really intended to solve
(i.e., the bandaid is a side effect).  Which is what got us onto this
tangent in the first place.  8-).

> On my 25Mhz 486 overhead of team is about 1.2ms *per* block
> of data (and *regardless* of the number of team processes).
> As long as your tape read/write takes at least 10 times as
> long, you should be pretty safe on a lightly loaded system.
> I don't know the data rate of a QIC-80 but on a 4mm DAT that
> translates to a blocksize of 250KB/s*12ms or about 3KBytes
> on my machine!

Yeah, the key here is system loading, and the ft program competing for
process quanta against the extra load added by compress.  I think that
adding competing processes (the better to rob ft of needed quanta) by
using team is probably not the way you really want to go anyway.

I think that, other than some minor semantic issues, we are in violent
agreement!  8-).


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.
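For what it's worth, the ~3KBytes figure above falls straight out of
the quoted numbers; a trivial check (the 1.2ms per-block overhead, the
"at least 10 times as long" rule, and the 250KB/s DAT rate are all
taken from the text above, not new measurements):

#include <stdio.h>

int
main(void)
{
	double overhead = 0.0012;	/* team overhead per block, seconds */
	double factor = 10.0;		/* tape time >= 10 x the overhead   */
	double rate = 250.0 * 1024.0;	/* 4mm DAT streaming rate, bytes/s  */

	/* smallest block that keeps the drive busy long enough */
	printf("minimum blocksize ~ %.0f bytes\n", overhead * factor * rate);
	return (0);
}

This prints a minimum blocksize of about 3072 bytes, matching the
"about 3KBytes" quoted above.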