From owner-cvs-all  Mon Sep 20  2:34: 9 1999
Delivered-To: cvs-all@freebsd.org
Received: from overcee.netplex.com.au (overcee.netplex.com.au [202.12.86.7])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4EB9914DBF; Mon, 20 Sep 1999 02:33:59 -0700 (PDT)
	(envelope-from peter@netplex.com.au)
Received: from netplex.com.au (localhost [127.0.0.1])
	by overcee.netplex.com.au (Postfix) with ESMTP
	id 75A841CC5; Mon, 20 Sep 1999 17:33:58 +0800 (WST)
	(envelope-from peter@netplex.com.au)
X-Mailer: exmh version 2.0.2 2/24/98
To: Jesper Skriver <jesper@skriver.dk>
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>,
	Matthew Dillon <dillon@apollo.backplane.com>,
	cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: User block device access (was: cvs commit: src/sys/miscfs/specfs spec_vnops.c src/sys/sys vnode.h src/sys/kern vfs_subr.c) 
In-reply-to: Your message of "Mon, 20 Sep 1999 10:46:01 +0200."
             <19990920104601.B75298@skriver.dk> 
Date: Mon, 20 Sep 1999 17:33:58 +0800
From: Peter Wemm <peter@netplex.com.au>
Message-Id: <19990920093358.75A841CC5@overcee.netplex.com.au>
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk

Jesper Skriver wrote:
> On Sun, Sep 19, 1999 at 10:05:04PM +0200, Poul-Henning Kamp wrote:
> > In message <Pine.BSF.4.05.9909191254200.42176-100000@semuta.feral.com>,
> > Matthew Jacob writes:
> > 
> > >
> > >Okay, then. Really, seriously, though- if we're all stuck arguing a
> > >major issue from different viewpoints for lack of < 1K$ equipment, this is
> > >an easy problem to solve from the K$ point of view (hadn't thought about
> > >customs- I guess I just can't express mail these puppies, can I? :-))
> > 
> > You know, only Matt Dillon thought this was a hardware issue, I don't
> > think it is. 
> 
> I see the same as Poul-Henning:
> 
> # time dd if=/dev/rda0 of=/dev/null bs=8k count=10000
> 10000+0 records in
> 10000+0 records out
> 81920000 bytes transferred in 6.293370 secs (13016873 bytes/sec)
>     6.30s real     0.04s user     0.46s system
> # time dd if=/dev/da0 of=/dev/null bs=8k count=10000  
> 10000+0 records in
> 10000+0 records out
> 81920000 bytes transferred in 12.496958 secs (6555195 bytes/sec)
>    12.51s real     0.02s user     3.26s system

        raw     rawcpu  block   blkcpu  file    filecpu
bsize   MB/s    sec     MB/s    sec     MB/s    sec
----------------------------------------------------
1024k   21.4     0.16   12.5     9.09   13.7    3.11
512k    21.3     0.15   12.2     9.23   13.6    3.12
256k    21.6     0.17   12.4     9.08   13.6    2.89
128k    21.6     0.18   12.4     8.80   13.6    2.54
64k     21.4     0.22   12.3     8.82   13.6    2.58
32k     21.3     0.36   12.4     8.82   15.5    2.62
16k     21.6     0.68   12.4     8.99   15.5    2.65
8k      20.0     1.28   12.2     9.18   16.8    2.69
4k      19.1     2.52   12.4     9.60   17.2    2.79
2k      12.6     4.89   12.3    10.56   17.2    3.37
1k       7.4     9.96   12.3    11.87   17.2    4.82
0.5k     4.2    19.67   11.9    14.58   17.3    7.41

Notice three things:
- raw (character) IO nosedives in throughput at smaller block sizes and its
  cpu cost goes through the roof.
- block IO throughput remains fairly constant with block size, but cpu usage
  is fairly high (though it is cheaper than raw IO at small bsize with dd).
- file IO (a fs with the same size file in the partition) is significantly
  faster than block IO and *increases* throughput as the block size goes down.
  file IO is never slower than block IO to the same disk zone and is cheaper
  in cpu cost.
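
For anyone wanting to produce this kind of table themselves, a measurement
loop along these lines is one way to do it.  This is only a rough Python
sketch, not the tool used for the numbers above: read_throughput and its
parameters are made-up names, and "MB" is assumed to mean 2^20 bytes.  It
reads a path (a file or a device node) sequentially at a fixed block size
and reports throughput plus user+system cpu seconds.

```python
import os
import time


def read_throughput(path, bsize, max_bytes=None):
    """Read `path` sequentially in `bsize` chunks.

    Returns (bytes_read, mb_per_sec, cpu_seconds).  Hypothetical helper
    for illustration; MB is assumed to be 2**20 bytes.
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    c0 = os.times()
    t0 = time.perf_counter()
    try:
        # May overshoot max_bytes by up to one block.
        while max_bytes is None or total < max_bytes:
            chunk = os.read(fd, bsize)
            if not chunk:  # end of file or device
                break
            total += len(chunk)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - t0
    c1 = os.times()
    cpu = (c1.user - c0.user) + (c1.system - c0.system)
    mb_per_sec = total / (1024 * 1024) / elapsed if elapsed > 0 else 0.0
    return total, mb_per_sec, cpu
```

Something like read_throughput("/dev/rda0", 8192, 80 * 1024 * 1024) would
roughly correspond to one row of the raw column.  Raw devices generally need
bsize to be a multiple of the sector size.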

The obvious question is: why isn't block IO implemented the same way that
read() on a file ends up going to the device?  i.e. zap the caching aspects
of bio and make specfs use the same access methods for "block" devices as
read() uses to get to the devices in a filesystem.

Wouldn't this achieve the goals of all parties?  bio would be dramatically
simplified, as it would have no caching or coherency issues to deal with;
mmap, unaligned reads, buffering, coherency, etc. would still be provided by
the VM system; and bdevs would get faster in the process.
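
To make the "unaligned read" point concrete: a raw device generally only
accepts sector-aligned transfers, so whatever layer sits on top has to round
an arbitrary (offset, length) request out to sector boundaries and trim the
result.  A rough sketch in Python, assuming 512-byte sectors; aligned_span
and unaligned_read are hypothetical names for illustration, not kernel
interfaces.

```python
SECTOR = 512  # assumed sector size


def aligned_span(offset, length, sector=SECTOR):
    """Return (start, size) of the sector-aligned range covering the request."""
    start = (offset // sector) * sector
    end = -(-(offset + length) // sector) * sector  # round up to a sector
    return start, end - start


def unaligned_read(read_aligned, offset, length, sector=SECTOR):
    """Satisfy an arbitrary read using only sector-aligned reads.

    `read_aligned(start, size)` stands in for the raw device.
    """
    start, size = aligned_span(offset, length, sector)
    data = read_aligned(start, size)
    skip = offset - start
    return data[skip:skip + length]
```

A 100-byte read at offset 700 turns into one 512-byte aligned read at offset
512, from which bytes 188..287 are returned.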

(This was tested without Matt's patches to use vmio)

Regarding the different names of devices (rda0s1e vs da0s1e) and the
confusion that arises ("I fsck rdaXXX but mount daXXX, right?"), I think it
would be better to rename slightly.  i.e. rdaXXX becomes daXXX (char devices
are already mountable, if I recall correctly), and the old daXXX devices
become bdaXXX or something else.  Then all user exposure is to the raw
devices for everything (fsck, mount, etc.), it gives us a chance to
renumber so bmaj == cmaj, and it still allows block access "out of the way"
for things like mmapping an INN cyclic news spool while still getting the
required caching.
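
For those not familiar with the cyclic spool case, the access pattern is
roughly a fixed-size mapped region that is written round-robin.  A minimal
Python sketch of that pattern follows; it is not INN's actual on-disk
format, cyclic_write is a made-up name, and an anonymous mapping stands in
for the mapped device.

```python
import mmap


def cyclic_write(buf, pos, data):
    """Write `data` into mapped buffer `buf` at `pos`, wrapping at the end.

    Returns the position for the next write.
    """
    size = len(buf)
    if len(data) > size:
        raise ValueError("record larger than the spool")
    first = min(len(data), size - pos)
    buf[pos:pos + first] = data[:first]
    rest = len(data) - first
    if rest:
        buf[0:rest] = data[first:]  # wrap around to the start
    return (pos + len(data)) % size


spool = mmap.mmap(-1, 16)  # anonymous 16-byte mapping as a stand-in
pos = cyclic_write(spool, 0, b"abcdefghijkl")
pos = cyclic_write(spool, pos, b"XYZ123")
```

With a real spool the mapping would be of the device or file itself, and the
caching underneath the mapping is what matters for performance.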

FWIW: Pentium III (450 MHz), mem = 256M.  System in use running X and
not getting any advantage from in-core caching:
107M Active, 72M Inact, 57M Wired, 11M Cache, 17M Buf, 1292K Free

ahc0: <Adaptec aic7880 Ultra SCSI adapter> irq 15 at device 17.0 on pci0
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
da0: <QUANTUM ATLAS IV 9 WLS 0808> Fixed Direct Access SCSI-3 device 
da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 8761MB (17942584 512 byte sectors: 255H 63S/T 1116C)
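
The capacity line can be checked from the sector count.  A quick sketch,
assuming the driver's "MB" means 2^20 bytes (the variable names are mine):

```python
SECTORS = 17942584        # from the da0 probe line
SECTOR_BYTES = 512

total_bytes = SECTORS * SECTOR_BYTES
size_mb = total_bytes // (1024 * 1024)   # 8761, matching "8761MB"

# The 255H 63S/T 1116C geometry covers slightly fewer sectors than the
# reported total; the remainder is less than one full cylinder.
chs_sectors = 255 * 63 * 1116            # 17928540
leftover = SECTORS - chs_sectors         # 14044
```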

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au

