From owner-freebsd-scsi  Thu Oct 22 15:28:53 1998
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id PAA12601
          for freebsd-scsi-outgoing; Thu, 22 Oct 1998 15:28:53 -0700 (PDT)
          (envelope-from owner-freebsd-scsi@FreeBSD.ORG)
Received: from dingo.cdrom.com (dingo.cdrom.com [204.216.28.145])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id PAA12587
          for <freebsd-scsi@FreeBSD.ORG>; Thu, 22 Oct 1998 15:28:48 -0700 (PDT)
          (envelope-from mike@dingo.cdrom.com)
Received: from dingo.cdrom.com (localhost.cdrom.com [127.0.0.1])
	by dingo.cdrom.com (8.9.1/8.8.8) with ESMTP id PAA01402;
	Thu, 22 Oct 1998 15:31:39 -0700 (PDT)
	(envelope-from mike@dingo.cdrom.com)
Message-Id: <199810222231.PAA01402@dingo.cdrom.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Thomas F Keefe <keefe@cse.psu.edu>
cc: patton@sysnet.net, freebsd-scsi@FreeBSD.ORG
Subject: Re: Sequential Disk I/O 
In-reply-to: Your message of "Thu, 22 Oct 1998 12:08:02 EDT."
             <199810221608.MAA05083@remulak.cse.psu.edu> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 22 Oct 1998 15:31:39 -0700
From: Mike Smith <mike@smith.net.au>
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > if you study how the big boys like Oracle do it, they use block sizes in
> > excess of 8kb. Depending on how big your disks are 32kb blocks are
> > considered more reasonable. What they do is fit multiple database "blocks"
> > or records into a logical block. I suggest a vastly bigger blocksize than
> > 512bytes if you're serious about performance.
> 
> By using large blocks I can approximate the
> performance of sequential access. I may have
> a seek and rotation penalty at the beginning
> of the transfer, but that one penalty is 
> amortized over a large number of sectors.
> This may be my only choice if I cannot 
> solve this problem. At this point I have
> spent a lot of time and become very curious
> about this.

It looks like you're suffering from some major misconceptions about the 
relative performance and behaviour of the I/O subsystem.  This makes it 
difficult to relate to your questions, because they're deeply founded 
on these misconceptions.  

To start with, you have to understand that I/O transactions move along 
queues.  Each of these queues behaves differently, and many of these 
behaviours are dependent upon other factors.

> 
> What I was hoping to get from this group, was
> something like:
> 	(1) It is impossible to avoid the rotational
> 	    latency when issuing writes to adjacent
> 	    sectors on a SCSI disk because the time
> 	    required between the completion of one command
> 	    and the start of the next is a significant 
> 	    fraction of the time it takes for a 5400RPM disk
> 	    platter to rotate. Thus, even with large 
> 	    strides, the next command comes too late.
> 	    This makes only one access per revolution
> 	    possible.

Issues of rotational latency are completely irrelevant.  The decoupling 
between the system and the media on a SCSI disk is complete.

> 	or;
> 
> 	(2) Modern SCSI drives (by default) access an
> 	    entire track on both reads and writes.
> 	    Thus, only one access is possible per revolution.
> 	    This default behavior can be disabled through
> 	    the mode page as follows ...

The second does not follow from the first, but again, this is 
completely irrelevant.

> Any enlightenment you can offer on this topic will be 
> appreciated.

Without meaning offense, the amount of "enlightenment" required to 
correct your understandings on the topic is beyond the scope of any 
single email, and will probably only be gathered over a number of years 
of experience. 

To answer what I recall as being your original question as succinctly 
as possible: there is an effectively fixed overhead involved in any 
given I/O transaction.

So in moving data, there are two costs:

 - tD, the time taken to move the data itself.
 - tO, the overhead.

tD is directly proportional to the amount of data involved, so for any 
given amount of data, tD is constant no matter how the data is split 
into transactions.

tO is effectively fixed per transaction.  Thus, if you move the data in
small chunks, total tO is proportionally larger than if you use large
chunks.
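
As a rough sketch of this argument (in Python, purely for illustration):
the function name, the 10 MB/s transfer rate and the 1 ms per-transaction
overhead below are made-up assumptions, not measurements of any real
system.  It only shows that tD stays the same while total tO grows as the
chunks get smaller.

```python
import math

def transfer_time(total_bytes, chunk_bytes, bytes_per_sec, overhead_sec):
    """Total time = tD + (number of transactions) * tO."""
    # tD: depends only on how much data is moved, not on how it is chunked.
    t_data = total_bytes / bytes_per_sec
    # One transaction per chunk (the last chunk may be partial).
    transactions = math.ceil(total_bytes / chunk_bytes)
    # Total tO grows as the chunks shrink.
    return t_data + transactions * overhead_sec

# Hypothetical figures: 10 MB/s transfer rate, 1 ms overhead per transaction.
TOTAL = 1 << 20  # 1 MiB of data
for chunk in (512, 8 * 1024, 32 * 1024):
    t = transfer_time(TOTAL, chunk, 10e6, 1e-3)
    print(f"{chunk:6d}-byte chunks: {t:.3f} s")
```

With any fixed tO the 512-byte case needs 64 times as many transactions
as the 32kb case, so it pays 64 times the total overhead for the same tD.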

When you come to compute throughput for a single consumer, there's 
another cost:

 - tL, the latency time (time between making the request and receiving
       an answer which is not accounted for in tO).

tL varies significantly depending on load, but it also has a fixed 
component per transaction.  Thus, again, smaller chunks mean a larger 
total tL.
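
A similar sketch for the single-consumer case, assuming the consumer
issues one request at a time and waits for each to complete.  The
function name and all of the figures are hypothetical; tL in particular
is treated as a constant here, although in practice it varies with load.

```python
def throughput(chunk_bytes, bytes_per_sec, overhead_sec, latency_sec):
    """Bytes per second seen by one consumer issuing one request at a time."""
    # Each request costs its share of tD plus one tO and one tL.
    per_request = chunk_bytes / bytes_per_sec + overhead_sec + latency_sec
    return chunk_bytes / per_request

# Hypothetical figures: 10 MB/s transfer rate, 1 ms tO, 2 ms tL per request.
for chunk in (512, 8 * 1024, 32 * 1024, 256 * 1024):
    rate = throughput(chunk, 10e6, 1e-3, 2e-3)
    print(f"{chunk:6d}-byte chunks: {rate / 1e6:.2f} MB/s")
```

As the chunk size grows, the fixed per-request costs matter less and the
result approaches (but never reaches) the raw transfer rate.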

tO and tL are significant for many of the queues, and so the system 
does what it can to coalesce as many read/write operations as possible, 
as well as caching, etc. in order to reduce them where possible.
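
To illustrate what coalescing means, here is a generic sketch that
merges adjacent or overlapping requests so that fewer transactions (and
so fewer tO/tL costs) are needed.  This is an assumption-laden toy, not
how FreeBSD or any particular driver actually does it; the function name
and the (offset, length) representation are invented for the example.

```python
def coalesce(requests):
    """Merge requests that touch or overlap; each request is (offset, length)."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Starts at or before the end of the previous request:
            # extend the previous request instead of issuing a new one.
            prev_off, prev_len = merged[-1]
            end = max(prev_off + prev_len, offset + length)
            merged[-1] = (prev_off, end - prev_off)
        else:
            merged.append((offset, length))
    return merged

# Three adjacent 512-byte requests and one distant one.
pending = [(1024, 512), (0, 512), (512, 512), (4096, 512)]
print(coalesce(pending))
```

Here four small transactions become two: one covering the first 1536
bytes and one for the distant block.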

However, it is in your application that you can make the best
performance optimisations because only your application can know its
access behaviour in advance.

If you're interested in pursuing this further, you might want to start
by studying the FreeBSD I/O subsystem.  Then you'll want to look at how
SCSI works, starting with a typical SCSI host adapter, the SCSI standard
and perhaps some high-level documentation for a SCSI disk.

Please don't think you can apply something like the early-'70s-vintage 
disk theory from e.g. Tanenbaum to modern disk systems; this will only 
confuse you.

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com
