Date:      Wed, 18 Jun 1997 00:23:01 -0700 (PDT)
From:      Simon Shapiro <Shimon@i-Connect.Net>
To:        (Satoshi Asami) <asami@cs.berkeley.edu>
Cc:        FreeBSD-SCSI@FreeBSD.org
Subject:   Re: RAID Configuration Notes
Message-ID:  <XFMail.970618002301.Shimon@i-Connect.Net>
In-Reply-To: <199706180252.TAA08303@silvia.HIP.Berkeley.EDU>


Hi Satoshi Asami;  On 18-Jun-97 you wrote: 

...

>  * *  A single SCSI-{II,III} disk can perform about 140 disk I/Os per
>  *    second.  This statement is true for block_size < 8K, (almost)
>  *    regardless of narrow/wide, ultra/fast, etc.  The reason being
>  *    that, according to the SCSI specifications, all negotiations and
>  *    handshakes happen in narrow, async 5MHz.  Otherwise slow/old
>  *    devices will surely hang the bus.
> 
> I'm not well versed about SCSI specs (I'll leave that for Justin or
> Stefan) but this is certainly not true.  By doing small reads from a
> very small area on the raw disk device (like 1K out of 8K), I can get
> 220 IO/s from the IBM's at work, 100 from the Microp at home (don't
> they have a cache?!?), 1,500 from the A-I at home and 2,400 from the
> A-II at home.  These are repeated reads with one process.

Caching, maybe even at the HBA level.  My numbers talk about RANDOM seeks.

Check the SCSI specs: it takes almost a full ms to get a command posted
on a BUSY bus, sometimes more.  If the drive disconnects, that takes
time.  When it reconnects, that takes time.  BTW, I will be delighted to
find that SCSI disks can do 1,500 I/Os per second.

> No I'm not reading it from the disk cache, the reads are done from the
> raw device and I see the disk activity light stay on during the test.
> (Besides, I get 62,000 if I read from the block device. :)
> 
> Of course, if you meant "I/Os from the disk surface" and not the
> cache, the limit is probably a hundred and something, but then the
> disk type certainly will make a huge difference (not the interface,
> but seek time and rotational speed).  Also, you need to define what
> kind of I/O's you are talking about.  A random read from the outer
> half of the disk surface will take less time than a random read from
> all over the disk, for instance.

Random reads and writes, from all over the disk.  My tests are geared
towards database work, and a particular application at that.  We assume
the worst conditions, as we have to guarantee delivery on random access.
I ran these tests on Slowlaris 2.5.1, Linux 2.0.27 (or whatever it was
that week) and FreeBSD.  Linux had a DPT, FreeBSD had an AHA-2940UW,
Slowlaris had whatever (I think it is a Qlogic chip).  They were all
within a few percentage points, except that the SPARC had a totally FLAT
response curve up to 4K.  We could not test larger blocks, as it hangs
the calling process if you do lockf() on 8K+.
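
For reference, the flavor of test I am describing is essentially this (a
minimal sketch only; the device name, seek span, block size and I/O count
below are made-up examples, not my actual harness):

/*
 * Minimal random-read I/O rate sketch (illustration only).
 * The device path, block size, seek span and count are assumptions.
 * Compile with: cc -o rndread rndread.c
 */
#include <sys/types.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DEVICE   "/dev/rsd0c"          /* hypothetical raw disk device */
#define BLKSIZE  1024                  /* 1K transfers */
#define SPAN     (1024 * 1024 * 1024)  /* seek range: first 1GB of the disk */
#define COUNT    10000                 /* number of reads to time */

int
main(void)
{
    char            buf[BLKSIZE];
    struct timeval  t0, t1;
    double          secs;
    off_t           offset;
    int             fd, i;

    if ((fd = open(DEVICE, O_RDONLY)) < 0) {
        perror(DEVICE);
        return (1);
    }
    gettimeofday(&t0, NULL);
    for (i = 0; i < COUNT; i++) {
        /* pick a random, block-aligned offset inside the span */
        offset = ((off_t)random() % (SPAN / BLKSIZE)) * BLKSIZE;
        if (lseek(fd, offset, SEEK_SET) < 0 || read(fd, buf, BLKSIZE) < 0) {
            perror("read");
            return (1);
        }
    }
    gettimeofday(&t1, NULL);
    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d reads in %.2f s = %.0f I/Os per second\n",
        COUNT, secs, COUNT / secs);
    return (0);
}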

>  * *  A ribbon-cable SCSI bus (parallel, not FCAL) can support up to
>  *    440 or so Tx/Sec.  Yes, this means that for very high activity,
>  *    much more than 4 drives per bus is a waste.
> 
> This is not true.  As I said above, I can get over 2,400 from a single 
> disk on a 20MHz string.  By running many in parallel I could go up to
> 2,660 with 14 disks (running at 10MHz).  Here is how it grows:
> 
>  1    2    3    4    5    6    7    8    9   10   11   12   13   14
> 214  425  635  849 1066 1278 1489 1706 1910 2126 2319 2488 2591 2665
> 
> This is with the 1K/8K size given above.  With a 1K read from all over 
> the drive surface, I get a little over 1,800 with 14 disks
> (130/disk).  These are with one process per disk.

These are very interesting numbers.  They are better than what the
industry recognizes for FCAL!  According to your numbers you can sustain
almost 15MB/Sec per single bus (4K transfers), which is about the
theoretical limit.  Maybe it is a big cache; maybe the drive does not
disconnect on a cache hit (that would double the throughput); maybe,
maybe :-)

...

> It's only if you are running RAID-1 with two disks.  The write
> performance is typically a little less than a RAID-0 spanning half the
> drives.  Of course, that depends on the number of disks (the data has
> to go over the SCSI bus twice, so if you have enough fast disks to
> saturate the bus, it will hit the ceiling faster).  For instance, here
> with two 20MHz strings, I get 29MB/s for 4 disks striped and 20MB/s
> for 8 disks striped/mirrored.

Compound arrays are definitely useful.  BTW, how do you measure the
performance?  RAID-1, by definition, is two drives.  People will create
compound arrays and call them RAID-1.  They are not.  I have heard them
being called RAID-10!
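
Just to make the terminology concrete, here is a generic sketch of what I
mean (an illustration of a striped set plus mirror, not how the DPT or any
particular product actually lays data out; the disk count and stripe size
are arbitrary):

/*
 * Illustration of RAID-0 vs. "RAID-10" block placement (assumed layout,
 * not any particular controller's implementation).
 */
#include <stdio.h>

#define NDISKS      4       /* data disks in the stripe set (assumption) */
#define STRIPE_BLKS 16      /* blocks per stripe unit (assumption) */

/* RAID-0: logical block -> (disk, block offset on that disk) */
static void
raid0_map(long lblk, int *disk, long *pblk)
{
    long stripe = lblk / STRIPE_BLKS;       /* which stripe unit */

    *disk = stripe % NDISKS;                /* stripe units rotate over disks */
    *pblk = (stripe / NDISKS) * STRIPE_BLKS + lblk % STRIPE_BLKS;
}

int
main(void)
{
    long lblk, pblk;
    int  disk;

    for (lblk = 0; lblk < 64; lblk += STRIPE_BLKS) {
        raid0_map(lblk, &disk, &pblk);
        /*
         * In a striped mirror ("RAID-10") the same stripe member is
         * simply written twice: once to `disk' and once to its mirror
         * partner, disk + NDISKS.  That is why writes cross the bus
         * twice, while a plain two-disk RAID-1 has no striping at all.
         */
        printf("logical %4ld -> data disk %d blk %4ld, mirror disk %d\n",
            lblk, disk, pblk, disk + NDISKS);
    }
    return (0);
}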

...

> That's very good compared to software parity.  (Just another
> disincentive for implementing parity in ccd.... ;)

Ccd has two important features for our typical, average user:

A.  It works on ANY block device
B.  CHEAP

When considering the DPT (vs. other solutions) I placed high value on
administrative ability.  The DPT will recover and repair an array on-line
and automatically.  Virtually all others will not.  The environmental
controls (P/S, fans, temperature, etc.) are also critical when your system
is literally on a mountaintop hundreds of miles away.  CCD does not have
these features yet :-)

...

> (I'm not sure what a hot spare will do for your RAID-0 array, but
>  that's ok. :)

Cost money :-)  Hot spares are for RAID-{1,5}.  As tempting as it may
be, RAID-0 is useless for critical applications.  One disk decides to
hiccup and many gigabytes get a stroke.  I use it for /usr/obj, /var/tmp,
etc.  /usr/src is on RAID-5.  /RCS, /CVS, etc. are on RAID-1.

...

> How about having two controllers on two PCs share the same string?
> That will guard against PC and adapter failures.  We are planning to
> do this with our system.  The Adaptecs are happy as long as you don't
> try to boot both machines at the same time with the boot disks on the
> shared string (if you have a system disk on an unshared string and
> disable the BIOS, it will be ok).  Do the DPTs allow for the SCSI ID's 
> to be changed?

This is what the DIO (phase I) does.  It allows several hosts to share
a disk array.  Locking is at arbitrary granularity.  You can lock at the
sector level, page level, RDBMS block level, partition, disk, whatever.
It uses the DLM, which allows a superset of semaphores/locks to span
Unix instances.  Phase II will add the ability to do I/O on remote
machines (which do not share a device).
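
As a rough single-host analogy only (this is plain POSIX byte-range
locking with fcntl(), NOT the DLM interface; the file name, offset and
block size are invented), arbitrary-granularity locking looks something
like this:

/*
 * Analogy for arbitrary-granularity locking using POSIX byte-range
 * locks.  This is NOT the DLM interface; it only shows the idea of
 * locking a sector, a page or an RDBMS block instead of a whole file.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    struct flock fl;
    int fd;

    /* hypothetical shared data file */
    if ((fd = open("/shared/dbfile", O_RDWR)) < 0) {
        perror("open");
        return (1);
    }

    fl.l_type   = F_WRLCK;      /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 4096;         /* lock one 8K RDBMS block at offset 4096 */
    fl.l_len    = 8192;

    if (fcntl(fd, F_SETLKW, &fl) < 0) {     /* block until granted */
        perror("fcntl");
        return (1);
    }

    /* ... read/modify/write the locked block here ... */

    fl.l_type = F_UNLCK;                    /* release the lock */
    (void)fcntl(fd, F_SETLK, &fl);
    (void)close(fd);
    return (0);
}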

Yes, the DPTs allow you to change the target ID, on the fly too.  The
driver does NOT allow dynamic re-assignment at this time.

Simon


