Date:      Thu, 30 Jan 1997 12:44:23 -0800 (PST)
From:      Simon Shapiro <Shimon@i-Connect.Net>
To:        Julian Elischer <julian@whistle.com>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: NewComer Questions...
Message-ID:  <XFMail.970130154549.Shimon@i-Connect.Net>
In-Reply-To: <32EEFCD5.794BDF32@whistle.com>


Hi Julian Elischer;  On 29-Jan-97 you wrote: 
> Simon Shapiro wrote:
> > 
> > I am learning slowly, and just discovered this mailing list.
> > 
> > In way of introduction, I am working on a high speed database
> > engine for embedded telephony applications.
> > 
> > We need to develop the following functionality:
> > 
> > 1.  Multi-initiator support
> I assume you mean in SCSI?
> we have some basic support for that but it requires a SCSI host
> adapter that supports it.. it hasn't been exercised in years.
> (I wrote it with Peter Dufault but it's a rarely used feature.
> Or are you talking about several machines sharing a single bus?)

Yes.  Multiple machines sharing the same SCSI bus.  We have the HBA
to do that and are porting the basic driver now.  I am just 
wondering about what is there already and what is not.

> > 2.  DLM
> Daringly Lowfat Milk?

You almost got it right... :-)  But to be more precise, it stands
for Distributed Lock Manager.  A creature that is used in concert 
with multi-initiator SCSI busses and is responsible for providing
the coordination necessary for such mayhem.  This arrangement is
useful in two places:  Large, complex databases, where more than
one host (CPU, system) wants access to the same physical database,
and in HRA (High Reliability and Availability) systems, where one
system failure still leaves a path to the database through another.
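
If it helps to picture it:  on a single host, flock(2) already gives
you the shared/exclusive semantics;  a DLM extends the same grant
logic across initiators.  A rough single-host stand-in (the lock-file
path is made up):

    /*
     * Single-host stand-in for DLM semantics using flock(2): shared
     * vs. exclusive modes on a named resource.  A real DLM grants
     * the same modes across hosts, over the shared bus or a network
     * interconnect.  The lock-file path is made up.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/db.page.42.lck", O_CREAT | O_RDWR, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (flock(fd, LOCK_EX) == 0) {   /* exclusive: lone writer */
            printf("got exclusive lock; updating page\n");
            /* ... modify the shared database page ... */
            flock(fd, LOCK_UN);          /* release */
        }
        close(fd);
        return 0;
    }

Run two copies at once and only one holds LOCK_EX at a time;  the
DLM's job is exactly that arbitration, but across machines.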

> > 3.  Non-stop operation
> hmm this is a tricky one..
> what's your definition of non-stop?

A failure of any single component does not stop the system from
providing the same services as before the failure.

If you picture a RAID-{1,5} box connected to two hosts, you get the
SCSI part of ``non-stop'';  If a disk fails, the RAID array
continues to run.  The RAID box actually knows how to put a hot
spare into service, so performance is restored in short order.
If a host fails (without putting a short on the SCSI cable), the 
other host can continue and access the same storage, etc.

> > 4.  Very large (hundreds of Gigabytes) databases
> not unheard of.. we have several people into the > 100GB range..
> it does scale, though I have some ideas of some little NITS that
> will require hitting on the head.. i.e. nomenclature things,
> not really technical limits.

Good.  I need to talk about the nomenclature soon...  I also need to
learn the minor to dev mapping, etc.

> > 5.  Very fast (400 I/O's per second sustained) databases.
> we can get about 100 per disk so with 4 disks :)

Yes, you are on target.  On a wide/fast SCSI you can expect about
130-140 T/s, with a bus total of about 430 T/s.  This was confirmed
on a SPARC-20 with Slowlaris 2.5.1, on Linux with a DPT PCI HBA, and
on FreeBSD with an AHA2940W.  This holds true for transfers of up to
about 4K in length.  After that, funny things happen.
Slowlaris HANGS the process if you do O_SYNC writes and reads
concurrently on records of 8K and larger.  The DPT controller holds
linear degradation to 32K transfers and then picks up again at 64K
transfers.  The AHA on FreeBSD exhibits similar behaviour, slower
peaking and about 1/5th the throughput, etc.  We are very anxious 
to see what the DPT will do on FreeBSD.
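
In case anyone wants to reproduce the shape of these curves, the
tests boil down to something like the fragment below.  The device
name, write count, and 4K size are arbitrary stand-ins, and this
scribbles on the named disk, so point it at a scratch drive:

    /*
     * Rough O_SYNC write micro-benchmark: time NWRITES synchronous
     * writes of XFER bytes and report transactions per second.  The
     * device name, NWRITES, and XFER are arbitrary stand-ins; this
     * WILL overwrite the named disk, so use a scratch drive.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define NWRITES 1000
    #define XFER    4096            /* try 4K, 8K, 32K, 64K ... */

    int main(void)
    {
        char buf[XFER];
        struct timeval t0, t1;
        double secs;
        int fd, i;

        memset(buf, 0xa5, sizeof(buf));
        fd = open("/dev/rsd1c", O_WRONLY | O_SYNC);  /* raw device */
        if (fd < 0) { perror("open"); return 1; }

        gettimeofday(&t0, NULL);
        for (i = 0; i < NWRITES; i++)
            if (write(fd, buf, XFER) != XFER) {
                perror("write");
                return 1;
            }
        gettimeofday(&t1, NULL);

        secs = (t1.tv_sec - t0.tv_sec) +
               (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%d writes of %d bytes in %.2fs -> %.0f T/s\n",
               NWRITES, XFER, secs, NWRITES / secs);
        close(fd);
        return 0;
    }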

I have some unique questions in this area, but am curious as to how
interested is this forum in these things...

> > Because O/S source is very critical for such an effort, the
> > ``free'' ones are a natural choice.
> > 
> > After 2 years or more of Linux usage, I decided (at least
> > for now) to not use it.  FreeBSD seems very attractive.
> That's why we use it..

Endorsement?

> > 1.  Minor device designation for systems with up to 20 disk
> >     controllers (PCI), FCAL interfaces (with over 100 targets
> >     per bus), controllers with multiple busses, etc.
> your nomenclature is confusing me..
> FCAL?

Sorry.  I hate these abbreviations...
PCI we all know.  (Plug and Pray on Intel Invented Local Bus...).
FCAL is Fibre Channel Arbitrated Loop.  A nifty trick, where all
the SCSI devices sit on a loop of a fiber channel.  Something like
this:

    HOST-A --------  Disk-1 ------- Disk-2 -------- Disk-n -----+
      |                                                         |
      +---------------------------------------------------------+

Now, the way I remember it, data normally flows clockwise in this 
daisy chain.  In case of failure, it can flow backwards to reach
``the other side'' of the failure.

Advantages are numerous:  

* All traffic is actually network traffic.  The SCSI bus setup,
  arbitration, etc. is all gone.  Typically, a single loop will
  support more than 1,400 T/s, vs. 400 on a normal SCSI cable.

* Transfer rates are much faster, on the order of several hundreds
  of MB/Sec.

* Inherently reliable, with redundancy built in.

* The ``SCSI bus'' can support (I think) 255 devices per loop.

* Cabling advantages;  Very long runs (several hundred meters),
  immunity from EMI/RFI, etc.

Now I may be a bit off in some of these details, but the thing is
real.  Costs are not abnormally high either (about $50/drive extra).

This does raise the question of how we name target ID 159 on
bus 79, does it not?  And what will its minor number be?


> there are boards with 3 busses and 2 busses that are supported.
> We can support PCI bridges to get more slots

Yes.  This is exactly where I am going with it.  The Adaptec 3940
is really 2 controllers, not two busses.  It appears that FreeBSD
config is OK in this regard, but I can find no documentation.
When the DPT driver is done, I will be glad to contribute it to
FreeBSD...

> > 2.  Naming conventions for /dev entries for such beasts.
> /dev/{r}sd[0-9][0-9]
> there is a limit at the moment to about 32 drives per machine
> (I think) but it's a rather artificial limit
> and could be removed relatively easily.

Good.  How is this limit imposed?  I need to know.
We can have sd0-sdf.  What we need is either sd00-sdff or (better?)
c[0-f]b[0-f]d[00-ff]s[0-400][a-h].  This gives you exactly the
32-bit minors which I noticed FreeBSD already uses.
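
To make the arithmetic concrete, here is one way the packing could
look.  The field widths below are my own illustration, not anything
FreeBSD does today:

    /*
     * One illustrative packing of c/b/d/s/partition into a 32-bit
     * minor -- my own field widths, not an existing FreeBSD layout:
     * 4 bits controller, 4 bits bus, 8 bits target, 9 bits slice
     * (0-400 fits), 3 bits partition (a-h), 4 bits spare.
     */
    #define DKMINOR(ctlr, bus, targ, slice, part)   \
        ((((ctlr)  & 0x00f) << 24) |                \
         (((bus)   & 0x00f) << 20) |                \
         (((targ)  & 0x0ff) << 12) |                \
         (((slice) & 0x1ff) <<  3) |                \
          ((part)  & 0x007))

    #define DKCTLR(m)  (((m) >> 24) & 0x00f)
    #define DKBUS(m)   (((m) >> 20) & 0x00f)
    #define DKTARG(m)  (((m) >> 12) & 0x0ff)
    #define DKSLICE(m) (((m) >>  3) & 0x1ff)
    #define DKPART(m)  ( (m)        & 0x007)

Note that DKMINOR(0, 79, 159, 0, 2) already overflows the 4-bit bus
field -- the FCAL naming problem from above in one line.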


> > 3.  Moving all SCSI devices to a /dev/{r}dsk...
> not sure what you mean by this.....
> if you don't like the names there is always mknod:)
> (so that can't be what you mean.. it's too easy)

This is exactly what I mean.  Not all questions are difficult.
On a large system, the number of sd* and rsd* entries in /dev can
be a bit overwhelming.  What I propose is considering moving them
to their own directory, as SunOS, etc. have done.  I realize I 
could do that on ``my own system''.


> > 4.  Any documents about SCSI HBA driver entry points
> ah there's the rub..
> I guess I'm going to have to write that one day..
> 6 years isn't too long is it? (since I wrote the code)

:-)  We are discovering these slowly.  What we do not want to miss
is the full intention of the original architect.  We are comparing 
and analyzing the bt and the aic7xxx drivers, but that only tells
me what these writers understood and thought appropriate for their
hardware, not what the system actually CAN do.

For example, for MI work, we need open/close and init/start on the
HBA, as well as on the devices attached.  We discovered these entry
points in scsiconf.h, but no driver uses them.  We then noticed (as
I posted before) that the standard open/close pass no arguments,
but the PC98 one passes something to the open.
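
For anyone following along, the entry points we mean are the ones in
struct scsi_adapter.  The sketch below is reconstructed from memory
of the 2.x sys/scsi/scsiconf.h (modulo the __P() prototype macros),
so check the exact prototypes against your tree:

    /*
     * HBA entry points as declared in sys/scsi/scsiconf.h (2.x era).
     * Reconstructed from memory -- exact types and field order may
     * differ in your tree.  Note open_target_lu/close_target_lu take
     * no arguments, which is the oddity mentioned above.
     */
    struct scsi_adapter
    {
        int32_t   (*scsi_cmd)(struct scsi_xfer *xs);  /* queue a command    */
        void      (*scsi_minphys)(struct buf *bp);    /* clamp transfer len */
        int32_t   (*open_target_lu)(void);            /* unused by drivers  */
        int32_t   (*close_target_lu)(void);           /* unused by drivers  */
        u_int32_t (*adapter_info)(int unit);          /* e.g. queue depth   */
        char      *name;                              /* e.g. "ahc"         */
    };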

Devices have a nicer open, but when does it get called?  When will
the open to the ADAPTER be called?

When the initializing routines are running, are interrupts enabled?
Is the VM totally functional?  Can one sleep in the initialization?
This is the class of questions we have.
If we can gain a good understanding here, I may be able to initiate 
the documentation of the SCSI subsystem as part of this project.
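
On the sleeping question, the pattern I keep seeing in existing
drivers is to key off the kernel's `cold' flag -- roughly like this
(a sketch of the idiom, not lifted from any one driver):

    /*
     * Sketch of the idiom used by code that may run before the
     * scheduler is up: busy-wait during early boot (cold != 0),
     * sleep otherwise.  Kernel context; not from any one driver.
     */
    extern int cold;    /* nonzero until autoconfiguration is done */

    static void
    hba_wait(void *wchan)
    {
        if (cold)
            DELAY(10000);               /* spin 10ms; no sleeping yet */
        else
            tsleep(wchan, PRIBIO, "hbawt", hz / 10);  /* 100ms timeout */
    }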

...

Thanx so much for your help...

Simon


