Date: Thu, 30 Jan 1997 12:44:23 -0800 (PST)
From: Simon Shapiro <Shimon@i-Connect.Net>
To: Julian Elischer <julian@whistle.com>
Cc: freebsd-scsi@freebsd.org
Subject: Re: NewComer Questions...
Message-ID: <XFMail.970130154549.Shimon@i-Connect.Net>
In-Reply-To: <32EEFCD5.794BDF32@whistle.com>
Hi Julian Elischer;  On 29-Jan-97 you wrote:

> Simon Shapiro wrote:
> >
> > I am learning slowly, and just discovered this mailing list.
> >
> > In way of introduction, I am working on a high-speed database
> > engine for embedded telephony applications.
> >
> > We need to develop the following functionality:
> >
> > 1. Multi-initiator support
>
> I assume you mean in SCSI?
> We have some basic support for that, but it requires a SCSI host
> adapter that supports it..  It hasn't been exercised in years.
> (I wrote it with Peter Dufault, but it's a rarely used feature.
> Or are you talking about several machines sharing a single bus?)

Yes.  Multiple machines sharing the same SCSI bus.  We have the HBA to
do that and are porting the basic driver now.  I am just wondering
what is already there and what is not.

> > 2. DLM
>
> Daringly Lowfat Milk?

You almost got it right...  :-)  To be more precise, it stands for
Distributed Lock Manager: a creature used in concert with
multi-initiator SCSI busses, responsible for providing the
coordination necessary for such mayhem.  This arrangement is useful in
two places: large, complex databases, where more than one host (CPU,
system) wants access to the same physical database, and HRA (High
Reliability and Availability) systems, where one system's failure
still leaves a path to the database through another.

> > 3. Non-stop operation
>
> Hmm, this is a tricky one..
> What's your definition of non-stop?

A failure of any single component does not stop the system from
providing the same services as before the failure.  If you picture a
RAID-{1,5} box connected to two hosts, you get the SCSI part of
``non-stop'': if a disk fails, the RAID array continues to run.  The
RAID box actually knows how to put a hot spare into service, so
performance is restored in short order.  If a host fails (without
putting a short on the SCSI cable), the other host can continue and
access the same storage, etc.

> > 4. Very large (hundreds of Gigabytes) databases
>
> Not unheard of..  We have several people into the 100GB range..
> It does scale, though I have some ideas of some little NITS that
> will require hitting on the head, i.e. nomenclature things,
> not really technical limits.

Good.  I need to talk about the nomenclature soon...  I also need to
learn the minor-to-dev mapping, etc.

> > 5. Very fast (400 I/Os per second sustained) databases.
>
> We can get about 100 per disk, so with 4 disks :)

Yes, you are on target.  On a wide/fast SCSI bus you can expect about
130-140 T/s per disk, with a bus total of about 430 T/s.  This was
confirmed on a SPARC-20 with Slowlaris 2.5.1, on Linux with a DPT PCI
HBA, and on FreeBSD with an AHA2940W.  This holds true for transfers
of up to about 4K in length.  After that, funny things happen.
Slowlaris HANGS the process if you do O_SYNC writes and reads
concurrently on records of 8K and larger.  The DPT controller degrades
linearly up to 32K transfers and then peaks up again at 64K transfers.
The AHA on FreeBSD exhibits similar behaviour, with slower peaking and
about 1/5th the throughput, etc.  We are very anxious to see what the
DPT will do on FreeBSD.  I have some unique questions in this area,
but am curious how interested this forum is in these things...

> > Because O/S source is very critical for such an effort, the
> > ``free'' ones are a natural choice.
> >
> > After 2 years or more of Linux usage, I decided (at least
> > for now) not to use it.  FreeBSD seems very attractive.
>
> That's why we use it..

Endorsement?

> > 1. Minor device designation for systems with up to 20 disk
> >    controllers (PCI), FCAL interfaces (with over 100 targets
> >    per bus), controllers with multiple busses, etc.
>
> Your nomenclature is confusing me..
> FCAL?

Sorry.  I hate these abbreviations...  PCI we all know (Plug and Pray
on the Intel-Invented Local Bus...).  FCAL is Fibre Channel Arbitrated
Loop: a nifty trick where all the SCSI devices sit on a loop of a
fibre channel.
Something like this:

    HOST-A -------- Disk-1 ------- Disk-2 -------- Disk-n -----+
      |                                                        |
      +--------------------------------------------------------+

Now, the way I remember it, data normally flows clockwise in this
daisy chain.  In case of failure, it can flow backwards to reach
``the other side'' of the failure.  Advantages are numerous:

* All traffic is actually network traffic.  The SCSI bus setup,
  arbitration, etc. is all gone.  Typically, a single loop will
  support more than 1,400 T/s, vs. 400 on a normal SCSI cable.

* Transfer rates are much faster, on the order of several hundred
  MB/sec.

* Inherently reliable, with redundancy built in.

* The ``SCSI bus'' can support (I think) 255 devices per loop.

* Cabling advantages: very long runs (several hundred meters),
  immunity to EMI/RFI, etc.

Now, I may be a bit off on some of these details, but the thing is
real.  Costs are not abnormally high either (about $50/drive extra).
This does present the question of how we name target ID 159 on bus 79,
does it not?  And what will its minor number be?

> There are boards with 2 and 3 busses that are supported.
> We can support PCI bridges to get more slots.

Yes, this is exactly where I am going with it.  The Adaptec 3940 is
really 2 controllers, not two busses.  It appears that FreeBSD config
is OK in this regard, but I can find no documentation.  When the DPT
driver is done, I will be glad to contribute it to FreeBSD...

> > 2. Naming conventions for /dev entries for such beasts.
>
> /dev/{r}sd[0-9][0-9]
> There is a limit at the moment of about 32 drives per machine
> (I think), but it's a rather artificial limit
> and could be removed relatively easily.

Good.  How is this limit imposed?  I need to know.  We can have
sd0-sdf.  What we need is either sd00-sdff or (better?)
c[0-f]b[0-f]d[00-ff]s[0-400][a-h].  This gives you exactly 32-bit
minors, which I noticed FreeBSD uses already.

> > 3. Moving all SCSI devices to a /dev/{r}dsk...
>
> Not sure what you mean by this.....
> If you don't like the names, there is always mknod :)
> (So that can't be what you mean..  it's too easy)

This is exactly what I mean.  Not all questions are difficult.  On a
large system, the number of sd* and rsd* entries in /dev can be a bit
overwhelming.  What I propose is considering moving them to their own
directory, as SunOS, etc. have done.  I realize I could do that on
``my own system''.

> > 4. Any documents about SCSI HBA driver entry points?
>
> Ah, there's the rub..
> I guess I'm going to have to write that one day..
> 6 years isn't too long, is it?  (since I wrote the code)  :-)

We are discovering these slowly.  What we do not want to miss is the
full intention of the original architect.  We are comparing and
analyzing the bt and the aic7xxx drivers, but that only tells me what
those writers understood and thought appropriate for their hardware,
not what the system actually CAN do.  For example, for MI work, we
need open/close and init/start on the HBA, as well as on the attached
devices.  We discovered these entry points in scsiconf.h, but no
driver uses them.  We then notice (as I posted before) that the
standard open/close pass no arguments, but the PC98 passes something
to the open.  Devices have a nicer open, but when does it get called?
When will the open to the ADAPTER be called?  When the initialization
routines are running, are interrupts enabled?  Is the VM totally
functional?  Can one sleep during initialization?  This is the class
of questions we have.  If we can gain a good understanding here, I may
be able to initiate the documentation of the SCSI subsystem as part of
this project.

...

Thanx so much for your help...

Simon