From owner-freebsd-hackers Tue Mar  3 22:16:50 1998
Date: Tue, 03 Mar 1998 22:23:32 -0800 (PST)
From: Simon Shapiro
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
To: Karl Denninger
Cc: Wilko Bulte, sbabkin@dcn.att.com, tlambert@primenet.com,
    jdn@acp.qiv.com, blkirk@float.eli.net, hackers@FreeBSD.ORG,
    grog@lemis.com
Subject: Re: SCSI Bus redundancy...
In-Reply-To: <19980303232444.59397@mcs.net>
X-Mailer: XFMail 1.3-alpha-021598 [p0] on FreeBSD

On 04-Mar-98 Karl Denninger wrote:
...
> The problem is getting enough cache to matter.
>
> We have a 25GB RAID 0+1 news spool. It's about half full right now, on
> its way upward (we keep tuning expiration times). There is basically
> ZERO locality of reference by the readers, which means that you'd need
> at least a couple of GB of RAM to make any difference at all.

A well-known problem. Similar problems arise with RDBMS code in the CPU
proper: how much L1/L2 cache can you throw at it?

> Now the RAID adapter helps - a lot - by striping the writes and reads,
> primarily. The ultra SCSI bus ends up being the controlling factor.

Yup. And fast it is not. This is what I like about the DPT setup; I have
three independent SCSI busses to work with.

...

> Yep. The other problem is that a kernel RAID *cannot* do writeback
> caching. If it does, you're f*d if the power goes out or the OS goes
> down dirty.

Yup. Layering, modularity, and compartmentalization take many shapes.

> The standalones CAN do writeback, because they can have a battery on
> them AND if the CPU dies they keep running.

They are a true I/O channel, in the mainframe tradition.

> RAID 5, in particular, benefits enormously from writeback, as it allows
> it to defer writes until an entire stripe is ready, which means no
> read/compute/write cycle. This is a monstrous win for performance.

I played, on the DPT, with RAID{0,1,5} stripe size vs. performance. The
numbers really move around. I used to even know how to compute this
stuff...
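The read/compute/write cycle Karl mentions is plain XOR arithmetic. A
minimal sketch of the two parity paths follows; this is the textbook
math only, not the DPT firmware, and STRIPE_UNIT/NDATA are made-up
example values:

/*
 * RAID 5 parity, textbook version.  STRIPE_UNIT and NDATA are
 * illustrative values, not the DPT's.
 */
#include <stddef.h>
#include <string.h>

#define STRIPE_UNIT (64 * 1024)  /* bytes per disk per stripe row */
#define NDATA       4            /* data disks in the group       */

/*
 * Full-stripe write: every data block of the row is in the writeback
 * cache, so parity is one XOR pass over cached data.  Zero disk
 * reads; NDATA + 1 writes go out.
 */
static void
parity_full_stripe(unsigned char *parity, unsigned char *data[NDATA])
{
    size_t i;
    int d;

    memset(parity, 0, STRIPE_UNIT);
    for (d = 0; d < NDATA; d++)
        for (i = 0; i < STRIPE_UNIT; i++)
            parity[i] ^= data[d][i];
}

/*
 * Partial write of one block: the old data and old parity must first
 * be READ from disk, the new parity COMPUTED, then both WRITTEN back:
 *
 *     new_parity = old_parity ^ old_data ^ new_data
 *
 * Two reads plus two writes for every one block of new data.
 */
static void
parity_rmw(unsigned char *parity, const unsigned char *old_data,
    const unsigned char *new_data)
{
    size_t i;

    for (i = 0; i < STRIPE_UNIT; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}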
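Stripe size is also exactly what decides when a write qualifies for the
cheap path above. The textbook mapping from logical block to (disk,
offset), again with made-up values (the DPT's real layout, parity
rotation included, is not reproduced here):

/*
 * Textbook striping: map a logical block number to (disk, offset)
 * for a given stripe unit.  Illustrative only.
 */
#define NDISKS      4    /* data disks (example)           */
#define STRIPE_BLKS 128  /* blocks per disk per stripe row */

struct loc {
    int  disk;
    long offset;
};

static struct loc
map_block(long lbn)
{
    long row    = lbn / (STRIPE_BLKS * NDISKS);  /* which stripe row */
    long within = lbn % (STRIPE_BLKS * NDISKS);  /* place in the row */
    struct loc l;

    l.disk   = (int)(within / STRIPE_BLKS);
    l.offset = row * STRIPE_BLKS + within % STRIPE_BLKS;
    return l;
}

A write smaller than STRIPE_BLKS lands on one disk and, on RAID 5, pays
the read-modify-write toll; an aligned write of STRIPE_BLKS * NDISKS
blocks covers the whole row and gets the cheap parity path. Shuffling
the stripe unit against the workload's typical request size is why the
numbers move around so much.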
...

> The best I've seen off our RAID systems right now is about 11MB/sec
> (that's megaBYTES, not bits). That's on an Ultra bus, with 2 ultra
> busses going to the RAID disks.

About right. SCSI-II used to be 4-5 MB/sec per bus. Ultra-wide is about
5-6 MB/sec for small O/S-type blocks. I see about 18 MB/sec on the DPT
across three busses. The difficulty is in getting FreeBSD to produce
this traffic on small blocks (dd if=/dev/zero of=/dev/something bs=64k
is NOT a typical application).

> Neither the disk buses nor the RAID controller CPU are saturated. I
> believe this is pretty much the wall on one SCSI channel, at least with
> 16 SCBs. I'm going to try it with SCBPAGING turned on and see if that
> helps, but for sequential reads it probably won't matter much.

Hook up a DPT to one of these boxes. It will be interesting to see what
happens. Seriously.

> I could run two host channels on this thing across two RAID sets into
> two Adaptec adapters. That might be a big win.
>
> I suspect the bottleneck is in the AIC code at this point, or the bus
> itself, or the interrupt latency on the DMA completion is killing me.
> There is no appreciable difference between running at 40MB/sec (ultra
> full-bore) and 20MB/sec, indicating that perhaps the hold-up is in the
> Adaptec microcode, driver, and/or the Adaptec/PCI bus interface.

I read the AIC code quite carefully when writing the DPT code. There is
nothing obviously wrong with it; Justin is a very careful engineer. It
is either the sequencer itself, the SCSI layer, or FreeBSD.

To saturate a DPT (with 4KB transfers, random across the entire array),
you need about 1,900 transactions per second. To reach 1,740 or so from
userspace, I have to run about 256 copies of st.c. The load average is
about 220 at that point, which is not very good for interactive work.
Trying the same with SMP either crashes, or drops to about 880 TPS.

Simon
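P.S. Note the arithmetic: 1,900 transactions/sec at 4KB is only about
7.5 MB/sec, so the wall here is per-transaction overhead, not raw bus
bandwidth. st.c itself is not attached; a minimal sketch of that kind
of load generator, with the device path and array size as placeholders:

/*
 * Minimal random-4KB-read load generator, in the spirit of st.c
 * (this is NOT the actual st.c source).  /dev/rda0 and NBLKS are
 * placeholders; adjust both.  Run N copies concurrently and count
 * completed reads per second across all of them to get TPS.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLKSZ 4096              /* 4KB transfers                    */
#define NBLKS (2 * 1024 * 1024) /* blocks in the array (8GB @ 4KB)  */

int
main(void)
{
    char  buf[BLKSZ];
    off_t off;
    int   fd, i;

    srandom(getpid());                /* distinct stream per copy */
    fd = open("/dev/rda0", O_RDONLY); /* placeholder device       */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (i = 0; i < 100000; i++) {
        /* random block, 4KB-aligned, anywhere in the array */
        off = (off_t)(random() % NBLKS) * BLKSZ;
        if (lseek(fd, off, SEEK_SET) == -1 ||
            read(fd, buf, BLKSZ) != BLKSZ) {
            perror("read");
            return 1;
        }
    }
    close(fd);
    return 0;
}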