Date: Thu, 08 Oct 1998 19:10:21 -0400 (EDT)
From: Simon Shapiro <shimon@simon-shapiro.org>
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
To: freebsd-current@FreeBSD.ORG
Cc: freebsd-scsi@FreeBSD.ORG
Subject: Give a Hand of Appreciation to the CAM Team

I would like to stand up, tip my hat, and give a big, warm compliment to
the CAM team.  It is no secret that I was not entirely happy with, or in
agreement with, the way the project developed, but in the end I was proven
wrong.  They did an impressive job.  What follows are some notes on this
matter.

I have just finished some stress, load, and performance testing of the DPT
driver in the CAM kernel for 3.0.

* Work was based on a 3.0-current kernel from 4-Oct-1998.

* We tested on two platforms, nomis and lead-100.

* Nomis is a Super Micro P6DNH2 motherboard with two P6-200 processors,
  384MB of DRAM, two DPT PM3334UDW controllers, each with 64MB of ECC
  cache, and three SCSI busses.  Some 35 disk drives are attached, all in
  one RAID arrangement or another.  Testing was done on a 19GB RAID-0
  partition (6 Barracuda Ultra Wide drives with a striping factor of 32KB).

* Lead-100 is a test machine with a nasty ASUS motherboard, some 128MB of
  SDRAM, and a single, single-bus PM3334UW attached to 3 Cheetah drives
  arranged in a RAID-5 with 64K stripes.  We used a 1.8GB partition for
  testing.

* For testing we used a modified nasty.sh, which forks 256 instances at a
  time.  Each instance runs st.c, read-only, with 4KB records.  St.c simply
  reads random records from the given disk.  This is NOT a file-system
  test, but an I/O engine test; all access was made to raw partitions.
  (A rough sketch of the access pattern follows this list.)

* Both are available from ftp://simon-shapiro.org/crash/tools.

* The test configuration is designed to stress the DPT firmware, the device
  driver, the O/S kernel, etc., not to yield the maximum quotable megabytes
  per second.

* As a benchmark, we used tests done on a pre-CAM kernel, in an arrangement
  similar to that on nomis.

* Successive runs of nasty.sh are initiated until the system dies or until
  1,024 instances of st.c are running.

* The pre-CAM tests included internal measurements of latencies in the
  interrupt handler code, the DPT firmware, the driver internals, and the
  O/S stack.  These were not available in the CAM version; they were taken
  out and not put back in again (my fault).

* Briefly, the pre-CAM kernel, UP, on a P6-200 achieved 1,943 transfers per
  second.  This was with 1,024 instances running.  Keyboard response was
  excellent, and IDE access was usable.  Interactive access to anything
  hooked up to the DPT saw response times of about six minutes.
  Reliability was excellent; the test would run for 24-48 hours and was
  then manually terminated without any signs of failure.  Running intensive
  NFS work in parallel did not seem to change the overall throughput.  I
  have published similar tests on these lists before.

* Disk utilization (the observed load on the physical disks, as compared to
  load on the controller) seems to be around 30-50%.  SCSI bus utilization
  seems to be around 50-75%.  Controller utilization seems to be around 90%
  of the visible limit, and about 99% of the theoretical limit.
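For readers who do not want to fetch the tools, here is a minimal sketch of
what an st.c-style worker does, as described in the list above: open a raw
partition read-only and read random, aligned 4KB records from it until
killed.  This is an illustration only, not the actual st.c; the command-line
arguments, the record-size constant, and the use of lseek()/read() are
assumptions made for the example.

/*
 * Sketch of an st.c-style random-read worker (NOT the actual st.c from
 * ftp://simon-shapiro.org/crash/tools).  Reads random, aligned 4KB
 * records from a raw partition until killed.  The usage (raw device plus
 * partition size in MB) is an assumption made for illustration.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define RECSIZE 4096                    /* 4KB records, read-only */

int
main(int argc, char **argv)
{
        char buf[RECSIZE];
        long nrec, rec;
        int fd;

        if (argc != 3) {
                fprintf(stderr, "usage: st <raw-device> <size-in-MB>\n");
                return 1;
        }
        /* Number of whole 4KB records in the test partition. */
        nrec = atol(argv[2]) * (1024L * 1024L / RECSIZE);

        if ((fd = open(argv[1], O_RDONLY)) < 0) {
                perror(argv[1]);
                return 1;
        }
        srandom(getpid());

        for (;;) {
                rec = random() % nrec;          /* pick a random 4KB record */
                if (lseek(fd, (off_t)rec * RECSIZE, SEEK_SET) == -1 ||
                    read(fd, buf, RECSIZE) != RECSIZE) {
                        perror("read");
                        return 1;
                }
        }
}

nasty.sh then forks 256 such workers at a time against the same raw
partition; the tps figures below are, presumably, the aggregate number of
these 4KB transfers completed per second across all workers.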
Now to the CAM results:

* P6-200 SMP:

  128 processes:  LA (load average) = 50.51,  1,671 tps (transfers/sec)
  256 processes:  LA = 158-188,  1,329-1,898 tps
  320 processes:  LA = 199-240,  162-1,904 tps

  We could not get much higher than that; the system simply dies.  No
  crashes, but no visible response to anything.  Current processes seemed
  to still be running.

* P-II 350 UP:

  256 processes:  LA = 59-67,    1,536-1,738 tps
  512 processes:  LA = 250-270,  569-1,248 tps

  We could not scale any higher.  Same symptoms as SMP at around 320
  processes.

* We could not perform any NFS tests, as at these loads the network code
  times out on anything we try.

Summary:

* These are by no means conclusive results.  Neither the methods nor the
  reporting are scientifically, or even statistically, valid.  But some
  behavior patterns are quite visible.

* From an I/O performance perspective, SMP on FreeBSD seems like a net
  loss.  I do not think there is any news here.

* The CAM code, as far as disks go (we did not test any tape or CD
  devices), seems rather robust.  The systems boot significantly faster,
  and correctly, and we experienced no failures attributable to CAM.

* Peak performance on a DPT controller seems about 5% slower than with the
  previous code.  This does NOT necessarily mean that the CAM code itself
  is slower; it only means that, viewed as a whole, the CAM solution is
  somewhat slower.

* Scalability seems distinctly poorer.  Where the pre-CAM code scaled
  linearly to its peak, we see the CAM code peak at around 128-256
  concurrent processes, with erratic behavior in the 320-500 process range,
  coupled with a clear and distinct decline in throughput.

* Caveat: while they share major pieces of code, the DPT pre-CAM and CAM
  drivers are very different creatures.  We have not yet put in the time
  and effort to study the old driver architecture under CAM.

Conclusion:

A CAM/DPT based solution is probably production grade.  It is very unlikely
that most users will approach the loads described here.  Even under severe
loads the code holds together, with few enough quirks in the driver to
allow for reliable operation.  OTOH, there is much room for improvement in
scalability, and some room in peak performance.  We will not be bored, but
busy.

Again, Thanx to Justin and Ken for all their excellent work!

Sincerely Yours,

Shimon@Simon-Shapiro.ORG          770.265.7340
Simon Shapiro

Unwritten code has no bugs and executes at twice the speed of mouth