Date:      Thu, 02 Jul 1998 16:13:14 -0400 (EDT)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        wjw@IAEhv.nl
Cc:        scsi@FreeBSD.ORG
Subject:   RE: Using a DPT controller, pros and cons
Message-ID:  <XFMail.980702161314.shimon@simon-shapiro.org>
In-Reply-To: <199807021148.NAA23248@surf.IAE.nl>


On 02-Jul-98 Willem Jan Withagen wrote:

...

> I finally got this to work at 5 MHz, but that will be due to the
> cabling.  That is currently not the most important issue.
> During the creation of these arrays, not everything went smoothly, and
> thus rebuilds were frequent.  My main problem is that once there is the
> slightest hiccup, the RAID is invalidated, and you have no way of telling
> the box that it was your failure.  Now there are some DOS tools, but
> remember that we wanted RAIDs to prevent downtime in the first place.

This is truly the downside of the DPT system:  The interconnect between
the controller and its disks is under user control.  This means that bad
disks, bad canisters, bad enclosures, bad cables, and bad power supplies
can all cause disruption.

It is also true that if the DPT ``loses'' a device on the bus, and the
device ``re-appears'' after a boot, the DPT still marks it red-flag.  But
what is the alternative?  To blindly take a device that previously FAILED
and make it good?  Automatically?

I agree that we could have such a utility ported, but frankly I was
convinced by my technical contacts at DPT that this is not a good idea.
If you had a failure, then you should explicitly override the failure tag.

BTW, I said it before, and I'll say it again:  All my problems of this
nature disappeared once I switched to DPT-supplied disks, canisters,
enclosures and cables.  I also, without fail, use ECC memory supplied by
DPT.  Yes, it is expensive, but so is downtime.

The RAID boxes do not give you a choice.  Not only do they insist on
supplying everything except the host, most of them will not even allow you
to swap a disk yourself!

> [[ One thing I'd have to say:
>       I never had FreeBSD-stable crash while booting when there was a
>       rebuild. Note that I have the bootdisk on a measly IDE.
> ]]

This is understandable.  The I/O load associated with booting was not put
on the array being built.  There is a firmware bug/weakness regarding heavy
disk loads during a rebuild.  I believe the proper people at DPT are aware
of it.  I will post new firmware when one is available.  In the meantime,
use 7L0, NOT 7M0.  If you insist on 7M0, keep 7L0 available to re-load to
the controller.

> BUT what I really missed were any applications to figure out what the
> RAID is doing.  ALL I see is that the controller and the drive LEDs are
> on.  Other than that: nothing is known.
> Simon has promised that ASAP he'll make some applications which will
> allow us to get more control over our RAIDs.

3.0-current has certain utilities.  I will provide more once the migration
to CAM is done and the initial release of the 5th generation controllers is
out.

> Would any pressure from clients persuade DPT to at least release some
> code which would give us some dptmgr functionality????

I have some code samples already.  I will port them to FreeBSD as time
permits.

> The other issue:
>       I've done 3 'dd if=/dev/zero of=/mnt?/bar' in parallel on the
>       system.
>       2 go to the DPT RAID-1 and RAID-5.
>       1 goes to a separate MICROPOLIS on the onboard ahc0 controller.
> 
> Now the MICROPOLIS gets a sustained write stream of something like 2.7
> MB/sec.  The two RAIDs do not stream the data at all.  There are seconds
> at a time where one or the other disk doesn't take any data.  Then one
> disk gets something like 5 MB/sec, and the other 0-4 blocks/sec.
> (And this gets worse if rebuilds are under way.)

Forward this to DPT.  The FreeBSD code has nothing to do with such things
:-(  You can try running dptcntl to see the data and transaction rates, the
longest delay, and a host of other metrics.

From experience, you will not get the DPT to ``stream'' before you have
around 64 dd's going.
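
If you want to generate that kind of load without launching 64 dd's by
hand, a small forking writer along the following lines would do it.  This
is only a sketch of mine, not something from this thread; the target path,
block size, and per-stream size are invented, so adjust them for your
array.

/*
 * Quick hack: fork NSTREAMS sequential writers, roughly equivalent to
 * running that many parallel ``dd if=/dev/zero of=...'' commands.
 * The path, block size and per-stream size are made up for the example.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NSTREAMS  64            /* roughly where the DPT starts to stream */
#define BLOCKSIZE (64 * 1024)
#define NBLOCKS   1024          /* 64MB per stream */

int
main(void)
{
    static char buf[BLOCKSIZE]; /* static, so it is already zero-filled */
    char path[64];
    int i, j, fd;

    for (i = 0; i < NSTREAMS; i++) {
        switch (fork()) {
        case -1:
            perror("fork");
            exit(1);
        case 0:
            /* hypothetical mount point -- change to suit */
            snprintf(path, sizeof(path), "/mnt0/stream.%d", i);
            fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) {
                perror(path);
                _exit(1);
            }
            for (j = 0; j < NBLOCKS; j++)
                if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                    perror("write");
                    break;
                }
            close(fd);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)      /* wait for all writers to finish */
        ;
    return (0);
}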

> All in all something I would not expect. 
>       the DPT contoller: 3334UW with 32Mb firmware 07L0
>       other hardware: 
>               ASUS 55TP4S
>               bootdisk conner IDE 540 Mb
>               on ahc0: cdrom, dat-tape, mo, micropolis 4.3 Gb
>               DEC de0 10Mb ethernet
>               ATI Mach 2Mb video
> 
> and in the logs I find things like:
> dpt0 ERROR: Marking 246407 (Write (10) [6.1.18]) on c0b0t12u0
>             as late after 17711004usec
> dpt0 ERROR: Marking 246409 (Write (10) [6.1.18]) on c0b0t12u0
>             as late after 17739244usec
> dpt0: Salvaging Tx 246378 from the jaws of destruction (10000/18149777)
> dpt0: Salvaging Tx 246380 from the jaws of destruction (10000/18205091)


Ah!  Thanx!

Explanation:  When the SCSI abstraction layer (in the FreeBSD kernel,
probably sys/scsi/sd.c) sent that command, it marked it to be timed out in
10 seconds (10,000ms).  After about 17 seconds (17,711,004us), I mark the
transaction as ``probably lost, should be aborted'', but, from experience,
I know better and wait a little longer (other activity continues!).
Indeed, after one more second (for a total of about 18 seconds) the
transaction completes and re-appears.  The ``Salvaging...'' message tells
you that, and the transaction is completed as if all was well.
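
For illustration only, the bookkeeping described above looks roughly like
the sketch below.  This is NOT the actual dpt driver code; the structure,
field and function names are invented, and the grace factor before a
command is declared ``late'' is an assumption.  Only the 10,000ms timeout
and the ~18-second completion come from the log above.

/*
 * Sketch of the ``mark late, then salvage'' bookkeeping.  Names and the
 * LATE_FACTOR grace period are invented; only the timeout and elapsed
 * times mirror the log messages quoted above.
 */
#include <stdio.h>

struct dpt_ccb {                /* invented stand-in for the real CCB */
    unsigned id;                /* transaction number, e.g. 246378 */
    unsigned timeout_ms;        /* timeout requested by the SCSI layer */
    unsigned elapsed_us;        /* how long the command has been outstanding */
    int      marked_late;
};

#define LATE_FACTOR 1.7         /* assumption: grace before calling it late */

/* Polled periodically while the command is outstanding. */
static void
check_late(struct dpt_ccb *ccb)
{
    if (!ccb->marked_late &&
        ccb->elapsed_us > ccb->timeout_ms * 1000.0 * LATE_FACTOR) {
        printf("dpt0 ERROR: Marking %u as late after %uusec\n",
               ccb->id, ccb->elapsed_us);
        ccb->marked_late = 1;   /* warn, but keep waiting */
    }
}

/* Called when the controller finally completes the command. */
static void
dpt_complete(struct dpt_ccb *ccb)
{
    if (ccb->marked_late)
        /* timeout in ms / elapsed in us, mirroring the log format above */
        printf("dpt0: Salvaging Tx %u from the jaws of destruction (%u/%u)\n",
               ccb->id, ccb->timeout_ms, ccb->elapsed_us);
    /* ...and hand the transaction back upstream as if all was well. */
}

int
main(void)
{
    struct dpt_ccb ccb = { 246378, 10000, 0, 0 };

    /* Simulate a write that stays outstanding for about 18 seconds. */
    for (ccb.elapsed_us = 0; ccb.elapsed_us < 18149777;
         ccb.elapsed_us += 500000)
        check_late(&ccb);
    ccb.elapsed_us = 18149777;
    dpt_complete(&ccb);
    return (0);
}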

A comment:

*  Once you get these messages to go away (more on that later), remove the
   kernel options that enable this code.  They are slow, invasive and
   ugly.  They should be used only for debugging.

What can cause a transaction to take 18 seconds to complete, while others
complete much faster?

The best I can tell, it is one of these scenarios:

*  There is a soft error somewhere between the disk and the controller.
   Lots of re-tries, etc.
*  The disk is starving the transaction.  This happens a lot with certain
   disk drives.  I had a horrible time with some Micropolis disks.  Here is
   what happens:  The DPT sends I/O requests for sectors 10, 10000, 5, and
   30.  Very shortly afterwards, it sends more requests, all for sectors in
   the (example!) 0-500 range.  The drive employs an elevator sort to
   optimize seeks.  As a result of a subtle bug in the firmware, the drive
   never gets to sector 10000; it is very busy in the 0-500 range.
   Result:  The ``far-out'' sector does not get served.  (A toy
   illustration of this starvation appears after this list.)
   Why does it happen on the DPT more than on normal controllers?  The DPT,
   especially when writing to RAID arrays, writes in fairly large stripes
   (128KB to 1MB are not uncommon).  This makes the I/O bursty.  In RAID-5
   it is even worse, as the DPT reads/writes to several drives in quick
   bursts when executing a WRITE request.  Also, the way caching works is
   to hold back writes until the pages age; then they are burst-written to
   disk.  The access profile is dramatically different from that of a
   ``normal'' drive attached to a ``normal'' controller.

*  Soft errors during transfer phases.  Marginal cables, enclosures,
   backplanes, etc.  can aggravate the reliability problem, inducing many
   resets, re-tries, etc.
  
   Solution?  Smaller stripes, more cache memory on the DPT, ECC memory on
   the controller, ECC disks, good packaging, cabling, etc.
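
For what it is worth, here is the toy simulation of the starvation
scenario promised above.  It is not DPT or drive firmware code; the queue
size and request pattern are invented, and it only shows how a naive
shortest-seek-first policy can starve a ``far-out'' request for as long as
nearby requests keep arriving.

/*
 * Toy simulation of elevator-sort starvation.  One request for sector
 * 10000 sits in the queue while requests in the 0-500 range keep
 * arriving, as in the example in the text.
 */
#include <stdio.h>
#include <stdlib.h>

#define QLEN 16

static unsigned queue[QLEN];
static int      nq;

static unsigned long
dist(unsigned a, unsigned b)
{
    return (a > b ? a - b : b - a);
}

/* A (buggy) shortest-seek-first ``elevator'': pick the nearest request. */
static unsigned
next_request(unsigned head)
{
    unsigned sector;
    int i, best;

    best = 0;
    for (i = 1; i < nq; i++)
        if (dist(queue[i], head) < dist(queue[best], head))
            best = i;
    sector = queue[best];
    queue[best] = queue[--nq];  /* remove the chosen request */
    return (sector);
}

int
main(void)
{
    unsigned head = 0;
    int step;

    queue[nq++] = 10000;                    /* the ``far-out'' request */
    for (step = 0; step < 1000; step++) {
        if (nq < QLEN)
            queue[nq++] = rand() % 500;     /* nearby work keeps arriving */
        head = next_request(head);
        if (head == 10000) {
            printf("sector 10000 served at step %d\n", step);
            return (0);
        }
    }
    printf("sector 10000 never served in %d steps\n", step);
    return (0);
}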

BTW, 2.5MB/Sec is slow, but do not expect RAID-5 to run as fast as RAID-0,
nor should you expect sequential access to be as fast on a RAID array as on
a single disk.  You need lots of cache for read-ahead to get that.

Simon




