FreeBSD Mail Archives

Date:      Wed, 13 May 1998 10:27:06 -0600
From:      "Justin T. Gibbs" <gibbs@plutotech.com>
To:        fireston@lexmark.com
Cc:        gibbs@plutotech.com (Justin T. Gibbs), freebsd-scsi@FreeBSD.ORG
Subject:   Re: SCSI timeouts in dataout mode 
Message-ID:  <199805131630.KAA09877@pluto.plutotech.com>
In-Reply-To: Your message of "Wed, 13 May 1998 09:53:45 EDT." <199805131416.IAA28579@pluto.plutotech.com>

>The problem occurs with only one LUN or many.  The HP does not have a status
>monitor that we can access, although the blinky lights on the front of the
>unit are quite hypnotic.  

Does it go "ping!"?? 8-)

>> Do you have access to a SCSI bus analyzer?
>
>Yes, and we will be using that tomorrow (our normal duties have gone ignored
>a bit too long).  I have no clue how to read the output from such a beast -
>what should we be looking for?

A transaction that doesn't reconnect, as well as the mean time between 
request and completion.
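For reading the analyzer output, here is a minimal sketch that assumes the analyzer can export a time-ordered list of (timestamp, tag, event) records. The event names `cmd`, `disconnect`, `reselect` and `status`, and the function `analyze`, are hypothetical and do not reflect any real analyzer's export format. The sketch flags tags that disconnected and never reselected, and it averages the time from command to status.

```python
from statistics import mean

# Hypothetical event names; a real analyzer exports its own format.
CMD, DISCONNECT, RESELECT, STATUS = "cmd", "disconnect", "reselect", "status"


def analyze(trace):
    """Return (mean request-to-completion time in s or None, disconnected tags).

    trace is an iterable of (timestamp_s, tag, event) tuples in time order.
    A tag may be reused once its previous transaction has returned status.
    """
    started = {}          # tag -> timestamp of its command phase
    disconnected = set()  # tags currently off the bus waiting to reselect
    latencies = []
    for ts, tag, event in trace:
        if event == CMD:
            started[tag] = ts
            disconnected.discard(tag)
        elif event == DISCONNECT:
            disconnected.add(tag)
        elif event == RESELECT:
            disconnected.discard(tag)
        elif event == STATUS:
            if tag in started:
                latencies.append(ts - started.pop(tag))
            disconnected.discard(tag)
    avg = mean(latencies) if latencies else None
    # Anything still disconnected at the end of the trace never reconnected.
    return avg, sorted(disconnected)


trace = [
    (0.000, 1, CMD), (0.001, 1, DISCONNECT),
    (0.002, 2, CMD), (0.003, 2, DISCONNECT),
    (0.050, 1, RESELECT), (0.052, 1, STATUS),
    (0.060, 1, CMD), (0.070, 1, STATUS),
]
avg, stuck = analyze(trace)
print(round(avg, 3), stuck)  # 0.031 [2]
```

Tag 2 is the kind of transaction worth zooming in on in the real trace.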

>I will make the observation that this error is somehow related to the amount
>of time it takes to write the file.  If it takes more than 10 seconds to
>write, we see this error.  We have not yet tried to read a large file to see
>what problems that can cause.

Ahh.  This is a big clue.  Depending on how the array performs write 
caching, it may well be that we are hitting it with so many transactions
that the cache isn't draining fast enough for them to complete within the
10 second window.  10 seconds seems like an eternity to me, but I suppose
that if you hit a RAID5 array that does write-back caching with 64
transactions of 64k each, you can have 64 "completed transactions" waiting
in the cache flush at the same time as you have 64 "pending transactions",
so it could be a while before there is cache space to service the new
guys.  You might want to bump up the timeout.  Look for 
DA_DEFAULT_TIMEOUT in sys/cam/scsi/scsi_da.c.  You may also want to 
increase the frequency of ordered tagged transactions, as this will impose 
a touch of write ordering so the array doesn't starve "expensive writes"
long enough to cause a timeout.  See DA_ORDEREDTAG_INTERVAL in scsi_da.c.
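To see why ordered tags help, here is a toy model (hypothetical, not the CAM code or the HP firmware). The device always starts the cheapest eligible command, which lets a steady stream of cheap writes starve one expensive write. Under SCSI queue tag rules, an ORDERED command runs only after every command issued before it, and nothing issued after it may pass it. So once one is queued behind the expensive write, that write has to finish. Tagging every Nth command is an assumption made for illustration and may not match how scsi_da.c actually uses DA_ORDEREDTAG_INTERVAL.

```python
from dataclasses import dataclass


@dataclass
class Cmd:
    seq: int       # issue order
    cost: int      # service time in arbitrary ticks
    ordered: bool  # True for an ORDERED tag, False for SIMPLE


def eligible(queue):
    """Commands the device may start now; queue is kept in issue order."""
    out = []
    for i, cmd in enumerate(queue):
        if cmd.ordered:
            if i == 0:
                out.append(cmd)  # ORDERED runs once everything older is done
            break                # nothing issued after it may pass it
        out.append(cmd)
    return out


def time_to_finish_expensive(depth, interval=None, expensive_cost=10, limit=10_000):
    """Ticks until command 0 (the expensive write) completes, or None if starved.

    The host keeps `depth` commands outstanding; every command after the first
    costs 1 tick.  If `interval` is set, every interval-th command is ORDERED.
    The (hypothetical) device starts the cheapest eligible command first.
    """
    queue, next_seq, now = [], 0, 0

    def issue():
        nonlocal next_seq
        cost = expensive_cost if next_seq == 0 else 1
        ordered = interval is not None and next_seq % interval == interval - 1
        queue.append(Cmd(next_seq, cost, ordered))
        next_seq += 1

    for _ in range(depth):
        issue()
    while now < limit:
        cmd = min(eligible(queue), key=lambda c: (c.cost, c.seq))
        queue.remove(cmd)
        now += cmd.cost
        if cmd.seq == 0:
            return now
        issue()  # host refills the queue to keep `depth` outstanding
    return None


print(time_to_finish_expensive(4))              # None: starved
print(time_to_finish_expensive(4, interval=4))  # 12
```

Without ordered tags, the expensive write never runs within the 10,000-tick limit. With every fourth command ordered, it finishes at tick 12.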

>To hazard a guess, I really think this is some strange interaction between the
>SCSI code and the cache on the HP.  Either the HP isn't doing the right kind of
>handshaking or the FreeBSD box is missing it.  But this is only a guess.

It's probably not a protocol error at all.  The timeout is somewhat 
arbitrary, but as it is difficult to ask the device "hey, are you still 
working on this?", you have to draw the line somewhere.  Perhaps we should 
choose a larger timeout value by default?
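The arithmetic behind choosing a larger timeout can be sketched as follows. It assumes, hypothetically, that the last queued write cannot complete until both the acknowledged-but-unflushed backlog in the write-back cache and the writes queued ahead of it reach the disks. The 512 KB/s drain rate is a made-up figure for a RAID5 array doing read-modify-write. Measure the real array before picking a value.

```python
import math


def worst_case_wait_s(queue_depth, xfer_bytes, drain_bytes_per_s, cached_cmds=None):
    """Rough time (s) before the last queued write can complete.

    Assumes `cached_cmds` writes were already acknowledged but still sit in
    the write-back cache, and that the whole backlog (cached + queued) has to
    be flushed at `drain_bytes_per_s` before the last write completes.
    """
    if cached_cmds is None:
        cached_cmds = queue_depth  # the "64 completed + 64 pending" case
    backlog_bytes = (cached_cmds + queue_depth) * xfer_bytes
    return backlog_bytes / drain_bytes_per_s


def suggested_timeout_s(wait_s, margin=2.0):
    """Scale the estimated wait by a safety margin, rounded up to seconds."""
    return math.ceil(wait_s * margin)


wait = worst_case_wait_s(64, 64 * 1024, 512 * 1024)
print(wait)                       # 16.0, already past a 10 s timeout
print(suggested_timeout_s(wait))  # 32
```

The margin is arbitrary for the same reason the timeout is. It just keeps a slow-but-working array from being declared dead.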

>-- 
>Mik Firestone fireston@lexmark.com
>If ever I become an Evil Overlord:
>All naive, busty tavern wenches in my realm will be replaced with surly,
>world-weary waitresses who will provide no unexpected reinforcement and/or
>romantic subplot for the hero or his sidekick.

--
Justin



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
