Date:      Tue, 16 Nov 1999 13:47:14 -0500
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        Greg Lehey <grog@lemis.com>
Cc:        "Kenneth D. Merry" <ken@kdm.org>, Randell Jesup <rjesup@wgate.com>, freebsd-arch@freebsd.org
Subject:   Re: I/O Evaluation Questions (Long but interesting!)
Message-ID:  <3831A6B2.433979FE@simon-shapiro.org>
References:  <382B52F9.2C6D1E00@simon-shapiro.org> <199911120116.SAA30871@panzer.kdm.org> <19991113211846.31442@mojave.sitaranetworks.com>

Greg Lehey wrote:

...


> I missed the beginning of this discussion, but I'd be interested in
> seeing how you measured it.

ftp://simon-shapiro.org/crash/tools/st.d.c

To compile:

make st.d && install -c -m 755 st.d /usr/local/bin

To run (example):

st.d -f /dev/ri2o5s4a -S -i 200 &
...

kill -info %1
...
kill -hup %1

*  Compilation is self explanatory.
*  st.d with no arguments gives usage.
   Somewhere on the same FTP server is a user manual
   for st.d.
*  -f /dev/ri2o0s4a specifies that the I/O is against the 
   first i2o block storage device, the fourth slice,
   first partition, as a character device.
*  -S says find the size of that device (stat does not tell
   it) and report it on stdout.
*  -i 200 says fork 200 instances (workers)
*  Defaults say do random read of 4KB blocks.
*  kill -info to the parent causes statistics to print.
*  kill -hup causes all workers to exit and a final report
   to print.
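
A minimal sketch of what one worker's inner step might look like,
assuming plain POSIX pread(2) and random(3).  This is not taken from
st.d itself: the name random_read_block, the BLOCK_SIZE constant and
the caller-supplied device size are hypothetical stand-ins for
whatever st.d actually does.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose pread() and random() on strict compilers */
#endif
#include <assert.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 4096     /* hypothetical; matches the 4KB default */

/*
 * Read one BLOCK_SIZE block from a random block-aligned offset.
 * dev_size is supplied by the caller, since stat() does not report
 * it for the device.  Returns the byte count from pread(), or -1 if
 * the device is smaller than one block or the read fails.
 * *offset_out, if not NULL, receives the offset that was attempted.
 */
static ssize_t
random_read_block(int fd, off_t dev_size, void *buf, off_t *offset_out)
{
    off_t nblocks = dev_size / BLOCK_SIZE;
    off_t offset;

    if (nblocks <= 0)
        return -1;
    /* random() is not seeded here; a real worker would seed per process. */
    offset = (off_t)(random() % nblocks) * BLOCK_SIZE;
    if (offset_out != NULL)
        *offset_out = offset;
    return pread(fd, buf, BLOCK_SIZE, offset);
}
```

Each forked worker would call something like this in a loop and
count completions.  Block-aligned 4KB reads keep both the offset and
the length a multiple of the sector size, which a character device
generally requires.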

> I'd be interested to see what's going on here.  I've seen some very
> weird corruption in Vinum (though not on the stack; instead, I get
> specific corruption of buffer headers, which is asynchronous to
> anything that Vinum is doing: in other words, it is not obviously
> related to the execution of any particular part of Vinum).

A bit complicated, but bear with me:

(micro-tutorial of i2o OSM block I/O execution flow):

*  We ignore initialization and open/close here.

*  Strategy called; Pop a command block off the Free Q.
   Assign the buf struct to this CB, parse the buf into an
   i2o inbound message, and add to the tail of the Submitted Q.
   Generate a software interrupt and return.

*  Software interrupt scans the Submitted Q, verifies the IOP
   has room on the Inbound FIFO, and pushes as many inbound 
   messages into the IOP inbound FIFO as will fit.  Each message
   thus pushed is marked and added to the tail of the Waiting Q.

*  The IOP does its thing, DMAs whatever into user memory 
   and generates a hardware interrupt.

*  Hisr identifies the completion, pops the CB off the Waiting
   Q, copies some state into the CB, adds the CB to the tail of 
   the Completed Q and returns.

*  Software ISR walks the Completed Q. Pops a CB off the Q, 
   analyzes its completion, (normally) calls biodone, 
   adds the CB to the Free Q, and returns.

Obviously there are more details, SPLs, locks, state machine
fingerprints, etc.
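
The flow above can be modelled in user space roughly as follows.
This is a sketch under stated assumptions, not the driver's code:
every name is hypothetical, Strategy is assumed to queue onto the
Submitted Q (which the software interrupt then drains), completions
are assumed to arrive in order, and SPLs and locks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; SPLs and locks are omitted. */
enum cb_state { CB_FREE, CB_SUBMITTED, CB_WAITING, CB_COMPLETED };

struct cb {
    struct cb *next;
    enum cb_state state;    /* "fingerprint" of the queue holding this CB */
    void *bp;               /* stands in for the struct buf pointer */
};

struct cbq { struct cb *head, *tail; enum cb_state owner; };

struct osm {
    struct cbq freeq, submittedq, waitingq, completedq;
    int fifo_room;          /* free slots in the IOP inbound FIFO */
    int done;               /* counts simulated biodone() calls */
};

static void cbq_push_tail(struct cbq *q, struct cb *cb)
{
    cb->next = NULL;
    cb->state = q->owner;
    if (q->tail != NULL)
        q->tail->next = cb;
    else
        q->head = cb;
    q->tail = cb;
}

static struct cb *cbq_pop_head(struct cbq *q)
{
    struct cb *cb = q->head;

    if (cb == NULL)
        return NULL;
    assert(cb->state == q->owner);   /* a CB must carry its queue's state */
    q->head = cb->next;
    if (q->head == NULL)
        q->tail = NULL;
    cb->next = NULL;
    return cb;
}

static void osm_init(struct osm *o, struct cb *pool, int n, int fifo_room)
{
    o->freeq = (struct cbq){ NULL, NULL, CB_FREE };
    o->submittedq = (struct cbq){ NULL, NULL, CB_SUBMITTED };
    o->waitingq = (struct cbq){ NULL, NULL, CB_WAITING };
    o->completedq = (struct cbq){ NULL, NULL, CB_COMPLETED };
    o->fifo_room = fifo_room;
    o->done = 0;
    for (int i = 0; i < n; i++)
        cbq_push_tail(&o->freeq, &pool[i]);
}

/* Strategy: Free Q -> Submitted Q.  Returns -1 if no CB is free. */
static int osm_strategy(struct osm *o, void *bp)
{
    struct cb *cb = cbq_pop_head(&o->freeq);

    if (cb == NULL)
        return -1;
    cb->bp = bp;            /* real code builds the inbound message here */
    cbq_push_tail(&o->submittedq, cb);
    return 0;               /* real code raises the software interrupt */
}

/* Software interrupt, submit side: Submitted Q -> FIFO -> Waiting Q. */
static void osm_swi_submit(struct osm *o)
{
    struct cb *cb;

    while (o->fifo_room > 0 && (cb = cbq_pop_head(&o->submittedq)) != NULL) {
        o->fifo_room--;
        cbq_push_tail(&o->waitingq, cb);
    }
}

/* Hardware ISR: Waiting Q -> Completed Q.  Simplification: the FIFO
 * slot is handed back at completion time. */
static void osm_hisr(struct osm *o)
{
    struct cb *cb = cbq_pop_head(&o->waitingq);

    if (cb != NULL) {
        o->fifo_room++;
        cbq_push_tail(&o->completedq, cb);
    }
}

/* Software interrupt, completion side: Completed Q -> Free Q. */
static void osm_swi_complete(struct osm *o)
{
    struct cb *cb;

    while ((cb = cbq_pop_head(&o->completedq)) != NULL) {
        o->done++;          /* stands in for biodone(cb->bp) */
        cb->bp = NULL;
        cbq_push_tail(&o->freeq, cb);
    }
}
```

In the kernel, each push and pop would have to be protected against
the interrupt handlers that touch the same queue; this model is
single threaded and shows only the order of the hand-offs.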

The mechanism works well and is mighty fast.  About every
2-3 million operations, under heavy CPU load (the load on
the IOP is less than 50%; FreeBSD on a single CPU cannot
load it any further), the Completed Q gets corrupted.
It becomes a single CB, which points to itself.
This CB has already been done (biodone has been called
on it), and has the state flags of a member of the Free Q.
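
One way to catch this closer to the moment it happens might be a
consistency check run at each queue transition.  The sketch below is
an assumption, not part of the driver: struct cb, the state names and
cbq_verify are hypothetical.  It flags a member carrying the wrong
state and a chain that loops back on itself, including a CB that
points to itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CB layout: only the fields the check needs. */
enum cb_state { CB_FREE, CB_SUBMITTED, CB_WAITING, CB_COMPLETED };

struct cb {
    struct cb *next;
    enum cb_state state;
};

enum q_verdict { Q_OK = 0, Q_CYCLE = -1, Q_BAD_STATE = -2 };

/*
 * Walk a singly linked queue and report the first inconsistency:
 * a member whose state does not match the queue it is on, or a
 * chain that loops.  Floyd's two-pointer walk is used so the check
 * terminates even when the list is corrupted into a cycle.
 * Callers can wrap it as assert(cbq_verify(head, state) == Q_OK).
 */
static enum q_verdict
cbq_verify(const struct cb *head, enum cb_state expected)
{
    const struct cb *slow = head;
    const struct cb *fast = head;

    while (fast != NULL) {
        if (fast->state != expected)
            return Q_BAD_STATE;
        fast = fast->next;
        if (fast == NULL)
            break;
        if (fast->state != expected)
            return Q_BAD_STATE;
        fast = fast->next;
        slow = slow->next;      /* slow advances one step per two of fast */
        if (fast == slow)
            return Q_CYCLE;
    }
    return Q_OK;
}
```

For the failure described above (a lone CB on the Completed Q that
points to itself and carries Free Q state), this check would report
Q_BAD_STATE; with the state check removed it would report Q_CYCLE.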

Compiling the driver with -fvolatile -fvolatile-global has no
significant impact.

After about a week of digging, we cannot identify where the
failure is.  Tracing is difficult, for the obvious reasons.

> Having said that, you shouldn't be running against the block device.
> We know that there are some problems there, and they're not going to
> be fixed.  For example, it's relatively trivial to panic the machine
> doing a newfs on a block device.  It's easier with Vinum, but it
> sometimes works with normal disk block devices as well.  Since user
> access to block devices is going away, this will not be fixed.

This problem happens quickly with the block device, and more 
slowly with the raw device.  But it still happens.
Playing with timing, one can cause the problem to 
move from the driver to something irrelevant, like the 
filesystem.  It panics during newfs, cpio, fsck, or
anything else with some bizarre errors I reported before.

I believe it is related to how busy the CPU is on the Unix
side.  This is perhaps why vinum sees it in more places
than i2o (it uses more logic).

Your last statement is unclear to me.  If users have no access
to block devices, there will be no buffered I/O?

> > Well, since you've done a lot of work to try to isolate the problem in your
> > code, but haven't tracked it down, I'd suggest taking your code out of the
> > picture as a variable.
> >
> > Create a CCD or Vinum array, using the same disks on Adaptec controllers.
> > Run the same tests, against the raw and block devices, and see if you get
> > the same sort of weird behavior.
> >
> > If you do, you have solid proof that it's not your code, since your code
> > wasn't in the kernel.  If you don't, unfortunately, you don't have solid
> > proof either way.  (Since in that case, it could be some set of
> > circumstances that your driver tickles that CCD or Vinum don't.)
> 
> Recall also my problems.  You might end up falling over in a different
> way, in which case you don't know whether

Agree.

...

> > One other thing to make sure of is that you're running a -stable with
> > Justin's Adaptec driver bug fix from September 20th.  It fixed some cases
> > where corruption could happen with Ultra 2 Adaptec controllers.
> 
> FWIW, the problems I have seen have been on other host adapters.

I am almost certain the IOP is not the culprit here.
It is either a race condition in the OSM, or some memory
that should be treated as volatile is not.

-- 


Sincerely Yours,                 Shimon@Simon-Shapiro.ORG
                                             404.664.6401
Simon Shapiro

Unwritten code has no bugs and executes at twice the speed of mouth







