Date:      Sat, 2 Jan 1999 16:08:29 +0100
From:      Bernd Walter <ticso@cicely.de>
To:        Greg Lehey <grog@lemis.com>, freebsd-scsi@FreeBSD.ORG
Subject:   Re: new Quirk candidate and vinum behavour
Message-ID:  <19990102160829.32955@cicely.de>
In-Reply-To: <19990102205553.G66110@freebie.lemis.com>; from Greg Lehey on Sat, Jan 02, 1999 at 08:55:53PM +1030
References:  <19990102105138.35033@cicely.de> <19990102205553.G66110@freebie.lemis.com>

On Sat, Jan 02, 1999 at 08:55:53PM +1030, Greg Lehey wrote:
> On Saturday,  2 January 1999 at 10:51:38 +0100, Bernd Walter wrote:
> >
> > One of my hosts has crashed a few times.
> > Today I got a crash after moving the logs to another volume:
> >
> > Jan  2 03:30:16 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> > Jan  2 03:30:18 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> > Jan  2 03:30:32 cicely7 syslogd: /var/log/messages: Input/output error
> > Jan  2 03:30:32 cicely7 syslogd: /var/log/all.log: Input/output error
> > Jan  2 03:30:32 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> > Jan  2 03:30:32 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x0
> > Jan  2 03:30:32 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 1 SCBs aborted
> > Jan  2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is crashed
> > Jan  2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is degraded
> > Jan  2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is stale
> > Jan  2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is down
> > Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> > Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> > Jan  2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> > Jan  2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> > Jan  2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> > Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> > Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> > Jan  2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> > Jan  2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> > Jan  2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> > Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> > Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> > Jan  2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> > Jan  2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> > Jan  2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> > Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> > Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> > Jan  2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> > Jan  2 03:30:49 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x0, SEQ_FLAGS == 0x40
> > Jan  2 03:30:49 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 16 SCBs aborted
> > Jan  2 03:31:04 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> > Jan  2 03:31:04 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> > Jan  2 03:31:04 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> > and so on ...
> >
> > As you can see, the host was not really crashed, but it was unusable
> > after this happened.  The problem with da0:ahc0:0:1:0 happens every
> > time the tagged openings are increased.  The side effect is that I'm
> > now running /var on a vinum volume on da1 and da2, which are drives on
> > the same channel, and it looks like the BDR, or something between the
> > tag increase and the BDR, is the reason for the subdisk crash.
> 
> On the face of it, of course, this is a SCSI problem, not a Vinum
> problem.  Vinum reacted correctly to the error (this time :-).  But
Which error?

> we've seen a surprising number of this kind of problem in connection
> with Vinum, and I think the reason is that Vinum tickles otherwise
> unseen hardware problems in SCSI chains.  It's quite common for Vinum
The SCSI chain is OK, and so is the power supply :)

> to issue a series of I/O commands on a number of devices on a chain
> (for example, with striped or RAID-5 volumes which require accessing
What I mean is that da0 has problems when the tagged openings are increased:
the errors come directly after each increase, which looks very much like the
often-discussed firmware problems.
There are no problems without the increase, and none afterwards.
That's point one, and there's no doubt about the cause.
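
A minimal sketch of how this correlation could be checked from the syslog, assuming the line layout seen in the excerpt above. It also assumes that the CAM target ID in "(da0:ahc0:0:1:0)" and the "A:1" in the BDR message refer to the same target, and that the log covers a single day. The function name `correlate` is hypothetical. For every Bus Device Reset, it reports how many seconds have passed since the last "tagged openings" message for the same target:

```python
import re

# Assumed syslog layout, taken from the excerpt above:
# "Jan  2 03:30:16 cicely7 /kernel: <message>"
LINE_RE = re.compile(r"^\w{3}\s+\d+ (\d\d):(\d\d):(\d\d) \S+ (.*)$")
# CAM path: (periph:sim:bus:target:lun); group 2 is the target ID.
TAGS_RE = re.compile(r"\((da\d+):ahc\d+:\d+:(\d+):\d+\): tagged openings now (\d+)")
# "ahc0: Bus Device Reset on A:1. 31 SCBs aborted"; group 2 is the target ID.
BDR_RE = re.compile(r"ahc\d+: Bus Device Reset on ([A-Z]):(\d+)\. (\d+) SCBs aborted")


def seconds(h, m, s):
    """Seconds since midnight (assumes the log covers a single day)."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def correlate(lines):
    """Return (bdr_time, target, scbs_aborted, gap) for each BDR, where gap is
    the number of seconds since the last 'tagged openings' message for the
    same target, or None if no such message was seen before the BDR."""
    last_tag = {}  # target ID -> time of the last "tagged openings" message
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        t = seconds(*m.group(1, 2, 3))
        msg = m.group(4)
        tag = TAGS_RE.search(msg)
        if tag:
            last_tag[int(tag.group(2))] = t
            continue
        bdr = BDR_RE.search(msg)
        if bdr:
            target = int(bdr.group(2))
            prev = last_tag.get(target)
            gap = t - prev if prev is not None else None
            results.append((t, target, int(bdr.group(3)), gap))
    return results


if __name__ == "__main__":
    excerpt = [
        "Jan  2 03:30:18 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31",
        "Jan  2 03:30:32 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 1 SCBs aborted",
        "Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31",
        "Jan  2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted",
    ]
    for row in correlate(excerpt):
        print(row)
```

On the excerpt, the later resets follow a tag message within a second or so, while the first one comes about 14 seconds after it. The raw gap is reported rather than a yes/no answer, so the reader can judge what "directly after" means.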

da1 and da2 are on the same chain and hold /var, so they are being accessed to write logs.
There is no reason for errors on da1 and da2, since only a BDR happened.
No errors on da1 or da2 were logged.
Yet vinum marks var.p0.s0 as crashed without any logged error from either drive.
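
A rough sketch of this check, assuming that any CAM error for da1 or da2 would appear with a "(da1:" or "(da2:" prefix, the way the da0 messages do in the excerpt. It also assumes the Vinum message text shown above. The function name `unexplained_crashes` is hypothetical. The function lists the subdisks that Vinum marked "crashed" before any log line mentioned one of the given drives:

```python
import re

# Assumed layout of the Vinum state message, taken from the excerpt above:
# "... /kernel: vinum: subdisk var.p0.s0 is crashed"
VINUM_RE = re.compile(r"vinum: subdisk (\S+) is crashed")


def unexplained_crashes(lines, drives):
    """Return the subdisk names that Vinum marked 'crashed' before any log
    line mentioned one of `drives` (for example {"da1", "da2"})."""
    # CAM prefixes device messages with "(daN:", as the da0 lines show.
    prefixes = tuple(f"({d}:" for d in drives)
    seen_drive_message = False
    unexplained = []
    for line in lines:
        if any(p in line for p in prefixes):
            seen_drive_message = True
            continue
        m = VINUM_RE.search(line)
        if m and not seen_drive_message:
            unexplained.append(m.group(1))
    return unexplained


if __name__ == "__main__":
    excerpt = [
        "Jan  2 03:30:32 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 1 SCBs aborted",
        "Jan  2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is crashed",
        "Jan  2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is degraded",
    ]
    print(unexplained_crashes(excerpt, {"da1", "da2"}))
```

This is deliberately crude: once any da1 or da2 message appears, every later crash counts as explained. It only shows that, in the excerpt, the crash has no preceding error from those drives.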

> several drives at a time for a single user request).  You might like
> to set debug flag 1:
That's not a good idea on this host: the log files are written to a vinum
volume, because da0 doesn't write reliably in the error condition and all
the other disks are striped.

> 
> vinum -> debug 1
> 
> This will log to syslogd details of all transfers; in combination with
> the log you show, it might help the SCSI guys figure out where things
> are happening.  But I have a suspicion that the real problem is
> hardware (less than perfect SCSI chain, for whatever reason) rather
> than software.
> 

-- 
  B.Walter

