Date: Sat, 2 Jan 1999 20:55:53 +1030
From: Greg Lehey <grog@lemis.com>
To: Bernd Walter <ticso@cicely.de>, freebsd-scsi@FreeBSD.ORG
Subject: Re: new Quirk candidate and vinum behavour
Message-ID: <19990102205553.G66110@freebie.lemis.com>
In-Reply-To: <19990102105138.35033@cicely.de>; from Bernd Walter on Sat, Jan 02, 1999 at 10:51:38AM +0100
References: <19990102105138.35033@cicely.de>

On Saturday, 2 January 1999 at 10:51:38 +0100, Bernd Walter wrote:
>
> I have had one of my hosts crashed sometime.
> Today I got a crash after setting logs to another volume:
>
> Jan 2 03:30:16 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:18 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:32 cicely7 syslogd: /var/log/messages: Input/output error
> Jan 2 03:30:32 cicely7 syslogd: /var/log/all.log: Input/output error
> Jan 2 03:30:32 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:32 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x0
> Jan 2 03:30:32 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 1 SCBs aborted
> Jan 2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is crashed
> Jan 2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is degraded
> Jan 2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is stale
> Jan 2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is down
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan 2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan 2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan 2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:49 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x0, SEQ_FLAGS == 0x40
> Jan 2 03:30:49 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 16 SCBs aborted
> Jan 2 03:31:04 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:31:04 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:31:04 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> and so on ...
>
> As you can see the host was not really crashed but unusable after it
> happened. The problem with da0:ahc0:0:1:0 happens every time the
> tagged openings are increased. The side effect is that I'm now
> running /var on a vinum volume on da1 and da2 which are drives on
> the same channel and it looks like the bdr or anything between the
> tag increase and the bdr is the reason for the subdisk crash.

On the face of it, of course, this is a SCSI problem, not a Vinum
problem. Vinum reacted correctly to the error (this time :-). But
we've seen a surprising number of problems of this kind in connection
with Vinum, and I think the reason is that Vinum tickles otherwise
unseen hardware problems in SCSI chains.
It's quite common for Vinum to issue a series of I/O commands on a
number of devices on a chain (for example, with striped or RAID-5
volumes, which require accessing several drives at a time for a single
user request).

You might like to set debug flag 1:

  vinum -> debug 1

This will log details of all transfers to syslogd; in combination with
the log you show, it might help the SCSI guys figure out where things
are happening. But I have a suspicion that the real problem is
hardware (a less than perfect SCSI chain, for whatever reason) rather
than software.

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
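
The pattern reported in the quoted log (a bus device reset following each change in tagged openings) could be checked mechanically against a saved copy of the messages file. The following is only a rough sketch, assuming Python is available wherever the log is analysed; the function name `summarize`, the regular expressions and the sample excerpt are illustrative choices based on the log lines quoted above, not part of Vinum or the SCSI driver.

```python
import re

# Patterns modelled on the kernel messages quoted in this thread (assumed
# format; other drivers or kernel versions may word these differently).
TAG_RE = re.compile(r"\(da\d+:ahc\d+:\d+:\d+:\d+\): tagged openings now (\d+)")
BDR_RE = re.compile(r"Bus Device Reset on (\S+)\. (\d+) SCBs aborted")

# A short excerpt taken from the quoted log, used here as sample input.
SAMPLE_LOG = """\
Jan 2 03:30:16 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
Jan 2 03:30:18 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
Jan 2 03:30:32 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
Jan 2 03:30:32 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 1 SCBs aborted
Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
Jan 2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
"""


def summarize(lines):
    """Count tag-opening changes and resets, and how often one precedes the other."""
    tag_changes = 0
    resets = 0
    scbs_aborted = 0
    resets_after_tag_change = 0
    tag_change_pending = False  # a tag change was seen since the last reset

    for line in lines:
        if TAG_RE.search(line):
            tag_changes += 1
            tag_change_pending = True
            continue
        match = BDR_RE.search(line)
        if match:
            resets += 1
            scbs_aborted += int(match.group(2))
            if tag_change_pending:
                resets_after_tag_change += 1
            tag_change_pending = False

    return {
        "tag_changes": tag_changes,
        "resets": resets,
        "scbs_aborted": scbs_aborted,
        "resets_after_tag_change": resets_after_tag_change,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG.splitlines()))
```

If every reset in a longer log turns out to be preceded by a tagged-openings change, that is consistent with the observation in the quoted message, though it does not by itself distinguish a hardware fault from a software one.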