From owner-freebsd-scsi Sat Jan 2 02:26:48 1999
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id CAA02939
        for freebsd-scsi-outgoing; Sat, 2 Jan 1999 02:26:48 -0800 (PST)
        (envelope-from owner-freebsd-scsi@FreeBSD.ORG)
Received: from allegro.lemis.com (allegro.lemis.com [192.109.197.134])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id CAA02929
        for ; Sat, 2 Jan 1999 02:26:44 -0800 (PST)
        (envelope-from grog@freebie.lemis.com)
Received: from freebie.lemis.com (freebie.lemis.com [192.109.197.137])
        by allegro.lemis.com (8.9.1/8.9.0) with ESMTP id UAA22976;
        Sat, 2 Jan 1999 20:55:50 +1030 (CST)
Received: (from grog@localhost)
        by freebie.lemis.com (8.9.1/8.9.0) id UAA66483;
        Sat, 2 Jan 1999 20:55:53 +1030 (CST)
Message-ID: <19990102205553.G66110@freebie.lemis.com>
Date: Sat, 2 Jan 1999 20:55:53 +1030
From: Greg Lehey
To: Bernd Walter, freebsd-scsi@FreeBSD.ORG
Subject: Re: new Quirk candidate and vinum behavour
References: <19990102105138.35033@cicely.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.91.1i
In-Reply-To: <19990102105138.35033@cicely.de>; from Bernd Walter on Sat, Jan 02, 1999 at 10:51:38AM +0100
WWW-Home-Page: http://www.lemis.com/~grog
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-41-739-7062
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Saturday, 2 January 1999 at 10:51:38 +0100, Bernd Walter wrote:
>
> I have had one of my hosts crashed sometime.
> Today I got a crash after setting logs to another volume:
>
> Jan 2 03:30:16 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:18 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:32 cicely7 syslogd: /var/log/messages: Input/output error
> Jan 2 03:30:32 cicely7 syslogd: /var/log/all.log: Input/output error
> Jan 2 03:30:32 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:32 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x0
> Jan 2 03:30:32 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 1 SCBs aborted
> Jan 2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is crashed
> Jan 2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is degraded
> Jan 2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is stale
> Jan 2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is down
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan 2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan 2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan 2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan 2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:30:49 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x0, SEQ_FLAGS == 0x40
> Jan 2 03:30:49 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 16 SCBs aborted
> Jan 2 03:31:04 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan 2 03:31:04 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan 2 03:31:04 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> and so on ...
>
> As you can see the host was not really crashed but unusable after it
> happened. The problem with da0:ahc0:0:1:0 happens every time the
> tagged openings are increased. The side effect is that I'm now
> running /var on a vinum volume on da1 and da2 which are drives on
> the same channel and it looks like the bdr or anything between the
> tag increase and the bdr is the reason for the subdisk crash.

On the face of it, of course, this is a SCSI problem, not a Vinum
problem. Vinum reacted correctly to the error (this time :-). But
we've seen a surprising number of problems of this kind in connection
with Vinum, and I think the reason is that Vinum tickles otherwise
unseen hardware problems in SCSI chains.
It's quite common for Vinum to issue a series of I/O commands on a
number of devices on a chain (for example, with striped or RAID-5
volumes which require accessing several drives at a time for a single
user request).

You might like to set debug flag 1:

  vinum -> debug 1

This will log to syslogd details of all transfers; in combination with
the log you show, it might help the SCSI guys figure out where things
are happening. But I have a suspicion that the real problem is
hardware (less than perfect SCSI chain, for whatever reason) rather
than software.

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
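
To illustrate the remark above that a single user request on a striped
volume can touch several drives at once, here is a minimal sketch of
generic RAID-0 style address mapping. It is not Vinum's code: the
function name split_striped_request, its parameters and the example
stripe size are all assumptions made for illustration.

```python
def split_striped_request(offset, length, stripe_size, num_subdisks):
    """Split a byte range on a striped volume into per-subdisk pieces.

    Generic RAID-0 mapping, NOT taken from Vinum's source; all names
    here are hypothetical. Returns a list of
    (subdisk_index, subdisk_offset, piece_length) tuples.
    """
    if stripe_size <= 0 or num_subdisks <= 0:
        raise ValueError("stripe_size and num_subdisks must be positive")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")

    pieces = []
    end = offset + length
    while offset < end:
        stripe_no = offset // stripe_size      # which stripe, counted across the volume
        within = offset % stripe_size          # position inside that stripe
        subdisk = stripe_no % num_subdisks     # stripes rotate over the subdisks
        row = stripe_no // num_subdisks        # how many stripes precede it on that subdisk
        piece = min(stripe_size - within, end - offset)
        pieces.append((subdisk, row * stripe_size + within, piece))
        offset += piece
    return pieces


if __name__ == "__main__":
    # One 600 kB request on a 3-subdisk volume with 256 kB stripes
    # (example values only) fans out into several per-drive transfers.
    for piece in split_striped_request(300 * 1024, 600 * 1024, 256 * 1024, 3):
        print(piece)
```

Each tuple corresponds to one transfer that would be issued to a
separate drive, so a request larger than one stripe puts several
drives on the same chain to work at nearly the same time.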