From owner-freebsd-scsi@FreeBSD.ORG  Mon Jan  8 11:08:51 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@FreeBSD.org
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 625C316A505
	for <freebsd-scsi@FreeBSD.org>; Mon,  8 Jan 2007 11:08:51 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [69.147.83.40])
	by mx1.freebsd.org (Postfix) with ESMTP id 4F6D313C45A
	for <freebsd-scsi@FreeBSD.org>; Mon,  8 Jan 2007 11:08:51 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id l08B8ptY016610
	for <freebsd-scsi@FreeBSD.org>; Mon, 8 Jan 2007 11:08:51 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.13.4/8.13.4/Submit) id l08B8nrw016606
	for freebsd-scsi@FreeBSD.org; Mon, 8 Jan 2007 11:08:49 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 8 Jan 2007 11:08:49 GMT
Message-Id: <200701081108.l08B8nrw016606@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: linimon set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-scsi@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Jan 2007 11:08:51 -0000

Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/27059   scsi       [sym] SCSI subsystem hangs under heavy load on (Server
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/40895   scsi       wierd kernel / device driver bug
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than 
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/60598   scsi       wire down of scsi devices conflicts with config
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
o kern/81887   scsi       [aac] Adaptec SCSI 2130S aac0: GetDeviceProbeInfo comm
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/93128   scsi       [sym] FreeBSD 6.1 BETA 1 has problems with Symbios/LSI
o kern/94838   scsi       Kernel panic while mounting SD card with lock switch o
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x (regression)

15 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/23314   scsi       aic driver fails to detect Adaptec 1520B unless PnP is
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce
o kern/38828   scsi       [feature request] DPT PM2012B/90 doesn't work
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/96133   scsi       [scsi] [patch] add scsi quirk for joyfly 128mb flash u
o kern/103702  scsi       [cam] [patch] ChipsBnk: Unsupported USB memory stick

7 problems total.


From owner-freebsd-scsi@FreeBSD.ORG  Tue Jan  9 07:32:04 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@FreeBSD.org
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id AEBA616A407;
	Tue,  9 Jan 2007 07:32:04 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from cs1.cs.huji.ac.il (cs1.cs.huji.ac.il [132.65.16.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 6A4C013C459;
	Tue,  9 Jan 2007 07:32:04 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from pampa.cs.huji.ac.il ([132.65.80.32])
	by cs1.cs.huji.ac.il with esmtp
	id 1H4B4I-0001eX-UC; Tue, 09 Jan 2007 09:06:46 +0200
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.2
To: freebsd-scsi@FreeBSD.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 09 Jan 2007 09:06:46 +0200
From: Danny Braniss <danny@cs.huji.ac.il>
Message-ID: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
Cc: freebsd-hackers@freebsd.org
Subject: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Jan 2007 07:32:04 -0000

Hi,
While I think I have almost solved the problem of network disconnects,
a major problem dawned on me:
When a 'local' disk crashes, the kernel will probably hang/panic/crash.
If I don't try to recover, then there is no change in the above scenario.
If I try to recover, then the client does not know that it should
umount/fsck/mount.
This all seems familiar from removing a floppy/disk-on-key while it's
mounted, where we could always say "you shouldn't have done that!". With
a network connection, though, it can happen very often: rebooting the
target, a network hiccup, etc.

So, any ideas?

	danny


From owner-freebsd-scsi@FreeBSD.ORG  Tue Jan  9 14:53:25 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 8683A16A412;
	Tue,  9 Jan 2007 14:53:25 +0000 (UTC)
	(envelope-from lists@jnielsen.net)
Received: from ns1.jnielsen.net (ns1.jnielsen.net [69.55.238.237])
	by mx1.freebsd.org (Postfix) with ESMTP id 4D25013C44C;
	Tue,  9 Jan 2007 14:53:25 +0000 (UTC)
	(envelope-from lists@jnielsen.net)
Received: from localhost (jn@ns1 [69.55.238.237]) (authenticated bits=0)
	by ns1.jnielsen.net (8.12.9p2/8.12.9) with ESMTP id l09EY44o042517;
	Tue, 9 Jan 2007 06:34:05 -0800 (PST)
	(envelope-from lists@jnielsen.net)
From: John Nielsen <lists@jnielsen.net>
To: freebsd-hackers@freebsd.org
Date: Tue, 9 Jan 2007 09:31:28 -0500
User-Agent: KMail/1.9.5
References: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
In-Reply-To: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
X-Face: #X5#Y*q>F:]zT!DegL3z5Xo'^MN[$8k\[4^3rN~wm=s=Uw(sW}R?3b^*f1Wu*.<=?utf-8?q?of=5F4NrS=0A=09P*M/9CpxDo!D6?=)IY1w<9B1jB;
	tBQf[RU-R<,I)e"$q7N7
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200701090931.28786.lists@jnielsen.net>
X-Virus-Scanned: ClamAV version 0.88.4,
	clamav-milter version 0.88.4 on ns1.jnielsen.net
X-Virus-Status: Clean
Cc: freebsd-scsi@freebsd.org
Subject: Re: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Jan 2007 14:53:25 -0000

On Tuesday 09 January 2007 02:06, Danny Braniss wrote:
> Hi,
> While I think I have almost solved the problem of network disconnects,
> a major problem dawned on me:
> When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> If I don't try to recover, then there is no change in the above scenario.
> If I try to recover, then the client does not know that it should
> umount/fsck/mount.
> This all seems familiar from removing a floppy/disk-on-key while it's
> mounted, where we could always say "you shouldn't have done that!". With
> a network connection, though, it can happen very often: rebooting the
> target, a network hiccup, etc.
>
> So, any ideas?

I think that an iSCSI network disconnect (if handled properly) is more like a
bad/flaky set of sectors and/or extremely high latency than a total disk
crash. The initiator should stall as long as it can while trying to reconnect
the session, and then send "hardware" timeout errors up the stack. The
rest of the OS should then handle those the same as it would any other timeout
errors: retry a certain number of times and then fail. I don't know how
graceful the failure case is (perhaps not very), but it's an honest
approximation.
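
The stall-then-fail policy described above can be sketched in a few lines
of Python. This is only an illustration under assumptions, not the FreeBSD
initiator's code: the names reconnect_or_timeout, submit_with_retries and
DeviceTimeout, and the parameters stall_seconds, retry_interval and
max_retries, are all hypothetical.

```python
import time


class DeviceTimeout(Exception):
    """Hypothetical error: the session could not be re-established in time."""


def reconnect_or_timeout(try_reconnect, stall_seconds, retry_interval=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Stall while retrying the session; give up once the deadline passes.

    try_reconnect is a caller-supplied callable returning True on success.
    clock and sleep are injectable so the policy can be tested without waiting.
    """
    deadline = clock() + stall_seconds
    while True:
        if try_reconnect():
            return True
        if clock() >= deadline:
            # Reported upward like a "hardware" timeout.
            raise DeviceTimeout("session not restored within %s s" % stall_seconds)
        sleep(retry_interval)


def submit_with_retries(do_io, max_retries):
    """Upper-layer view: retry a timed-out request a few times, then fail."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return do_io()
        except DeviceTimeout as err:
            last_error = err
    raise OSError("I/O failed after %d retries" % max_retries) from last_error
```

The split into two functions mirrors the two layers in the paragraph: the
initiator decides how long to stall, and the layer above decides how many
timeouts it tolerates before reporting failure.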

The above approach is IMO more than adequate for network interruptions lasting 
a few seconds (or a bit more). I'm not sure there's anything you can 
realistically do more than that. Administrators who intentionally reboot a 
nonredundant iSCSI target while it has active sessions are asking for 
trouble, and if the reboot is accidental they should do one or more of a) 
know to run fsck manually, b) get a better UPS, c) get a more 
stable/redundant iSCSI target device.

Disclaimer: I know next to nothing about kernel programming, device driver 
development, or SCSI in general. I've just been playing with and thinking
about iSCSI on FreeBSD a fair amount lately. Thanks for your continued work 
on this.

JN

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jan  9 16:38:52 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@FreeBSD.ORG
Delivered-To: freebsd-scsi@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 63C6916A40F
	for <freebsd-scsi@FreeBSD.ORG>; Tue,  9 Jan 2007 16:38:52 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (lurza.secnetix.de [83.120.8.8])
	by mx1.freebsd.org (Postfix) with ESMTP id CBF3E13C448
	for <freebsd-scsi@FreeBSD.ORG>; Tue,  9 Jan 2007 16:38:51 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (uvqlwx@localhost [127.0.0.1])
	by lurza.secnetix.de (8.13.4/8.13.4) with ESMTP id l09GGUZT020582;
	Tue, 9 Jan 2007 17:16:35 +0100 (CET)
	(envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
	by lurza.secnetix.de (8.13.4/8.13.1/Submit) id l09GGTJu020581;
	Tue, 9 Jan 2007 17:16:29 +0100 (CET) (envelope-from olli)
Date: Tue, 9 Jan 2007 17:16:29 +0100 (CET)
Message-Id: <200701091616.l09GGTJu020581@lurza.secnetix.de>
From: Oliver Fromme <olli@lurza.secnetix.de>
To: freebsd-hackers@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG, danny@cs.huji.ac.il
In-Reply-To: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
X-Newsgroups: list.freebsd-hackers
User-Agent: tin/1.8.2-20060425 ("Shillay") (UNIX) (FreeBSD/4.11-STABLE (i386))
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.1.2
	(lurza.secnetix.de [127.0.0.1]);
	Tue, 09 Jan 2007 17:16:35 +0100 (CET)
Cc: 
Subject: Re: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: freebsd-hackers@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG,
	danny@cs.huji.ac.il
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Jan 2007 16:38:52 -0000

Danny Braniss wrote:
 > While I think I have almost solved the problem of network disconnects,
 > a major problem dawned on me:
 > When a 'local' disk crashes, the kernel will probably hang/panic/crash.
 > If I don't try to recover, then there is no change in the above scenario.
 > If I try to recover, then the client does not know that it should
 > umount/fsck/mount.
 > This all seems familiar from removing a floppy/disk-on-key while it's
 > mounted, where we could always say "you shouldn't have done that!". With
 > a network connection, though, it can happen very often: rebooting the
 > target, a network hiccup, etc.

The IEEE1394 code (firewire) contains a hack so you can
remove a _mounted_ drive (yes, pull the plug!) and later
reconnect it and continue to use the filesystem.  I think
processes that try to access the file system while the
drive is unavailable are blocked ("D" state a.k.a.
"diskwait").  The purpose of that feature is that you can
change the topology (e.g. remove a device that's not at
the end of the bus) without having to unmount all other
devices.

Well, it's just a hack, and I don't know if something
similar is applicable to the iSCSI situation.  But I
thought it wouldn't hurt to mention it anyhow.

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"If you think C++ is not overly complicated, just what is a protected
abstract virtual base pure virtual private destructor, and when was the
last time you needed one?"
        -- Tom Cargill, C++ Journal

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jan  9 17:05:19 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 2369516A494;
	Tue,  9 Jan 2007 17:05:19 +0000 (UTC)
	(envelope-from lists@jnielsen.net)
Received: from ns1.jnielsen.net (ns1.jnielsen.net [69.55.238.237])
	by mx1.freebsd.org (Postfix) with ESMTP id 04A7B13C4A6;
	Tue,  9 Jan 2007 17:05:18 +0000 (UTC)
	(envelope-from lists@jnielsen.net)
Received: from localhost (jn@ns1 [69.55.238.237]) (authenticated bits=0)
	by ns1.jnielsen.net (8.12.9p2/8.12.9) with ESMTP id l09H574o019218;
	Tue, 9 Jan 2007 09:05:07 -0800 (PST)
	(envelope-from lists@jnielsen.net)
From: John Nielsen <lists@jnielsen.net>
To: freebsd-hackers@freebsd.org
Date: Tue, 9 Jan 2007 12:02:31 -0500
User-Agent: KMail/1.9.5
References: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
In-Reply-To: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
X-Face: #X5#Y*q>F:]zT!DegL3z5Xo'^MN[$8k\[4^3rN~wm=s=Uw(sW}R?3b^*f1Wu*.<=?utf-8?q?of=5F4NrS=0A=09P*M/9CpxDo!D6?=)IY1w<9B1jB;
	tBQf[RU-R<,I)e"$q7N7
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200701091202.32226.lists@jnielsen.net>
X-Virus-Scanned: ClamAV version 0.88.4,
	clamav-milter version 0.88.4 on ns1.jnielsen.net
X-Virus-Status: Clean
Cc: freebsd-scsi@freebsd.org, Dan Nelson <dnelson@allantgroup.com>
Subject: Re: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Jan 2007 17:05:19 -0000

Forwarding a relevant comment from a parallel discussion on -questions.

----------  Forwarded Message  ----------

Subject: Re: iSCSI
Date: Tuesday 09 January 2007 11:35
From: Dan Nelson <dnelson@allantgroup.com>
To: DAve <dave.list@pixelhammer.com>
Cc: Free BSD Questions list <freebsd-questions@freebsd.org>

In the last episode (Jan 09), DAve said:
> The developer's response, for those who are interested.
>
> hi Dave,
> 	the initiator for iSCSI will hit stable/current real soon now.
> that was the good news, now for the down side:
> what was missing all along was recovery from network disconnects, so
> while I think I have it almost worked out, I've come across a major
> flaw in the iscsi design:
> 	when the target crashes, and comes back, there is no way
> to tell the client to run an fsck. This is not a problem if the
> client is mounting the iscsi partition read only.
>
> 	danny

Why should the client need to do an fsck?  From its point of view it
should just look like the target had the iSCSI equivalent of a bus
reset.  It should resend any queued requests and continue.
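
The "resend any queued requests" idea can be sketched as follows. This is a
minimal illustration, assuming the initiator keeps every unacknowledged
request until it completes; PendingQueue and its methods are hypothetical
names and not part of any FreeBSD or iSCSI API.

```python
from collections import OrderedDict


class PendingQueue:
    """Hypothetical tracker for requests sent but not yet completed."""

    def __init__(self, send):
        self._send = send                # callable(tag, request) that transmits
        self._pending = OrderedDict()    # tag -> request, in submission order
        self._next_tag = 0

    def submit(self, request):
        """Remember the request, then transmit it; returns its tag."""
        tag = self._next_tag
        self._next_tag += 1
        self._pending[tag] = request
        self._send(tag, request)
        return tag

    def complete(self, tag):
        """Forget a request once the target has answered it."""
        return self._pending.pop(tag, None)

    def on_session_reset(self):
        """After a reconnect, resend everything still outstanding, in order."""
        for tag, request in list(self._pending.items()):
            self._send(tag, request)
        return len(self._pending)
```

The sketch covers only the initiator's bookkeeping. It says nothing about
whether the target kept writes it had already acknowledged before it
went away.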


On Tuesday 09 January 2007 02:06, Danny Braniss wrote:
> Hi,
> While I think I have almost solved the problem of network disconnects,
> a major problem dawned on me:
> When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> If I don't try to recover, then there is no change in the above scenario.
> If I try to recover, then the client does not know that it should
> umount/fsck/mount.
> This all seems familiar from removing a floppy/disk-on-key while it's
> mounted, where we could always say "you shouldn't have done that!". With
> a network connection, though, it can happen very often: rebooting the
> target, a network hiccup, etc.

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jan  9 21:08:26 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C6D3E16A407
	for <freebsd-scsi@freebsd.org>; Tue,  9 Jan 2007 21:08:26 +0000 (UTC)
	(envelope-from freebsd-scsi@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 75ACB13C46B
	for <freebsd-scsi@freebsd.org>; Tue,  9 Jan 2007 21:08:26 +0000 (UTC)
	(envelope-from freebsd-scsi@m.gmane.org)
Received: from root by ciao.gmane.org with local (Exim 4.43)
	id 1H4NIK-000139-Kh
	for freebsd-scsi@freebsd.org; Tue, 09 Jan 2007 21:10:04 +0100
Received: from 89-172-49-221.adsl.net.t-com.hr ([89.172.49.221])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-scsi@freebsd.org>; Tue, 09 Jan 2007 21:10:04 +0100
Received: from ivoras by 89-172-49-221.adsl.net.t-com.hr with local (Gmexim
	0.1 (Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-scsi@freebsd.org>; Tue, 09 Jan 2007 21:10:04 +0100
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-scsi@freebsd.org
From: Ivan Voras <ivoras@fer.hr>
Date: Tue, 09 Jan 2007 21:04:28 +0100
Lines: 28
Message-ID: <eo0sgk$tm5$1@sea.gmane.org>
References: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
	<200701090931.28786.lists@jnielsen.net>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enigB4F4BA0A9F0163D722FD25B5"
X-Complaints-To: usenet@sea.gmane.org
X-Gmane-NNTP-Posting-Host: 89-172-49-221.adsl.net.t-com.hr
User-Agent: Thunderbird 1.5.0.9 (Windows/20061207)
In-Reply-To: <200701090931.28786.lists@jnielsen.net>
X-Enigmail-Version: 0.94.1.2
Sender: news <news@sea.gmane.org>
Cc: freebsd-hackers@freebsd.org
Subject: Re: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Jan 2007 21:08:26 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigB4F4BA0A9F0163D722FD25B5
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

John Nielsen wrote:

> I don't know how
> graceful the failure case is (perhaps not very)...

Not at all - removing a mounted USB device panics the kernel.


--------------enigB4F4BA0A9F0163D722FD25B5
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFo/VSldnAQVacBcgRAmDJAJ994m1Rk2FiPv/HC3jrJlgd8IkyfACfTqQV
Qao+ofnehodBCORsIFDE5qM=
=SSFc
-----END PGP SIGNATURE-----

--------------enigB4F4BA0A9F0163D722FD25B5--


From owner-freebsd-scsi@FreeBSD.ORG  Wed Jan 10 13:42:16 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 5B48316A403
	for <freebsd-scsi@freebsd.org>; Wed, 10 Jan 2007 13:42:16 +0000 (UTC)
	(envelope-from cstdenis@ctgameinfo.com)
Received: from luna.ctgameinfo.com (luna.ctgameinfo.com [65.110.52.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 2FF6F13C457
	for <freebsd-scsi@freebsd.org>; Wed, 10 Jan 2007 13:42:16 +0000 (UTC)
	(envelope-from cstdenis@ctgameinfo.com)
Received: from [192.168.1.100] (S01060016b606ed02.vc.shawcable.net
	[24.87.22.207]) (AUTH: LOGIN chris@ctgameinfo.com)
	by luna.ctgameinfo.com with esmtp; Wed, 10 Jan 2007 05:02:38 -0800
	id 00078C19.45A4E3EF.00015860
Message-ID: <45A4E3AD.1040600@ctgameinfo.com>
Date: Wed, 10 Jan 2007 05:01:33 -0800
From: Cstdenis <cstdenis@ctgameinfo.com>
User-Agent: Thunderbird 1.5.0.9 (Windows/20061207)
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Bug in aac?
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 10 Jan 2007 13:42:16 -0000

I am running 6.1-p11 with an Adaptec SAS RAID 4800SAS running a mirror of
2 15k rpm SCSI drives.

Under heavy I/O load (it's a database server) I get the following,
accompanied by serious system lag:

Jan  9 20:52:42 ayu kernel: aac0: COMMAND 0xc8f56f80 TIMEOUT AFTER 34 SECONDS
Jan  9 20:52:42 ayu kernel: aac0: COMMAND 0xc8f54f00 TIMEOUT AFTER 34 SECONDS
Jan  9 20:52:42 ayu kernel: aac0: COMMAND 0xc8f56b40 TIMEOUT AFTER 34 SECONDS
Jan  9 20:52:42 ayu kernel: aac0: COMMAND 0xc
Jan  9 20:52:43 ayu kernel: 8f56740 TIMEOUT AFTER 34 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f57640 TIMEOUT AFTER 34 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f58440 TIMEOUT AFTER 34 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f57bc0 TIMEOUT AFTER 34 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f59b40 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f57c80 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f59600 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f5a0c0 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f58c00 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f55f40 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f580c0 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f53280 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f58bc0 TIMEOUT AFTER 35 SECONDS
Jan  9 20:52:43 ayu kernel: aac0: COMMAND 0xc8f59940 TIMEOUT AFTER 35 SECONDS

(hex after command and number of seconds varies)
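
To see how often and how badly this happens, timeout lines like the ones
above can be counted with a short script. This is a rough sketch that
assumes the message format shown in this log; the function name
summarize_timeouts and the shape of its result are made up for
illustration.

```python
import re
from collections import Counter

# Matches e.g. "aac0: COMMAND 0xc8f56f80 TIMEOUT AFTER 34 SECONDS".
# \s+ before SECONDS tolerates a line break, which mail wrapping can introduce.
TIMEOUT_RE = re.compile(
    r"(?P<dev>aac\d+): COMMAND (?P<addr>0x[0-9a-fA-F]+) "
    r"TIMEOUT AFTER (?P<secs>\d+)\s+SECONDS"
)


def summarize_timeouts(log_text):
    """Count timeout messages per controller and report the longest wait."""
    matches = list(TIMEOUT_RE.finditer(log_text))
    per_device = Counter(m.group("dev") for m in matches)
    longest = max((int(m.group("secs")) for m in matches), default=0)
    return {
        "count": len(matches),
        "per_device": dict(per_device),
        "longest_seconds": longest,
    }
```

A message that syslog split across two entries (such as the "COMMAND 0xc"
fragment above) does not match the pattern and is not counted.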

Excerpts from dmesg
-------------------

FreeBSD 6.1-RELEASE-p11 #0: Wed Jan  3 19:06:12 CST 2007
    root@ayu.ctgameinfo.com:/usr/obj/usr/src/sys/AYU
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU            3060  @ 2.40GHz (2394.01-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,<b9>,CX16,<b14>,<b15>>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
real memory  = 3622699008 (3454 MB)
avail memory = 3545722880 (3381 MB)
<snip>
aac0: <Adaptec SAS RAID 4800SAS> mem 0xd8400000-0xd85fffff,0xd8200000-0xd83fffff,0xe0000000-0xe7ffffff irq 26 at device 14.0 on pci11
aac0: New comm. interface enabled
aac0: Adaptec Raid Controller 2.0.0-1
<snip>
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69988MB (143335424 sectors)


The problem happens a few times a day each time lasting only a matter of 
minutes.


I searched the mailing lists for others having this problem, but all I
found were older reports from early 5.x that are supposed to be fixed now.

From owner-freebsd-scsi@FreeBSD.ORG  Thu Jan 11 16:27:46 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id A623316A403
	for <freebsd-scsi@freebsd.org>; Thu, 11 Jan 2007 16:27:46 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.freebsd.org (Postfix) with ESMTP id 6B81613C441
	for <freebsd-scsi@freebsd.org>; Thu, 11 Jan 2007 16:27:46 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from phobos.samsco.home (phobos.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id l0BGRblN021358;
	Thu, 11 Jan 2007 09:27:43 -0700 (MST)
	(envelope-from scottl@samsco.org)
Message-ID: <45A66576.9010106@samsco.org>
Date: Thu, 11 Jan 2007 08:27:34 -0800
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
	rv:1.8.1.2pre) Gecko/20061227 SeaMonkey/1.1
MIME-Version: 1.0
To: Cstdenis <cstdenis@ctgameinfo.com>
References: <45A4E3AD.1040600@ctgameinfo.com>
In-Reply-To: <45A4E3AD.1040600@ctgameinfo.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (pooker.samsco.org [168.103.85.57]);
	Thu, 11 Jan 2007 09:27:43 -0700 (MST)
X-Spam-Status: No, score=-1.4 required=3.8 tests=ALL_TRUSTED autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: freebsd-scsi@freebsd.org
Subject: Re: Bug in aac?
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Jan 2007 16:27:46 -0000

Cstdenis wrote:
> I am running 6.1-p11 with an Adaptec SAS RAID 4800SAS running a mirror of
> 2 15k rpm SCSI drives.
> 
> Under heavy I/O load (it's a database server) I get the following,
> accompanied by serious system lag:
> 

The system recovers after this?  Strange.  What the messages mean is
that I/O has been sent to the controller, and the controller has not
responded in a reasonable period of time.  Usually this is a sign that
the controller has died and will not recover.  So if it is recovering
then either there is a firmware bug that is making the controller pause
for a long period of time, or there is some sort of yet-undiscovered
driver bug.  Of course you should make sure that you're running the
latest firmware from Adaptec.  These cards are new and SAS in general
is relatively new, so bugs are not unlikely.  One question, though: how
are you running SCSI drives on a SAS controller?  Are you going through
some sort of converter?

Scott

From owner-freebsd-scsi@FreeBSD.ORG  Thu Jan 11 18:10:09 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id BF5B616A412
	for <freebsd-scsi@freebsd.org>; Thu, 11 Jan 2007 18:10:09 +0000 (UTC)
	(envelope-from cstdenis@ctgameinfo.com)
Received: from luna.ctgameinfo.com (luna.ctgameinfo.com [65.110.52.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 7176013C46A
	for <freebsd-scsi@freebsd.org>; Thu, 11 Jan 2007 18:10:09 +0000 (UTC)
	(envelope-from cstdenis@ctgameinfo.com)
Received: from [192.168.1.100] (S01060016b606ed02.vc.shawcable.net
	[24.87.22.207]) (AUTH: LOGIN chris@ctgameinfo.com)
	by luna.ctgameinfo.com with esmtp; Thu, 11 Jan 2007 10:10:09 -0800
	id 00078C7F.45A67D81.0001195F
Message-ID: <45A67D56.1080706@ctgameinfo.com>
Date: Thu, 11 Jan 2007 10:09:26 -0800
From: Cstdenis <cstdenis@ctgameinfo.com>
User-Agent: Thunderbird 1.5.0.9 (Windows/20061207)
MIME-Version: 1.0
To: Scott Long <scottl@samsco.org>
References: <45A4E3AD.1040600@ctgameinfo.com> <45A66576.9010106@samsco.org>
In-Reply-To: <45A66576.9010106@samsco.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-scsi@freebsd.org
Subject: Re: Bug in aac?
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Jan 2007 18:10:10 -0000

Yes, the system does recover. I have not been actively using the system
when this happens, so I'm not sure how long it takes, but it looks like a
few to several minutes.

It's a dedicated server at a hosting company -- I don't have physical 
access to the hardware so I don't know the exact details. The info I 
gave was a combination of dmesg and what I ordered.

Here is what the web control panel says I have:

    Motherboard         SuperMicro PDSMI+ Intel Pentium DualCore SingleProc Sata [1Proc]
    Processor           Intel Xeon 3060-Dual Core [2.4GHz]
    Drive Controller    Adaptec 4800SAS SA-SCSI RAID-1 Controller
    Hard Drive 1        Fujitsu MAX3073 SAS 3073 [73GB]
    Hard Drive 2        Fujitsu MAX3073 SAS 3073 [73GB]


I will try requesting a firmware upgrade. If that doesn't work, is there
more information I can provide to help get the bug fixed? I tried
compiling AAC_DEBUG=3 into the kernel, but the constant flow of debug
data made the system unusable. I worry that AAC_DEBUG=1 will also be too
much for the system to be usable, but I'm not sure.


Scott Long wrote:
> Cstdenis wrote:
>   
>> I am running 6.1-p11 with a Adaptec SAS RAID 4800SAS running a mirror of
>> 2 15k rpm SCSI drives.
>>
>> Under heavy IO load (Its a database server) I get the following
>> accompanied by serious system lag:
>>
>>     
>
> The system recovers after this?  Strange.  What the messages mean is
> that I/O has been sent to the controller, and the controller has not
> responded in a reasonable period of time.  Usually this is a sign of
> the controller has died and will not recover.  So if it is recovering
> then either there is a firmware bug that is making the controller pause
> for a long period of time, or there is some sort of yet-undiscovered
> driver bug.  Of course you should make sure that you're running the
> latest firmware from Adaptec.  These cards are new and SAS in general
> is relatively new, so bugs are not unlikely.  One question, though, how
> are you running SCSI drives on a SAS controller?  Are you going through
> some sort of converter?
>
> Scott
>   


From owner-freebsd-scsi@FreeBSD.ORG  Fri Jan 12 19:25:19 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@FreeBSD.org
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id B646616A47B;
	Fri, 12 Jan 2007 19:25:19 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl
	[83.17.198.132])
	by mx1.freebsd.org (Postfix) with ESMTP id 2A6EA13C461;
	Fri, 12 Jan 2007 19:25:19 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id 5161948808; Fri, 12 Jan 2007 20:03:27 +0100 (CET)
Received: from localhost (154.81.datacomsa.pl [195.34.81.154])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id C7EE6487F0;
	Fri, 12 Jan 2007 20:03:20 +0100 (CET)
Date: Fri, 12 Jan 2007 20:02:49 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Danny Braniss <danny@cs.huji.ac.il>
Message-ID: <20070112190249.GB90718@garage.freebsd.pl>
References: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="s/l3CgOIzMHHjg/5"
Content-Disposition: inline
In-Reply-To: <E1H4B4I-0001eX-UC@cs1.cs.huji.ac.il>
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 7.0-CURRENT i386
User-Agent: mutt-ng/devel-r804 (FreeBSD)
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-2.6 required=3.0 tests=BAYES_00 autolearn=ham 
	version=3.0.4
Cc: freebsd-scsi@FreeBSD.org, freebsd-hackers@freebsd.org
Subject: Re: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 12 Jan 2007 19:25:19 -0000


--s/l3CgOIzMHHjg/5
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
> Hi,
> While I think I have almost solved the problem of network disconnects,
> It downed on me a major problem:
> When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> if i don't try to recover, then there is no change in the above scenario.
> if i try to recover, then the client does not know that it should
> umount/fsck/mount.
> While all this seems familiar, removing  a floppy/disk-on-key while it's
> mounted, we could always say "you shouldn't have done that!", with
> a network connection, it can happen very often - rebooting the target, a
> network hickup, etc.
>=20
> So, any ideas?

In my opinion it should be done this way:

You have a queue of I/O requests. You send them to the other end and wait
for confirmation. Until confirmation is received, you keep the requests
queued. If the other end dies, you try to reconnect; until some timeout
expires, the processes which sent those requests will just wait. If you
reconnect successfully, you resend the unconfirmed requests. If you can't
reconnect, you just pass the errors up.

This is what I did in ggate and it seems to work.
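
A minimal sketch of that scheme, in Python rather than the ggate code
itself. The names FlakyLink, RequestQueue and recover() are made up for
illustration, and a bounded number of reconnect attempts stands in for the
timeout:

```python
from collections import OrderedDict


class FlakyLink:
    """Hypothetical transport: records what was sent and can be taken down."""

    def __init__(self):
        self.up = True
        self.sent = []

    def send(self, seq, payload):
        if not self.up:
            raise ConnectionError("link down")
        self.sent.append((seq, payload))


class RequestQueue:
    """Keeps every request queued until the other end confirms it."""

    def __init__(self, link, max_reconnect_attempts=3):
        self.link = link
        self.max_reconnect_attempts = max_reconnect_attempts
        self.pending = OrderedDict()  # seq -> payload, kept until confirmed
        self.errors = []              # seqs whose failure was passed up
        self.next_seq = 0

    def submit(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = payload   # queue first, then try to send
        try:
            self.link.send(seq, payload)
        except ConnectionError:
            pass                      # stays queued; recover() handles it
        return seq

    def confirm(self, seq):
        # Confirmation received: the request can finally be dropped.
        self.pending.pop(seq, None)

    def recover(self, try_reconnect):
        """try_reconnect() returns True once the link is usable again."""
        for _ in range(self.max_reconnect_attempts):
            if try_reconnect():
                # Resend everything that was never confirmed, in order.
                for seq, payload in list(self.pending.items()):
                    self.link.send(seq, payload)
                return True
        # Could not reconnect: pass the errors up for all pending requests.
        self.errors.extend(self.pending)
        self.pending.clear()
        return False
```

Note that a resent request may reach the other end twice if only the
confirmation was lost, so this only works when replaying a request is
harmless, as it is for plain block writes.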

--=20
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--s/l3CgOIzMHHjg/5
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (FreeBSD)

iD8DBQFFp9tZForvXbEpPzQRAv4EAKD3CxdlCygVo4AgET/J5bD8XZM4dgCgpmCV
FUgOAZDi82SVgQSFXu+PqTY=
=BHwP
-----END PGP SIGNATURE-----

--s/l3CgOIzMHHjg/5--

From owner-freebsd-scsi@FreeBSD.ORG  Fri Jan 12 19:31:06 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@FreeBSD.org
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D6FB016A47E;
	Fri, 12 Jan 2007 19:31:06 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from cs1.cs.huji.ac.il (cs1.cs.huji.ac.il [132.65.16.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 90A3713C480;
	Fri, 12 Jan 2007 19:31:06 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from pampa.cs.huji.ac.il ([132.65.80.32])
	by cs1.cs.huji.ac.il with esmtp
	id 1H5S7E-000BS0-RR; Fri, 12 Jan 2007 21:31:04 +0200
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.2
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
In-reply-to: Your message of Fri, 12 Jan 2007 20:02:49 +0100 .
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 12 Jan 2007 21:31:04 +0200
From: Danny Braniss <danny@cs.huji.ac.il>
Message-ID: <E1H5S7E-000BS0-RR@cs1.cs.huji.ac.il>
Cc: freebsd-scsi@FreeBSD.org, freebsd-hackers@freebsd.org
Subject: Re: iSCSI disconnects dilema 
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 12 Jan 2007 19:31:06 -0000

> 
> On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
> > Hi,
> > While I think I have almost solved the problem of network disconnects,
> > It downed on me a major problem:
> > When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> > if i don't try to recover, then there is no change in the above scenario.
> > if i try to recover, then the client does not know that it should
> > umount/fsck/mount.
> > While all this seems familiar, removing  a floppy/disk-on-key while it's
> > mounted, we could always say "you shouldn't have done that!", with
> > a network connection, it can happen very often - rebooting the target, a
> > network hickup, etc.
> >=20
> > So, any ideas?
> 
> In my opinion it should be done this way:
> 
> You have a queue of I/O requests. You send the to the other end and wait
> for confirmation. Until confirmation is received, you keep the requests
> queued. If the other end dies, you try to reconnect (until some timeout
> expires, the processes which send those requests will just wait), if you
> reconnect successfully, you resend not-confirmed requests, if you won't
> be able to reconnect, you just pass the errors up.
> 
> This is what I did in ggate and it seems to work.

That is basically what I'm doing: unacked requests get requeued.
The problem I fear (and maybe I'm paranoid :-):

Assume the following scenario: the client (initiator) sends a write command,
the target acks it, and then it crashes. If the write was never completed,
the initiator goes on as if nothing ever happened.

danny


From owner-freebsd-scsi@FreeBSD.ORG  Fri Jan 12 20:14:12 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 09EC516A415;
	Fri, 12 Jan 2007 20:14:12 +0000 (UTC)
	(envelope-from wb@freebie.xs4all.nl)
Received: from smtp-vbr16.xs4all.nl (smtp-vbr16.xs4all.nl [194.109.24.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 9654413C474;
	Fri, 12 Jan 2007 20:14:11 +0000 (UTC)
	(envelope-from wb@freebie.xs4all.nl)
Received: from freebie.xs4all.nl (freebie.xs4all.nl [213.84.32.253])
	by smtp-vbr16.xs4all.nl (8.13.8/8.13.8) with ESMTP id l0CJtow6022924;
	Fri, 12 Jan 2007 20:55:51 +0100 (CET)
	(envelope-from wb@freebie.xs4all.nl)
Received: from freebie.xs4all.nl (localhost [127.0.0.1])
	by freebie.xs4all.nl (8.13.8/8.13.3) with ESMTP id l0CJtoGl077324;
	Fri, 12 Jan 2007 20:55:50 +0100 (CET)
	(envelope-from wb@freebie.xs4all.nl)
Received: (from wb@localhost)
	by freebie.xs4all.nl (8.13.8/8.13.6/Submit) id l0CJto9I077323;
	Fri, 12 Jan 2007 20:55:50 +0100 (CET) (envelope-from wb)
Date: Fri, 12 Jan 2007 20:55:50 +0100
From: Wilko Bulte <wb@freebie.xs4all.nl>
To: Danny Braniss <danny@cs.huji.ac.il>
Message-ID: <20070112195549.GA77181@freebie.xs4all.nl>
References: <E1H5S7E-000BS0-RR@cs1.cs.huji.ac.il>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <E1H5S7E-000BS0-RR@cs1.cs.huji.ac.il>
User-Agent: Mutt/1.5.11
X-Virus-Scanned: by XS4ALL Virus Scanner
Cc: freebsd-scsi@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>,
	freebsd-hackers@freebsd.org
Subject: Re: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 12 Jan 2007 20:14:12 -0000

On Fri, Jan 12, 2007 at 09:31:04PM +0200, Danny Braniss wrote..
> > 
> > On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
> > > Hi,
> > > While I think I have almost solved the problem of network disconnects,
> > > It downed on me a major problem:
> > > When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> > > if i don't try to recover, then there is no change in the above scenario.
> > > if i try to recover, then the client does not know that it should
> > > umount/fsck/mount.
> > > While all this seems familiar, removing  a floppy/disk-on-key while it's
> > > mounted, we could always say "you shouldn't have done that!", with
> > > a network connection, it can happen very often - rebooting the target, a
> > > network hickup, etc.
> > >=20
> > > So, any ideas?
> > 
> > In my opinion it should be done this way:
> > 
> > You have a queue of I/O requests. You send the to the other end and wait
> > for confirmation. Until confirmation is received, you keep the requests
> > queued. If the other end dies, you try to reconnect (until some timeout
> > expires, the processes which send those requests will just wait), if you
> > reconnect successfully, you resend not-confirmed requests, if you won't
> > be able to reconnect, you just pass the errors up.
> > 
> > This is what I did in ggate and it seems to work.
> 
> That is basically what i'm doing - unacked request get requed.
> the problem I fear (and maybe I'm paranoid :-):

Paranoia is a Good Thing(TM) in data storage land :-)

> assume the following scenario, the client(initiator) sends a write command,
> the target acks it, then it crashes, if the write was never completed,
> the initiator goes on as nothing ever happened. 

Yes, but what can the initiator do about that?  I mean, it does not have any
visibility of what the target has (or has not) done with the data.

This is roughly the same as a RAID box accepting a write into a writeback cache
and ACK-ing to the host.  You can only assume that the RAID box's cache
will get flushed to the spindles properly.  All the usual horror scenarios
with a broken battery backup of the cache, a power failure, etc. apply here.
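
To make the window concrete, here is a toy model in Python. It is only a
sketch: WriteBackTarget and its methods are invented for illustration and
do not correspond to any real target or RAID firmware. A write is
acknowledged as soon as it is in the volatile cache, and only a flush
moves it to stable media:

```python
class WriteBackTarget:
    """Toy target that acknowledges writes from a volatile cache."""

    def __init__(self):
        self.cache = {}   # volatile: contents are lost on crash
        self.media = {}   # stable storage: survives a crash

    def write(self, lba, data):
        self.cache[lba] = data
        return "ack"      # acknowledged before reaching stable storage

    def flush(self):
        # Commit everything cached so far to stable storage.
        self.media.update(self.cache)
        self.cache.clear()
        return "ack"

    def crash(self):
        # Power failure without working battery backup.
        self.cache.clear()

    def read(self, lba):
        if lba in self.cache:
            return self.cache[lba]
        return self.media.get(lba)


target = WriteBackTarget()
target.write(10, b"flushed")
target.flush()
target.write(11, b"acked only")
target.crash()
survived = target.read(10)   # written and flushed before the crash
lost = target.read(11)       # acknowledged, but never reached the media
```

From the initiator's side both writes looked identical: each returned an
ack, and nothing distinguishes the one that was later lost.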

Wilko

-- 
Wilko Bulte				wilko@FreeBSD.org

From owner-freebsd-scsi@FreeBSD.ORG  Fri Jan 12 20:59:50 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D7FBB16A416
	for <freebsd-scsi@freebsd.org>; Fri, 12 Jan 2007 20:59:50 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.freebsd.org (Postfix) with ESMTP id 8FD9713C43E
	for <freebsd-scsi@freebsd.org>; Fri, 12 Jan 2007 20:59:50 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from phobos.samsco.home (phobos.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id l0CKxJb0031877;
	Fri, 12 Jan 2007 13:59:24 -0700 (MST)
	(envelope-from scottl@samsco.org)
Message-ID: <45A7F6A4.4030707@samsco.org>
Date: Fri, 12 Jan 2007 13:59:16 -0700
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
	rv:1.8.1.2pre) Gecko/20061227 SeaMonkey/1.1
MIME-Version: 1.0
To: Wilko Bulte <wb@freebie.xs4all.nl>
References: <E1H5S7E-000BS0-RR@cs1.cs.huji.ac.il>
	<20070112195549.GA77181@freebie.xs4all.nl>
In-Reply-To: <20070112195549.GA77181@freebie.xs4all.nl>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (pooker.samsco.org [168.103.85.57]);
	Fri, 12 Jan 2007 13:59:24 -0700 (MST)
X-Spam-Status: No, score=-1.4 required=3.8 tests=ALL_TRUSTED autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: freebsd-scsi@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>,
	freebsd-hackers@freebsd.org
Subject: Re: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 12 Jan 2007 20:59:51 -0000

Wilko Bulte wrote:
> On Fri, Jan 12, 2007 at 09:31:04PM +0200, Danny Braniss wrote..
>>>
>>> On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
>>>> Hi,
>>>> While I think I have almost solved the problem of network disconnects,
>>>> It downed on me a major problem:
>>>> When a 'local' disk crashes, the kernel will probably hang/panic/crash.
>>>> if i don't try to recover, then there is no change in the above scenario.
>>>> if i try to recover, then the client does not know that it should
>>>> umount/fsck/mount.
>>>> While all this seems familiar, removing  a floppy/disk-on-key while it's
>>>> mounted, we could always say "you shouldn't have done that!", with
>>>> a network connection, it can happen very often - rebooting the target, a
>>>> network hickup, etc.
>>>> =20
>>>> So, any ideas?
>>> In my opinion it should be done this way:
>>>
>>> You have a queue of I/O requests. You send the to the other end and wait
>>> for confirmation. Until confirmation is received, you keep the requests
>>> queued. If the other end dies, you try to reconnect (until some timeout
>>> expires, the processes which send those requests will just wait), if you
>>> reconnect successfully, you resend not-confirmed requests, if you won't
>>> be able to reconnect, you just pass the errors up.
>>>
>>> This is what I did in ggate and it seems to work.
>> That is basically what i'm doing - unacked request get requed.
>> the problem I fear (and maybe I'm paranoid :-):
> 
> Paranoia is a Good Thing(TM) in data storage land :-)
> 
>> assume the following scenario, the client(initiator) sends a write command,
>> the target acks it, then it crashes, if the write was never completed,
>> the initiator goes on as nothing ever happened. 
> 
> Yes, but what can the initiator do about that?  I mean, it does not have any
> visibility of what the target has (or has not) done with the data.  '
> 
> This is roughly the same as a RAID box accepting a write into a writeback cache
> and ACK-ing to the host.  You can only assume that the RAID box' cache
> will get flushed to the spindles properly.  All the usual horror scenarios
> with a broken battery backup of the cache and a powerfailure etc apply here.
> 
> Wilko
> 

I forget, does iSCSI have a concept of a flush_cache command, or the
equivalent of what parallel SCSI does with ordered tags?  If so, then
that's how your app or OS knows that the transaction got committed to
stable storage.  It's been long assumed in the external storage world
that you are at the mercy of the external storage cache, so the problem
that Danny is referring to is nothing new.  The real question is how
to implement the equivalent mechanism that iSCSI provides in a way that
the OS/app can make use of it.  For example, CAM issues an ordered tag
periodically to flush the disk cache to stable storage.  Most storage
drivers, including CAM, will issue some sort of a flush_cache command to
the controller and media during system shutdown.

Scott


From owner-freebsd-scsi@FreeBSD.ORG  Sat Jan 13 10:13:57 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id EA30E16A412;
	Sat, 13 Jan 2007 10:13:57 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from cs1.cs.huji.ac.il (cs1.cs.huji.ac.il [132.65.16.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 782FE13C4DB;
	Sat, 13 Jan 2007 10:13:57 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from pampa.cs.huji.ac.il ([132.65.80.32])
	by cs1.cs.huji.ac.il with esmtp
	id 1H5ftb-000Okd-FO; Sat, 13 Jan 2007 12:13:55 +0200
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.2
To: Scott Long <scottl@samsco.org>
In-reply-to: <45A7F6A4.4030707@samsco.org> 
References: <E1H5S7E-000BS0-RR@cs1.cs.huji.ac.il>
	<20070112195549.GA77181@freebie.xs4all.nl>
	<45A7F6A4.4030707@samsco.org>
Comments: In-reply-to Scott Long <scottl@samsco.org>
	message dated "Fri, 12 Jan 2007 13:59:16 -0700."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 13 Jan 2007 12:13:55 +0200
From: Danny Braniss <danny@cs.huji.ac.il>
Message-ID: <E1H5ftb-000Okd-FO@cs1.cs.huji.ac.il>
Cc: Wilko Bulte <wb@freebie.xs4all.nl>, Pawel Jakub Dawidek <pjd@freebsd.org>,
	freebsd-hackers@freebsd.org, freebsd-scsi@freebsd.org
Subject: Re: iSCSI disconnects dilema 
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 Jan 2007 10:13:58 -0000

> Wilko Bulte wrote:
> > On Fri, Jan 12, 2007 at 09:31:04PM +0200, Danny Braniss wrote..
> >>>
> >>> On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
> >>>> Hi,
> >>>> While I think I have almost solved the problem of network disconnects,
> >>>> It downed on me a major problem:
> >>>> When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> >>>> if i don't try to recover, then there is no change in the above scenario.
> >>>> if i try to recover, then the client does not know that it should
> >>>> umount/fsck/mount.
> >>>> While all this seems familiar, removing  a floppy/disk-on-key while it's
> >>>> mounted, we could always say "you shouldn't have done that!", with
> >>>> a network connection, it can happen very often - rebooting the target, a
> >>>> network hickup, etc.
> >>>> =20
> >>>> So, any ideas?
> >>> In my opinion it should be done this way:
> >>>
> >>> You have a queue of I/O requests. You send the to the other end and wait
> >>> for confirmation. Until confirmation is received, you keep the requests
> >>> queued. If the other end dies, you try to reconnect (until some timeout
> >>> expires, the processes which send those requests will just wait), if you
> >>> reconnect successfully, you resend not-confirmed requests, if you won't
> >>> be able to reconnect, you just pass the errors up.
> >>>
> >>> This is what I did in ggate and it seems to work.
> >> That is basically what i'm doing - unacked request get requed.
> >> the problem I fear (and maybe I'm paranoid :-):
> > 
> > Paranoia is a Good Thing(TM) in data storage land :-)
> > 
> >> assume the following scenario, the client(initiator) sends a write command,
> >> the target acks it, then it crashes, if the write was never completed,
> >> the initiator goes on as nothing ever happened. 
> > 
> > Yes, but what can the initiator do about that?  I mean, it does not have any
> > visibility of what the target has (or has not) done with the data.  '
> > 
> > This is roughly the same as a RAID box accepting a write into a writeback cache
> > and ACK-ing to the host.  You can only assume that the RAID box' cache
> > will get flushed to the spindles properly.  All the usual horror scenarios
> > with a broken battery backup of the cache and a powerfailure etc apply here.
> > 
> > Wilko
> > 
> 
> I forget, does iSCSI have a concept of a flush_cache command, or the
> equivalent of what parallel SCSI does with ordered tags?

Not really, or I can't find it. iSCSI is mainly an envelope for
SCSI commands, so whatever CAM does, it will pass it on.
There are some management commands, so the target can tell the initiator
that it's going down, for example (and what should the driver
do in such a case in FreeBSD?)

>                                                           If so, then
> that's how your app or OS knows that the transaction got committed to
> stable storage.  It's been long assumed in the external storage world
> that you are at the mercy of the external storage cache, so the problem
> that Danny is referring to is nothing new.  The real question is how
> to implement the equivalent mechanism that iSCSI provides in a way that
> the OS/app can make use of it.  For example, CAM issues an ordered tag
> periodically to flush the disk cache to stable storage. 
Nice (or wishful thinking :-): the SCSI part of iSCSI is, or can be,
software/virtual.

>                                                         Most storage
> drivers, including CAM, will issue some sort of a flush_cache command to
> the controller and media during system shutdown.

This took me a long time to fix! The userland program got killed at shutdown,
the link was lost, and so there was no way to flush buffers. I fixed it by
calling fget(...) too.

I guess I can summarize: (and use the 3 monkey law :-)
	1- assume the target is 'well behaved' and will flush cache.
	2- there is - currently - no way to tell the OS that not all
	   seems to be as expected.
	3- keep quiet and hope for the best.
danny


From owner-freebsd-scsi@FreeBSD.ORG  Sat Jan 13 17:42:56 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 993FD16A412;
	Sat, 13 Jan 2007 17:42:56 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.freebsd.org (Postfix) with ESMTP id 200CC13C459;
	Sat, 13 Jan 2007 17:42:56 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from phobos.samsco.home (phobos.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id l0DHgSel038730;
	Sat, 13 Jan 2007 10:42:33 -0700 (MST)
	(envelope-from scottl@samsco.org)
Message-ID: <45A91A02.906@samsco.org>
Date: Sat, 13 Jan 2007 10:42:26 -0700
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
	rv:1.8.1.2pre) Gecko/20061227 SeaMonkey/1.1
MIME-Version: 1.0
To: Danny Braniss <danny@cs.huji.ac.il>
References: <E1H5S7E-000BS0-RR@cs1.cs.huji.ac.il>
	<20070112195549.GA77181@freebie.xs4all.nl>
	<45A7F6A4.4030707@samsco.org> <E1H5ftb-000Okd-FO@cs1.cs.huji.ac.il>
In-Reply-To: <E1H5ftb-000Okd-FO@cs1.cs.huji.ac.il>
X-Enigmail-Version: 0.94.1.2
X-Enigmail-Version: 0.94.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (pooker.samsco.org [168.103.85.57]);
	Sat, 13 Jan 2007 10:42:33 -0700 (MST)
X-Spam-Status: No, score=-1.4 required=3.8 tests=ALL_TRUSTED autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: Wilko Bulte <wb@freebie.xs4all.nl>, Pawel Jakub Dawidek <pjd@freebsd.org>,
	freebsd-hackers@freebsd.org, freebsd-scsi@freebsd.org
Subject: Re: iSCSI disconnects dilema
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 Jan 2007 17:42:56 -0000

Danny Braniss wrote:
>> I forget, does iSCSI have a concept of a flush_cache command, or the
>> equivalent of what parallel SCSI does with ordered tags?
> 
> not realy - or I can't find it. iSCSI is mainly and envelope for
> scsi commands, so whatever the CAM does, it will pass it on. 
> There are some managemenet commands, so the target can tell the initiator
> that it's going down for example (and what should the driver
> do in such a case in freebsd?)
> 

If the periph is open (i.e. mounted), I'd just ignore this and have the
stack go through a normal retry timeout cycle to see if the device comes
back.  If it's closed, then I'd remove the periph.  Knowing if it's
opened or closed is likely hard to do from the iSCSI driver, which is
one reason why iSCSI knowledge needs to eventually be moved upwards in
CAM.

>>                                                           If so, then
>> that's how your app or OS knows that the transaction got committed to
>> stable storage.  It's been long assumed in the external storage world
>> that you are at the mercy of the external storage cache, so the problem
>> that Danny is referring to is nothing new.  The real question is how
>> to implement the equivalent mechanism that iSCSI provides in a way that
>> the OS/app can make use of it.  For example, CAM issues an ordered tag
>> periodically to flush the disk cache to stable storage. 
> nice, (or wishful thinking :-), the scsi part of iSCSI is/can be 
> software/virtual.
> 

If the target device returns a successful completion from a command, the
initiator must assume that it's not lying.  You could do a flush/sync
cache command after every I/O, but then you'd have a completely
unacceptable level of performance.  But again, this is not a new problem
specific to iSCSI.  It's long been a design consideration of external
storage, and is why external storage 1) carries a high price tag to
accompany good engineering and testing, and 2) comes with some form of
battery backup, to prevent data loss in case of power loss.
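
A rough way to see the trade-off, as a Python sketch. The helper
flush_plan() is made up for illustration and assumes a flush is issued
after every flush_every-th write, plus one at the end if writes are left
over. It counts the flush commands needed and the largest number of
acknowledged writes that can be sitting unflushed at any moment:

```python
def flush_plan(n_writes, flush_every):
    """Return (flush commands issued, max acked-but-unflushed writes)."""
    if n_writes < 0 or flush_every < 1:
        raise ValueError("n_writes must be >= 0 and flush_every >= 1")
    # Integer ceiling division: one flush per full or partial batch.
    flushes = (n_writes + flush_every - 1) // flush_every
    # Just before a flush, at most one whole batch is exposed.
    exposed = min(n_writes, flush_every)
    return flushes, exposed


per_io = flush_plan(1000, 1)      # flush after every write
batched = flush_plan(1000, 100)   # flush after every 100th write
```

Flushing after every I/O keeps the exposure to a single write but needs
one flush command per write; batching cuts the number of flushes at the
price of a larger window of data that exists only in the target's cache.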

>>                                                         Most storage
>> drivers, including CAM, will issue some sort of a flush_cache command to
>> the controller and media during system shutdown.
> 
> this took me a long time to fix! the userland program got killed at shutdown,
> the link was lost, and so there was no way to flush buffers, fixed by calling
> fget(...) too.
> 
> I guess I can summarize: (and use the 3 monkey law :-)
> 	1- assume the target is 'well behaved' and will flush cache.
> 	2- there is - currently - no way to tell the OS that not all
> 	   seems to be as expected.
> 	3- keep quiet and hope for the best.
> danny
> 
> 

So you had a scenario where a program was doing I/O right up to system
(initiator) shutdown, and some of those I/Os got lost in the process?
I guess I don't understand why the OS didn't flush all outstanding I/O
buffers after terminating the program and before finishing the shutdown.
Maybe you are doing something illegal in your driver, or maybe you need
to implement a kernel shutdown hook that will allow you to block the
shutdown until everything is flushed.
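
A rough userland sketch of the shutdown-hook idea follows.  Everything
in it (hook_list, hook_register, session_model, session_drain,
run_shutdown) is a hypothetical stand-in; in the kernel the natural
place for this would presumably be one of the eventhandler(9) shutdown
events such as shutdown_pre_sync, and the drain loop would sleep waiting
for completions rather than just counting down.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOOKS 8

typedef void (*shutdown_hook_fn)(void *arg);

/* Hypothetical list of hooks to run before shutdown proceeds. */
struct hook_list {
	shutdown_hook_fn fn[MAX_HOOKS];
	void *arg[MAX_HOOKS];
	size_t count;
};

/* Returns 0 on success, -1 if the list is full. */
static int
hook_register(struct hook_list *hl, shutdown_hook_fn fn, void *arg)
{
	assert(fn != NULL);
	if (hl->count == MAX_HOOKS)
		return (-1);
	hl->fn[hl->count] = fn;
	hl->arg[hl->count] = arg;
	hl->count++;
	return (0);
}

/* Hypothetical per-session driver state. */
struct session_model {
	int link_up;		/* nonzero while the link to the target exists */
	unsigned outstanding;	/* I/Os sent but not yet completed */
	int cache_flushed;	/* set once the final flush has been issued */
};

/* Hook body: block until all I/O has drained, then flush the cache. */
static void
session_drain(void *arg)
{
	struct session_model *s = arg;

	if (!s->link_up)
		return;		/* nothing can be flushed without a link */
	while (s->outstanding > 0)
		s->outstanding--;	/* stands in for waiting on one completion */
	s->cache_flushed = 1;	/* stands in for a final cache flush */
}

/* Shutdown does not continue until every hook has returned. */
static void
run_shutdown(struct hook_list *hl)
{
	size_t i;

	for (i = 0; i < hl->count; i++)
		hl->fn[i](hl->arg[i]);
}
```

The point of the model is the ordering: the hook only helps if it runs
while the link is still up, which is exactly what fails when the
userland side of the connection is killed first.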

Scott

From owner-freebsd-scsi@FreeBSD.ORG  Sat Jan 13 18:53:54 2007
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9DC5116A403;
	Sat, 13 Jan 2007 18:53:54 +0000 (UTC)
	(envelope-from gibbs@scsiguy.com)
Received: from ns1.scsiguy.com (mail.scsiguy.com [70.89.174.89])
	by mx1.freebsd.org (Postfix) with ESMTP id 56B0A13C428;
	Sat, 13 Jan 2007 18:53:54 +0000 (UTC)
	(envelope-from gibbs@scsiguy.com)
Received: from [70.89.174.89] (www.scsiguy.com [70.89.174.89])
	by ns1.scsiguy.com (8.13.8/8.13.8) with ESMTP id l0DII5h0015549;
	Sat, 13 Jan 2007 11:18:05 -0700 (MST)
	(envelope-from gibbs@scsiguy.com)
Message-ID: <45A9225D.4080907@scsiguy.com>
Date: Sat, 13 Jan 2007 11:18:05 -0700
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
User-Agent: Thunderbird 1.5.0.8 (X11/20061214)
MIME-Version: 1.0
To: mjacob@freebsd.org
References: <20070104225519.Q92958@ns1.feral.com>
	<459E8AE7.90104@samsco.org>	<20070105093930.Y34456@ns1.feral.com>
	<459E97E6.4000603@samsco.org>	<459E989C.2020602@samsco.org>
	<20070105103431.A34456@ns1.feral.com>
	<20070105104021.D34456@ns1.feral.com>
In-Reply-To: <20070105104021.D34456@ns1.feral.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-scsi@freebsd.org
Subject: Re: CAM rescanner thread?
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 Jan 2007 18:53:54 -0000

 > Actually, no. Now that I think about it and look at the code in
 > cam_xpt.c, AC_FOUND_DEVICE seems to have a different semantic. It
 > seems to be an announcement to all periph's who care *after* the
 > device has been probed and configured.

Yes.

 > If you look at xpt_async itself, it walks existing target and device
 > entries delivering the async_code. Even if the path for the async
 > event is a wildcard, it still needs a cam_ed to deliver something
 > to.

There are wildcard cam_ed's in the tree that allow callbacks to
be registered for events that happen at any level - even a fully
wildcarded path.
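
As an illustration only, the matching rule can be modeled in userland C
like this.  It is a simplified sketch, not the cam_xpt.c code: it uses
a flat array of registrations instead of wildcard entries in the
bus/target/device tree, and ID_WILDCARD, EV_FOUND_DEVICE,
EV_LOST_DEVICE, path_id, async_reg, deliver_async and count_cb are all
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define ID_WILDCARD	(-1)	/* stand-in for CAM's wildcard IDs */
#define EV_FOUND_DEVICE	0x01u	/* hypothetical event bits */
#define EV_LOST_DEVICE	0x02u

struct path_id {
	int bus;
	int target;
	int lun;
};

typedef void (*async_cb)(void *arg, unsigned code,
    const struct path_id *where);

/* One registration: a (possibly wildcarded) path plus an event mask. */
struct async_reg {
	struct path_id path;
	unsigned event_mask;
	async_cb cb;
	void *arg;
};

static int
id_matches(int reg, int ev)
{
	return (reg == ID_WILDCARD || reg == ev);
}

/* A registered path matches if every component is equal or wildcarded. */
static int
path_matches(const struct path_id *reg, const struct path_id *ev)
{
	return (id_matches(reg->bus, ev->bus) &&
	    id_matches(reg->target, ev->target) &&
	    id_matches(reg->lun, ev->lun));
}

/* Deliver one event; returns how many callbacks were invoked. */
static unsigned
deliver_async(const struct async_reg *regs, size_t nregs, unsigned code,
    const struct path_id *where)
{
	unsigned delivered = 0;
	size_t i;

	for (i = 0; i < nregs; i++) {
		if ((regs[i].event_mask & code) == 0)
			continue;
		if (!path_matches(&regs[i].path, where))
			continue;
		assert(regs[i].cb != NULL);
		regs[i].cb(regs[i].arg, code, where);
		delivered++;
	}
	return (delivered);
}

/* Example callback: counts the events it sees through arg. */
static void
count_cb(void *arg, unsigned code, const struct path_id *where)
{
	(void)code;
	(void)where;
	++*(unsigned *)arg;
}
```

A registration with all three components wildcarded sees events for
every device, while one that names only the bus sees events for
anything on that bus.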

 > The broadcast async stuff appears like it is *thinking* about having
 > this done. In fact, code in da (daasync) seems to want to do this-
 > but it requires initial inquiry data (via a ccb_getdev argument)
 > which really makes me scratch my head a bit.

AC_FOUND_DEVICE should only be issued once the transport layer believes
that a device is configured sufficiently to be used.

 > This is a prime example of how not having a mind-meld with Ken or
 > Justin really hurts. We can ask them what they were thinking about
 > this, and it'll probably make sense, but because this isn't all
 > very highly documented the architecture is often what you *guess*
 > it is :-).

When the CAM code for FreeBSD was originally written, CAM-3 was in
development but not quite out yet.  The draft documents contain
some fledgling support for dynamic configuration and binding operations
that disassociate physical from logical addressing.  You can still get
CAM-3 here:

    http://www.t10.org/ftp/t10/drafts/cam3/cam3r03.pdf

Its discovery and bind CCB types may be a good starting point for
addressing these issues.  With the discovery process moved to a
thread and some augmentation to XPT_SCAN_*, we should be good enough
for now.

The only tricky part about using CCBs to initiate scanning is that
they potentially require the allocation of memory from interrupt context.
It would be nice to provide a service to all SIMs that can perform
dynamic discovery such that they have a high probability of obtaining
a CCB in these situations.
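
One possible shape for such a service, sketched in userland C, is a
small reserve allocated up front so that nothing has to be allocated in
interrupt context.  This is an assumption about how it could look, not
existing CAM code: scan_ccb, ccb_reserve, reserve_init, reserve_get,
reserve_put and RESERVE_SIZE are hypothetical, and a kernel version
would need locking around the reserve.

```c
#include <assert.h>
#include <stddef.h>

#define RESERVE_SIZE 4		/* hypothetical number of preallocated CCBs */

/* Hypothetical stand-in for a scan request CCB. */
struct scan_ccb {
	int in_use;
	int bus;
	int target;
	int lun;
};

/* Reserve set aside at attach time, when sleeping allocation is fine. */
struct ccb_reserve {
	struct scan_ccb slots[RESERVE_SIZE];
	unsigned free_count;
};

static void
reserve_init(struct ccb_reserve *r)
{
	size_t i;

	for (i = 0; i < RESERVE_SIZE; i++)
		r->slots[i].in_use = 0;
	r->free_count = RESERVE_SIZE;
}

/*
 * Safe to call from interrupt context in this model: it never
 * allocates.  Returns NULL only when the whole reserve is in use.
 */
static struct scan_ccb *
reserve_get(struct ccb_reserve *r)
{
	size_t i;

	for (i = 0; i < RESERVE_SIZE; i++) {
		if (!r->slots[i].in_use) {
			r->slots[i].in_use = 1;
			r->free_count--;
			return (&r->slots[i]);
		}
	}
	assert(r->free_count == 0);
	return (NULL);
}

/* Return a CCB to the reserve once the scan has completed. */
static void
reserve_put(struct ccb_reserve *r, struct scan_ccb *ccb)
{
	assert(ccb >= r->slots && ccb < r->slots + RESERVE_SIZE);
	assert(ccb->in_use);
	ccb->in_use = 0;
	r->free_count++;
}
```

The reserve gives a high probability of success rather than a
guarantee: a SIM that finds it empty still has to defer the scan
request until a CCB is returned.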

--
Justin