From: John Nielsen
To: freebsd-hackers@freebsd.org
Cc: freebsd-scsi@freebsd.org
Date: Tue, 9 Jan 2007 09:31:28 -0500
Subject: Re: iSCSI disconnects dilema
Message-Id: <200701090931.28786.lists@jnielsen.net>

On Tuesday 09 January 2007 02:06, Danny Braniss wrote:
> Hi,
> While I think I have almost solved the problem of network disconnects,
> It downed on me a major problem:
> When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> if i don't try to recover, then there is no change in the above scenario.
> if i try to recover, then the client does not know that it should
> umount/fsck/mount.
> While all this seems familiar, removing a floppy/disk-on-key while it's
> mounted, we could always say "you shouldn't have done that!", with
> a network connection, it can happen very often - rebooting the target, a
> network hickup, etc.
>
> So, any ideas?

I think that an iSCSI network disconnect (if handled properly) is more
like a bad/flaky set of sectors and/or extremely high latency than a
total disk crash. The initiator should stall as long as it can while
trying to reconnect the session, and then send "hardware" timeout errors
up the stack. The rest of the OS should handle those the same as it would
any other timeout errors: retry a certain number of times and then fail.
I don't know how graceful the failure case is (perhaps not very), but
it's an honest approximation.

The above approach is IMO more than adequate for network interruptions
lasting a few seconds (or a bit more). I'm not sure there's anything more
you can realistically do. Administrators who intentionally reboot a
nonredundant iSCSI target while it has active sessions are asking for
trouble, and if the reboot is accidental they should do one or more of:
a) know to run fsck manually, b) get a better UPS, c) get a more
stable/redundant iSCSI target device.

Disclaimer: I know next to nothing about kernel programming, device
driver development, or SCSI in general. I've just been playing with and
thinking about iSCSI on FreeBSD a fair amount lately.

Thanks for your continued work on this.

JN
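
P.S. To make the stall-then-timeout idea above a bit more concrete, here
is a rough userland sketch in Python. It is only an illustration of the
control flow, not how the FreeBSD initiator or CAM actually works: the
Session class, initiator_io(), upper_layer_io(), and all the limits are
hypothetical names and numbers made up for this example.

```python
import errno
import time


class Session:
    """Hypothetical stand-in for an iSCSI session (not a real API)."""

    def __init__(self, fail_attempts):
        # Number of reconnect attempts that fail before one succeeds.
        self.fail_attempts = fail_attempts
        self.connected = False

    def reconnect(self):
        if self.fail_attempts > 0:
            self.fail_attempts -= 1
            return False
        self.connected = True
        return True


def initiator_io(session, stall_limit=5.0, backoff=0.5):
    """Stall while trying to reconnect; once stall_limit seconds have
    passed, give up and report a timeout, much as a slow or failing
    local disk would."""
    deadline = time.monotonic() + stall_limit
    while not session.connected:
        if session.reconnect():
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(errno.ETIMEDOUT, "iSCSI session lost")
        time.sleep(backoff)
    return "ok"


def upper_layer_io(session, retries=3, **limits):
    """Treat the timeout like any other: retry a bounded number of
    times, then fail the request with an I/O error."""
    for _ in range(retries):
        try:
            return initiator_io(session, **limits)
        except TimeoutError:
            continue
    raise OSError(errno.EIO, "I/O failed after retries")
```

With a short outage (say Session(fail_attempts=2)) the request simply
takes longer and then succeeds; with a target that never comes back, the
caller eventually sees an ordinary I/O error, which is the "honest
approximation" I had in mind.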