From: Wilko Bulte
To: Danny Braniss
Cc: freebsd-scsi@freebsd.org, Pawel Jakub Dawidek, freebsd-hackers@freebsd.org
Date: Fri, 12 Jan 2007 20:55:50 +0100
Subject: Re: iSCSI disconnects dilema
Message-ID: <20070112195549.GA77181@freebie.xs4all.nl>

On Fri, Jan 12, 2007 at 09:31:04PM +0200, Danny Braniss wrote..
> > On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
> > > Hi,
> > > While I think I have almost solved the problem of network disconnects,
> > > it dawned on me that there is a major problem:
> > > When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> > > If I don't try to recover, then there is no change in the above scenario.
> > > If I try to recover, then the client does not know that it should
> > > umount/fsck/mount.
> > > While all this seems familiar (removing a floppy/disk-on-key while it's
> > > mounted, where we could always say "you shouldn't have done that!"), with
> > > a network connection it can happen very often: rebooting the target, a
> > > network hiccup, etc.
> > >
> > > So, any ideas?
> >
> > In my opinion it should be done this way:
> >
> > You have a queue of I/O requests. You send them to the other end and wait
> > for confirmation. Until confirmation is received, you keep the requests
> > queued. If the other end dies, you try to reconnect (until some timeout
> > expires, the processes which sent those requests will just wait). If you
> > reconnect successfully, you resend the unconfirmed requests; if you are
> > unable to reconnect, you just pass the errors up.
> >
> > This is what I did in ggate and it seems to work.
>
> That is basically what I'm doing: unacked requests get requeued.
> The problem I fear (and maybe I'm paranoid :-):

Paranoia is a Good Thing(TM) in data storage land :-)

> Assume the following scenario: the client (initiator) sends a write command,
> the target acks it, then it crashes. If the write was never completed,
> the initiator goes on as if nothing ever happened.

Yes, but what can the initiator do about that? I mean, it does not have any
visibility of what the target has (or has not) done with the data.
This is roughly the same as a RAID box accepting a write into a writeback
cache and ACK-ing to the host. You can only assume that the RAID box's cache
will get flushed to the spindles properly. All the usual horror scenarios
with a broken battery backup of the cache, a power failure, etc. apply here.

Wilko

-- 
Wilko Bulte				wilko@FreeBSD.org
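
The scheme discussed in this thread (keep each request queued until it is
acked, retry across a disconnect, pass the error up once the retries run
out) can be sketched roughly as below. This is a toy model in Python, not
the ggate or iSCSI initiator code. The Target and Initiator classes, the
outage counter and max_retries are all made up for illustration, and the
target is assumed to ack a write as soon as it sits in a volatile cache.

```python
from collections import OrderedDict


class Target:
    """Toy target (hypothetical): acks once data is in a volatile cache."""

    def __init__(self):
        self.cache = {}   # acked, but not yet on stable storage
        self.disk = {}    # stable storage
        self.outage = 0   # number of upcoming requests that will fail

    def write(self, tag, block, data):
        if self.outage > 0:
            self.outage -= 1
            raise ConnectionError("target unreachable")
        self.cache[block] = data
        return tag        # the ack names the request it confirms

    def flush(self):
        self.disk.update(self.cache)
        self.cache.clear()

    def crash_and_reboot(self):
        self.cache.clear()   # everything not flushed is gone


class Initiator:
    """Toy initiator (hypothetical): requests stay queued until acked."""

    def __init__(self, target, max_retries=3):
        self.target = target
        self.max_retries = max_retries   # stands in for the timeout
        self.pending = OrderedDict()     # tag -> (block, data)
        self.next_tag = 0

    def submit(self, block, data):
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = (block, data)
        return tag

    def pump(self):
        """Send all pending requests, dropping each only when it is acked."""
        for tag in list(self.pending):
            block, data = self.pending[tag]
            for _ in range(self.max_retries):
                try:
                    acked = self.target.write(tag, block, data)
                except ConnectionError:
                    continue             # "reconnect" and resend
                del self.pending[acked]
                break
            else:
                # Retries exhausted: pass the error up, request stays queued.
                raise OSError("could not reach target for request %d" % tag)


if __name__ == "__main__":
    t = Target()
    i = Initiator(t)
    i.submit(1, b"payload")
    i.pump()                 # acked out of the target's cache
    t.crash_and_reboot()     # target loses the cached write
    i.pump()                 # nothing pending, so nothing is resent
    print("pending:", dict(i.pending), "on disk:", t.disk)
```

In this model a short outage is invisible to the caller, while a target that
acks and then loses its cache leaves the initiator with nothing queued to
resend, which is the case Danny is worried about and the same exposure as
the writeback-cache RAID box.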