From: Danny Braniss
To: Pawel Jakub Dawidek
Cc: freebsd-scsi@FreeBSD.org, freebsd-hackers@freebsd.org
Subject: Re: iSCSI disconnects dilema
Date: Fri, 12 Jan 2007 21:31:04 +0200
In-reply-to: Your message of Fri, 12 Jan 2007 20:02:49 +0100

> On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
> > Hi,
> > While I think I have almost solved the problem of network disconnects,
> > it dawned on me that there is a major problem:
> > When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> > If I don't try to recover, then there is no change in the above scenario.
> > If I try to recover, then the client does not know that it should
> > umount/fsck/mount.
> > While all this seems familiar (removing a floppy/disk-on-key while it's
> > mounted), there we could always say "you shouldn't have done that!". With
> > a network connection it can happen very often: rebooting the target, a
> > network hiccup, etc.
> >
> > So, any ideas?
>
> In my opinion it should be done this way:
>
> You have a queue of I/O requests. You send them to the other end and wait
> for confirmation. Until confirmation is received, you keep the requests
> queued. If the other end dies, you try to reconnect (until some timeout
> expires, the processes which sent those requests will just wait). If you
> reconnect successfully, you resend the unconfirmed requests; if you are
> unable to reconnect, you just pass the errors up.
>
> This is what I did in ggate and it seems to work.

That is basically what I'm doing: unacked requests get requeued.
The problem I fear (and maybe I'm paranoid :-): assume the following scenario.
The client (initiator) sends a write command, the target acks it, and then
the target crashes. If the write was never completed, the initiator goes on
as if nothing ever happened.

	danny
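
For illustration only, the queue-and-resend scheme discussed above could be
sketched roughly as follows. This is not the ggate or iSCSI initiator code;
the class name, the method names and the send() hook are all hypothetical,
and the real drivers are kernel C rather than Python.

```python
from collections import OrderedDict


class PendingQueue:
    """Keeps every request until the other end confirms it (hypothetical sketch)."""

    def __init__(self, send):
        self.send = send              # assumed transport hook: send(tag, payload)
        self.pending = OrderedDict()  # tag -> payload, in submission order
        self.next_tag = 0

    def submit(self, payload):
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = payload   # queue first, so a failed send is not lost
        try:
            self.send(tag, payload)
        except ConnectionError:
            pass                      # stays queued; resent after reconnect
        return tag

    def ack(self, tag):
        # Confirmation received: the request can be forgotten.
        self.pending.pop(tag, None)

    def on_reconnect(self):
        # Resend everything that was never confirmed, oldest first.
        for tag, payload in list(self.pending.items()):
            self.send(tag, payload)

    def on_give_up(self):
        # Reconnect timeout expired: report the failed tags to the caller.
        failed = list(self.pending)
        self.pending.clear()
        return failed
```

Note that ack() drops the request as soon as the target confirms it. If the
target acks a write before the data is actually on stable storage and then
crashes, nothing is left in the queue to resend, which is exactly the
scenario described above.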